
Securely Query Confluent Cloud from Amazon Redshift with mTLS


Querying databases comes with costs—wall clock time, CPU usage, memory consumption, and potentially actual dollars. As your application scales, optimizing these costs becomes crucial. Materialized views offer a powerful solution by creating a pre-computed, optimized data representation. Imagine a retail scenario with separate customer and product tables. Typically, retrieving product details for a customer's purchase requires cross-referencing both tables. A materialized view simplifies this by combining customer names and associated product details into a single table, enhanced with indexing for faster read performance. This approach minimizes the database's workload, reducing query processing time and associated costs.
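
To make that concrete, here is a minimal sketch of the retail example in Redshift SQL, assuming hypothetical customers and purchases tables:

    -- Hypothetical source tables: customers(customer_id, name)
    -- and purchases(customer_id, product_name, price)
    CREATE MATERIALIZED VIEW customer_purchases AS
    SELECT c.name, p.product_name, p.price
    FROM customers c
    JOIN purchases p ON c.customer_id = p.customer_id;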

In the context of data streaming with Apache Kafka®, materialized views become even more valuable. They act as a read-optimized cache of your Kafka data, allowing queries to target the materialized view instead of the raw Kafka data. This significantly boosts performance, especially for complex or frequent queries. The view automatically refreshes as new events stream into the Kafka topic, ensuring data freshness. Thanks to the recent release of mutual TLS on Confluent Cloud and Amazon Redshift, this pattern is now possible between the two services.

Stream Confluent topics to an Amazon Redshift materialized view

Confluent Cloud and Amazon Redshift recently released mutual TLS (mTLS) authentication support for their respective platforms.

For Confluent Cloud, support for mTLS authentication is driven by customers who are migrating workloads from on-premises or other self-managed Kafka solutions to Confluent Cloud and have existing infrastructure built around mTLS authentication. By bringing their own certificate authority (CA) to Confluent Cloud, customers can easily configure Kafka client authentication to Confluent clusters using customer-owned certificates. Even customers using other authentication types can benefit by simply adding mTLS to their existing dedicated cluster. As with API keys and OAuth/OIDC authentication, Confluent Cloud supports configuring role-based access control (RBAC) or access control lists (ACLs) on different client certificates for granular access control.

Amazon Redshift can seamlessly connect to Confluent Cloud using mTLS authentication with AWS Private Certificate Authority or with a self-managed certificate authority stored in AWS Secrets Manager. This blog walks through the use cases for Confluent Cloud and Amazon Redshift, and provides step-by-step instructions for configuration on both sides. 

Setup

Below is the architecture diagram for this setup. Both the Amazon Redshift cluster and the Confluent Cloud cluster will be deployed in the same region to save on inter-region data transfer costs. Public networking will be used; however, the setup is similar with other networking options such as VPC peering, AWS Transit Gateway, or AWS PrivateLink. This architecture assumes you can create your own custom CA. If you need to use an existing CA, you can use AWS Secrets Manager instead of AWS Private Certificate Authority.

Amazon Redshift setup

  1. Navigate to Amazon Redshift and click “Create cluster.”

  2. Select any node type.

  3. Select how you would like to set your admin password in the “Database configurations” section.

  4. In the “Cluster permissions” section, click “Associate IAM role” and attach the IAM role from the “Create IAM policy and role” section below. If you haven’t created it yet, you can also associate the role after the cluster is created.

  5. All other fields can be left as default.

  6. Create the cluster.

Note: Redshift Serverless workgroups are also supported.

Confluent Cloud dedicated cluster setup

The following steps assume that you’re already signed up with Confluent Cloud and have OrganizationAdmin permissions with your user account.

  1. Navigate to Confluent Cloud and create a dedicated cluster. You can leave it sized to 1 Confluent Unit for Kafka (CKU). 

  2. Navigate to the “Topics” tab and click “Create topic.”

  3. Provide the topic name orders and leave the rest as defaults. Skip the data contract pop-up that comes afterwards.

  4. Navigate to the “Connectors” tab and click “Add connector.”

  5. Select “Sample data.” 

  6. Select “Additional configuration.” 

  7. Select the orders topic you just created.

  8. In the “Configurations” section, select “JSON” as the output record value format. Note: Streaming ingestion with Amazon Redshift does not support Schema Registry at this time. Selecting AVRO, JSON_SR, or PROTOBUF will cause records to arrive in Amazon Redshift in a serialized format that it cannot read.

  9. Select “Orders” for the schema. The schema reference here determines what the generated messages will look like, as opposed to the serialization format used.

  10. Leave the rest as defaults and click “Create the connector.”

  11. You can navigate back to the orders topic to see data flowing into it.

  12. Finally, navigate to the “Cluster Settings” tab of your cluster and find the bootstrap server. It will look similar to the following: pkc-xxxxx.us-east-2.aws.confluent.cloud:9092. Keep this value handy for later steps.

Set up private certificate authority

  1. Be sure you are in the same region as your Amazon Redshift cluster and Confluent Cloud cluster (us-east-2 if you’ve been following this guide).

  2. Navigate to AWS Private Certificate Authority and click “Create a Private CA.”

  3. Leave the mode option as “General-purpose” and the CA type option as “Root.”

  4. Fill out the “Subject distinguished name options” accordingly.

  5. Confirm the pricing acknowledgment checkbox and click “Create CA.”

  6. Once the CA is created, be sure to install the CA certificate.

  7. Once the CA’s status shows as “Active,” click into your newly created certificate authority.

  8. Find and click the “CA certificate” tab.

  9. Within that tab, you will see an “Additional information” section containing the certificate body. Click “Export certificate body to a file.” This will download a .pem file to use later.

Configure mTLS authentication on Confluent Cloud

  1. Navigate to the Workload identities page in Confluent Cloud.

  2. Click “Add provider.”

  3. Select “Certificate authority.”

  4. Select “Add PEM file” and upload the .pem file you downloaded earlier from the Certificate Authority you created. 

  5. Finish the setup and create the identity provider.

  6. Within the identity provider, click “Add pool.”

  7. Provide a name and leave the Certificate identifier set to “CN.”

  8. Set up the filter that matches client certificates according to your requirements. Refer to the Confluent Cloud CEL filter documentation for accepted filter expressions. For testing purposes, you can set it to true.

  9. Attach the CloudClusterAdmin role for the dedicated cluster to the identity pool, or a more granular RBAC role if you wish to limit the access of your Redshift client. Make sure you click “Add” and see the role appear on the right panel before moving to the next step.

  10. Click “Validate and save.”

Set up the AWS client certificate

  1. Navigate to AWS Certificate Manager.

  2. Click “Request.”

  3. Select “Request a private certificate.”

  4. In the “Certificate authority” dropdown, you’ll see the CA you created in the previous section.

  5. For the “Fully qualified domain name,” you can provide any name for the purposes of this exercise.

  6. Leave the rest as defaults and click “Request.”

  7. Once the certificate is issued, copy the certificate ARN and set it aside for future use.

Create IAM policy and role

The following IAM policy and role allow Amazon Redshift to retrieve the certificate from AWS Certificate Manager.

  1. Navigate to IAM and create a new policy.

  2. Use the following JSON to define the policy. This policy grants Redshift the acm:ExportCertificate permission so it can use the previously created certificate.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "acm:ExportCertificate"
                ],
                "Resource": [
                    "arn:aws:acm:<region>:<accountid>:certificate/certificate_ID"
                ]
            }
        ]
    }

  3. Give the policy a name like ExportCertificatePolicy and create the policy.

  4. Navigate to the IAM “Roles” tab and create a new role.

  5. For the trusted entity, select “Custom trust policy.”

  6. Edit the values in the trust policy below, paste it into the “Custom trust policy” box, and click next. This trust policy allows Redshift to assume this role on your behalf.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "redshift.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }

  7. Add the policy you just created and click next.

  8. Give the role a name.

  9. Click “Create role.”

Set up Amazon Redshift to consume from Confluent Cloud

  1. Navigate to Amazon Redshift and open the query editor for your cluster.

  2. Run the following command to create an external schema in Redshift that tells Redshift which Confluent Cloud cluster to connect to, which authentication method to use, and which certificate to use for mTLS:

    CREATE EXTERNAL SCHEMA redshift_cwc_testing
    FROM KAFKA
    IAM_ROLE '<arn_of_iam_role_you_created>'
    AUTHENTICATION mtls
    URI '<your_bootstrap_servers>'
    AUTHENTICATION_ARN '<arn_of_your_certificate>';

  3. Create the materialized view. This materialized view will be used to link Redshift with a topic in the Confluent Cloud cluster, and this is also where the data will be stored during ingestion.

    CREATE MATERIALIZED VIEW kafka_orders AUTO REFRESH YES AS
    SELECT *
    FROM redshift_cwc_testing."orders";

  4. With your materialized view created, you can now query the data. Note: It may take a few seconds before data starts being ingested into the Redshift cluster. A couple of optional refinements are sketched after this step.

    SELECT * FROM kafka_orders;
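
As optional refinements, you can sanity-check that the external schema from step 2 was registered, and you can parse records into a SUPER column at ingestion time so that individual fields are directly queryable. Both are sketched below under light assumptions: kafka_orders_parsed is a hypothetical view name, and kafka_value/kafka_timestamp are the standard metadata columns that Redshift streaming ingestion exposes.

    -- Sanity check: confirm the external schema from step 2 exists
    SELECT schemaname, databasename
    FROM svv_external_schemas
    WHERE schemaname = 'redshift_cwc_testing';

    -- Variant of the step 3 view that parses the JSON payload during
    -- ingestion; CAN_JSON_PARSE skips records that are not valid JSON.
    CREATE MATERIALIZED VIEW kafka_orders_parsed AUTO REFRESH YES AS
    SELECT kafka_timestamp,
           JSON_PARSE(kafka_value) AS order_data
    FROM redshift_cwc_testing."orders"
    WHERE CAN_JSON_PARSE(kafka_value);

Fields of the parsed payload can then be addressed with dot notation, for example SELECT order_data.itemid FROM kafka_orders_parsed; (itemid being one of the fields produced by the sample Orders schema).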

Next steps

With this setup, integrating Confluent Cloud with Amazon Redshift materialized views offers a powerful solution for real-time data ingestion and analysis. Materialized views act as a pre-computed, read-optimized cache of your Kafka data, enabling significantly faster query performance compared to querying raw Kafka data. This is particularly beneficial for complex or frequent queries. The view automatically refreshes as new data arrives in your Kafka topic, ensuring data freshness.

Ready to get started with Confluent Cloud on AWS Marketplace? New sign-ups receive $1,000 in free credits for their first 30 days! Subscribe through AWS Marketplace and your credits will be instantly applied to your Confluent account.

You can explore more in the Confluent Cloud and Amazon Redshift documentation.

Final note

If you are not using a dedicated cluster, need Schema Registry support, or need to load data in the Redshift table (as opposed to just a materialized view), consider using Confluent Cloud’s fully managed connector for Amazon Redshift.

Amazon and all related marks are trademarks of Amazon.com, Inc. or its affiliates.

Apache and Apache Kafka® are trademarks of the Apache Software Foundation.

  • Braeden Quirante began his career as a software consultant where he worked on a wide array of technical solutions including web development, cloud architecture, microservices, automation, and data warehousing. Following these experiences, he joined Amazon Web Services as a partner solutions architect working with AWS partners in scaled motions such as go-to-market activities and partner differentiation programs. Braeden currently serves as a partner solutions engineer for Confluent and an AWS evangelist.

  • Jiaqi Gao is a Senior Product Manager at Confluent, leading the development of advanced security features for its data streaming platform and managed Apache Kafka and Flink services.

    She has extensive experience delivering critical features in multi-factor authentication, single sign-on, and Zero Trust, along with driving cross-functional initiatives and managing large-scale customer transitions. Jiaqi also contributed to machine learning research, focusing on voice cloning prevention.

  • Adekunle Adedotun is a Senior Database Engineer with the Amazon Redshift service. He has been working on MPP databases for six years with a focus on performance tuning. He also provides guidance to the development team for new and existing service features.
