Développez l'apprentissage automatique prédictif avec Flink | Atelier du 18 déc. | S'inscrire

Introducing Confluent Platform 7.4

Écrit par
  • Bharath VenkatSenior Product Marketing Manager – Technical Lead for Digital Growth
  • Hasan JilaniStaff Product Marketing Manager, Confluent

Introducing Confluent Platform 7.4 - enabling Apache Kafka® to scale to millions of partitions, simplifying architecture, accelerating time to market with self-service tooling and codified best practices for developers, and ensuring consistent and accurate data.

Building on the innovative feature set delivered in previous releases, Confluent Platform 7.4 makes enhancements to three categories of features:

  1. Enhance scalability and simplify your architecture with production-ready KRaft support for new clusters.

  2. Achieve faster time to value by providing a self-service control plane for developers with Confluent for Kubernetes Blueprints.

  3. Ensure trusted, high-quality data streams by leveraging Data Quality Rules for Schema Registry.

Note: Data Quality Rules is in preview state, meaning the feature is not yet supported for production environments and is only advised for testing purposes.

This release blog post explores each of these enhancements in detail, taking a deeper dive into the major feature updates and benefits. As with previous Confluent Platform releases, you can always find more details about the features in the release notes.

Keep reading to get an overview of what’s included in Confluent Platform 7.4, or download Confluent Platform now if you’re ready to get started

Confluent Platform 7.4 at a glance

Confluent Platform 7.4 comes with several enhancements to existing features, enhancing scalability while bringing a more simplified architecture, expanding DevOps automation, and improving data quality. Here are some highlights:

Production-ready KRaft support for new clusters

As part of this release, KRaft is now production-ready and generally available in Confluent Platform 7.4 for new deployments. Specifically, support has been added for Confluent for Kubernetes, Ansible, and Docker greenfield deployments. Following ZooKeeper’s removal in Apache Kafka 3.0, Confluent Platform 7.0 introduced Apache Kafka Raft Metadata mode (KRaft) in preview to make it easier to monitor and scale Kafka clusters to millions of partitions.

Part of KIP-500, KRaft was introduced to remove Kafka’s dependency on ZooKeeper for metadata management. Replacing external metadata management with KRaft greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself, rather than splitting it between two different systems: ZooKeeper and Kafka. 

This improves stability, simplifies the software, and makes it easier to monitor, administer, and support Kafka. It also allows Kafka to have a single security model for the whole system, along with enabling Kafka clusters to scale to millions of partitions through improved control plane performance with the new metadata management. The enhanced scalability also enables up to 10x improvement in recovery time for controlled shutdowns.

Support millions of partitions without significantly impacting recovery times

Support for Multi-Region Clusters/Observers or for upgrades from existing deployment is expected to be part of a future release. KIP-866 also adds the ability to migrate a Kafka cluster from ZooKeeper to KRaft mode as an early access feature. The migration copies the cluster metadata from ZooKeeper to the KRaft metadata log. During the migration, brokers are restarted in KRaft mode one at a time, allowing the whole migration process to happen without cluster downtime. As mentioned, this feature is early access in Apache Kafka 3.4 and should not be used in production yet.

Confluent for Kubernetes Blueprints

Confluent for Kubernetes (CFK) is the platform of choice for deploying and managing Confluent Platform on Kubernetes infrastructures in a cloud-native, declarative manner. Confluent Platform component resources are managed by CFK using Custom Resource Definition (CRD) constructs. CRDs span both infrastructure and application-specific resources.

Today, while client team’s using Confluent's central platform has technical knowledge to manage Confluent Platform, many application teams lack resources to interact with the platform effectively. The central platform team must integrate with various enterprise systems, requiring responsibility for infrastructure setup, best practices development, and intuitive self-service tooling. Application teams require robust self-service tooling to streamline their usage of Confluent.

That's why we've introduced Blueprints with Confluent for Kubernetes 2.6, which are a new set of higher-level abstracts that allow platform teams to define a set of standardized ways (prod, staging, dev, qa, 0-RPO-availability, etc.) to deploy Confluent Platform. Our single control plane architecture built into CFK allows you to manage deployments in a single Kubernetes cluster or across multiple Kubernetes clusters. This also enables a self-service developer interface that allows application teams to create deployments and app resources through a simplified API. Blueprints also provide simplified APIs to handle complex networking and security environments.

Platform and Application team can separately control and self-service their Confluent Platform experience.

The platform team utilizes Blueprints to deploy Confluent Platform, enabling the application team to interact solely with the application resources.

Additional benefits such as automation of credential management can be enabled using Blueprints. Automation around discovery of Confluent Platform components, updating certificates, and rolling clusters can be achieved via Blueprints. 

Abstraction of Credential Management Helps in Automation: Build CredentialStoreConfig on top of K8s secrets

To learn more, see our 5-minute video on Blueprints.

With CFK 2.6, customers can deploy and manage Confluent Platform on Kubernetes infrastructures with ease, while also benefiting from enhanced security, performance, and proactive support capabilities.

Data quality rules with Schema Registry

As organizations deliver against an expectation for “real-time everything,” it becomes important for engineering teams to trust the consistency, reliability, and quality of their data in motion. This drives the need for clear Data Quality Rules—a formal agreement between upstream and downstream components on the structure, semantics, and quality of data in motion. The upstream component enforces the Data Quality Rules, while the downstream component can assume that the data it receives conforms to the Data Quality Rules. 

To support the usage and management of Data Quality Rules in Confluent Platform, we’ve added data quality rules—alongside tags and metadata—to Schema Registry. With this release, we’ve added data quality rules for domain validation and schema migration, allowing developers and architects to easily validate important, sensitive data and/or easily move from old data formats to new ones. 

Domain validation rules

Domain validation rules are used to validate the value of a single field based on a boolean predicate. Plus, domain validation rules can have actions triggered on failure or on success of the rule.

Confluent Schema Registry supports domain validation rules execution based on Google Common Expression Language (CEL) that implements a common and simple semantics for expression evaluation

For example, let's say we want to validate that the "ssn" field is only 9 characters long before we serialize a message. We can use a schema with the rule below, where the message is passed to the rule as a variable with the name "message".

{
  "schema": "...",
  "ruleSet": {
    "domainRules": [
      {
        "name": "checkSsnLength",
        "kind": "CONDITION",
        "type": "CEL",
        "mode": "WRITE",
        "expr": "size(message.ssn) == 9"
       }
    ]
  }
}

Now that we've set up our domain validation rule, whenever a message is sent that has a "ssn" field that is not 9 characters in length, the serializer will throw an exception. Alternatively, we can have the message sent to a dead-letter queue topic via an action:

"schema": "...",
  "ruleSet": {
    "domainRules": [
       {
         "name": "checkSsnLength",
         "kind": "CONSTRAINT",
         "type": "CEL",
         "mode": "WRITE",
         "expr": "size(message.ssn) == 9",
         "onFailure": "DLQ"
       }
    ]
  }
}

When using conditions, one can model arbitrary Event-Condition-Action (ECA) rules by specifying an action for "onSuccess" to use when the condition is true, and an action for "onFailure" to use when the condition is false. The action type for "onSuccess" defaults to the built-in action type NONE, and the action type for "onError" defaults to the built-in action type ERROR. One can explicitly set these values to NONE, ERROR, DLQ, or a custom action type. For example, one might want to implement an action to send an email whenever an event indicates a credit card is about to expire. 

Schema migration rules

Complex schema migration rules are used to evolve a schema in an incompatible manner by applying transformations when consuming from a topic to translate the topic data from the old format to the new format. With this approach, the client does not need to be switched over to a different topic (the current alternative), but instead continues to read from the same topic, and a set of declarative migration rules will massage the data into the form that the consumer expects.

With the use of declarative migration rules, we can support a system in which producers separately use versions 1, 2, and 3 of a schema, which are all incompatible, and consumers that expect versions 1, 2, or 3 each see a message transformed to the desired version, regardless of what version the producer sent. 

For example, let’s say we want to rename the field "ssn" as "socialSecurityNumber", an incompatible schema evolution rule in any schema language we support. Below are a set of migration rules to achieve this. Note that we use the JSONata function called "sift" to remove the field with the old name, and then use a JSON property to specify the new field.

{
  "schema": "...",
  "ruleSet": {
    "migrationRules": [
      {
        "name": "changeSsnToSocialSecurityNumber",
        "kind": "TRANSFORM",
        "type": "JSONATA",
        "mode": "UPGRADE",
        "expr": "$merge([$sift($, function($v, $k) {$k != 'ssn'}), {'socialSecurityNumber': $.'ssn'}])"
       },
       {
        "name": "changeSocialSecurityNumberToSsn",
        "kind": "TRANSFORM",
        "type": "JSONATA",
        "mode": "DOWNGRADE",
        "expr": "$merge([$sift($, function($v, $k) {$k != 'socialSecurityNumber'}), {'ssn': $.'socialSecurityNumber'}])"
       }
    ]
  }
}

For further technical information, be on the lookout for an upcoming blog series on Data Quality Rules.

Built on Apache Kafka 3.4

Confluent Platform 7.4 is built on the most recent version of Apache Kafka, in this case, version 3.4. For more details about Apache Kafka 3.4, please read the blog post by Sophie Blee-Goldman or check out the video by Danica Fine below. 

Get started today

Download Confluent Platform 7.4 today to get started with the only cloud-native and complete platform for data in motion, built by the original creators of Apache Kafka.

The preceding outlines our general product direction and is not a commitment to deliver any material, code, or functionality. The development, release, timing, and pricing of any features or functionality described may change. Customers should make their purchase decisions based upon services, features, and functions that are currently available.

Confluent and associated marks are trademarks or registered trademarks of Confluent, Inc.

Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks. All other trademarks are the property of their respective owners.

  • Bharath Venkat is a product marketing manager at Confluent responsible for digital growth and Confluent Platform. Before joining Confluent, Bharath was a product management leader at AT&T, Cisco Appdynamics, Lacework, and Druva focussing on artificial intelligence, machine learning, data science and analytics, and growth initiatives.

  • Hasan Jilani is a staff product marketing manager at Confluent responsible for stream processing and Flink go-to-market. Prior to joining Confluent, Hasan led marketing for cloud and platform services at DXC Technology.

Avez-vous aimé cet article de blog ? Partagez-le !