
Queues in Apache Kafka®: Enhancing Message Processing and Scalability


In the world of data processing and messaging systems, terms like "queue" and "streaming" often pop up. While they might sound similar, they serve different purposes and can significantly impact how your system handles data. Let’s break down the differences in a straightforward way.

What are message queues?

Imagine a coffee shop where customers place orders either online or in person. Once an order is processed, the customer is notified (through an app or an in-person announcement) to pick it up. In this scenario, the orders function like messages in a queue, and the barista processes them one at a time, removing each order from the queue after it’s completed. This is essentially how a message queue works. Each message is a separate piece of work to be performed independently. Each message in the queue waits its turn to be processed by a consumer. Consumption of messages from queues is destructive in nature, meaning that as a message is consumed, it's deleted from the queue. This approach simplifies coordination among the consumers and ensures that the consumers can easily process disjoint sets of messages.

Key characteristics of a queue:

  • Asynchronous communication: In our coffee shop example, you don’t need to stand there and watch your coffee being made. You can carry on with your day while waiting for your order to be ready. Similarly, message queues allow producers (like your coffee ordering app) to send messages via the queue without needing the consumers (baristas) to be ready at the same time.

  • First In, First Out (FIFO): Just as the sequence in which you place your orders matters at the counter, message queues typically follow a FIFO pattern. The first message sent to the queue is the first one to be processed. This is crucial for applications that rely on the order of operations, like processing transactions in a banking system. There are also shared queues where ordering is not guaranteed; we will come back to that later in the blog.

  • Durability: Most message queues ensure that messages are stored reliably. Even if the system crashes or a consumer goes offline, the messages will wait patiently in the queue until they can be processed.

  • Message deletion: Messages are stored in the queue until a consumer processes them. Each message is typically consumed by one consumer instance, ensuring exclusive delivery. Messages are deleted after being consumed (acknowledged) by a consumer. This ensures that no duplicate processing occurs, and the queue is cleaned up over time.

Queues are especially beneficial in scenarios where parallel processing is needed, as they enable multiple consumers to read and process messages concurrently, and help the system scale. This makes them ideal for load balancing and managing tasks that can be processed independently of the order in which they were received. So, there are workloads that make a great case for queue-based processing. Some examples are inventory management systems in retail, healthcare management systems for patient flow, and managing customers in a restaurant.
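To make these semantics concrete, here is a minimal, self-contained Java sketch (not Kafka-specific, with illustrative order names and worker counts) of destructive, FIFO consumption where several workers drain a single queue in parallel:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CoffeeShopQueue {
    public static void main(String[] args) throws InterruptedException {
        // FIFO queue: orders come out in the sequence they were placed.
        BlockingQueue<String> orders = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 6; i++) {
            orders.put("order-" + i); // producer: the ordering app
        }

        // Two baristas (consumers) drain the same queue concurrently.
        ExecutorService baristas = Executors.newFixedThreadPool(2);
        for (int b = 0; b < 2; b++) {
            baristas.submit(() -> {
                String order;
                // poll() is destructive: once taken, an order is gone from
                // the queue, so each order goes to exactly one barista.
                while ((order = orders.poll()) != null) {
                    System.out.println(Thread.currentThread().getName() + " served " + order);
                }
            });
        }
        baristas.shutdown();
        baristas.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```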

What about streaming messages?

Now, let’s switch gears and talk about streaming messages. Picture a live concert where the music keeps flowing, and everyone is enjoying the vibe. Streaming messages are all about continuous flow and real-time processing.

Key characteristics of streaming messages:

  • Real-time processing: Unlike our coffee shop scenario, streaming messages are designed for immediate consumption. Think of it as listening to music on a streaming service. You get instant access to songs as they’re played without waiting for the entire album to download.

  • Event-driven architecture: Streaming is all about reacting to events as they happen. If you’ve ever used a social media platform, you’ve experienced streaming in action. Your feed updates in real time, showing you the latest posts, likes, and comments. This is how streaming messages work—pushing data to consumers as soon as it’s available.

  • Scalability: Streaming systems can handle a massive volume of data, processing thousands of messages per second. They’re perfect for applications like real-time analytics, monitoring, and machine learning.

  • Message retention: Messages are stored and can be consumed by multiple consumers, enabling both real-time and batch processing. Consumers track their progress (via offsets), allowing flexible replay of messages. Messages can be deleted based on retention policies, such as time-based (e.g., retain for 7 days) or size-based limits (e.g., retain up to 1GB per partition). Consumers do not trigger message deletion; rather, data is retained for historical access or reprocessing.
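The retention limits mentioned above are ordinary topic configurations. As a hedged illustration, here is a minimal sketch that sets time- and size-based retention with Kafka's Java Admin API; the broker address and topic name are placeholders:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Hypothetical topic used for illustration.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Time-based retention: keep records for 7 days (in milliseconds).
            AlterConfigOp byTime = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET);
            // Size-based retention: cap each partition at 1 GB.
            AlterConfigOp bySize = new AlterConfigOp(
                    new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(byTime, bySize))).all().get();
        }
    }
}
```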

Streaming is used in almost every aspect of life nowadays. You see it at work in stock prices, monitoring, fraud detection, retail, customer service, and sales. And Apache Kafka has been the de facto standard for streaming for many years, as businesses increasingly rely on it for mission-critical applications.

Why queues for Apache Kafka?

At Confluent, we envision Kafka becoming the central nervous system for all customer workloads, without them having to worry about being locked into a proprietary ecosystem. To support that, we've been expanding our list of open standards to boost adoption. Depending on the workload, users may want to consume messages strictly in order; in other cases, they want to process them fast, irrespective of the order. That’s why we’re excited to bring queue support to Kafka, making it a one-stop solution where consumers have the flexibility to process messages concurrently. With this addition, Kafka can now handle an even wider range of use cases, giving customers more flexibility in managing their data workflows.

How are queues supported in Apache Kafka?

Let’s take a closer look. Kafka stores messages in an append-only log format, where each message is assigned an offset. Consumers read these messages sequentially from these offsets, making it easy to replay messages and ensure fault tolerance. Thanks to Kafka's log-based architecture, messages can stick around for a set period of time, allowing for reprocessing or recovery if something goes wrong. This makes Kafka a great fit for apps that need strict ordering and consistency, like event sourcing or auditing.

Figure 1: Kafka consumers reading from an append-only log
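As a concrete illustration of replaying from the log, here is a minimal sketch using the standard Java consumer with a manually assigned partition; the broker address, topic name, and starting offset are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // no group, no commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign a partition and rewind: because the log is
            // append-only and retained, re-reading old offsets is cheap.
            TopicPartition tp = new TopicPartition("orders", 0); // hypothetical topic
            consumer.assign(List.of(tp));
            consumer.seek(tp, 0L); // replay from the start of the retained log

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}
```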

Now, we’re bringing the best of traditional queues into the mix. With Kafka’s hybrid model, messages can be consumed in parallel and processed at least once (non-destructive reads), while still giving you the option to replay messages from the log when necessary. This setup makes Kafka even more flexible, handling high throughput and out-of-order processing, without losing the reliability and resilience Kafka is known for.

Consumer groups and share groups in Apache Kafka

In Kafka, consumer groups are the key to coordinating how data is consumed from topics. Each consumer group is made up of multiple consumers that work together to read from a topic’s partitions. Each partition is assigned to at most one consumer in a group, meaning the number of active consumers is limited by the number of partitions. As you know, the Kafka consumer works by issuing “fetch” requests to the brokers leading the partitions it wants to consume. The consumer offset in the log is specified with each request. The consumer receives a chunk of log that contains all of the messages in that partition beginning from the offset position. The consumer has significant control over this position and can rewind it to re-consume data if desired.

A consumer group is a set of consumers that cooperate to consume data from some topics. You establish the group for a consumer by setting its group.id in the properties for the consumer. The partitions of all the topics are divided among the consumers in the group.
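In code, joining a consumer group takes nothing more than a group.id and a subscription. A minimal sketch (the broker address, group name, and topic are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "checkout-service"); // the consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // hypothetical topic
            while (true) {
                // Each partition is owned by exactly one consumer in the group,
                // so records from one partition arrive here in offset order.
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```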

This design offers both order and scalability, but it also ties the number of consumers directly to the number of partitions. To handle peak loads, many Kafka users create more partitions than they actually need, which can be inefficient and frustrating.

This is where share groups come in. 

In some cases, consumers need to work together without being tied to specific partitions. Share groups introduce a more flexible way for consumers to cooperate, especially in use cases that feel more like a traditional queue system. While share groups don’t turn Kafka into a full-on queue solution, they offer very similar behavior, suitable for applications that traditionally use queues. Essentially, share groups act like a "durable shared subscription" in other messaging systems, allowing multiple consumers to process records from the same partitions. They are an alternative to consumer groups for situations in which finer-grained sharing is required. Here’s what makes share groups different from traditional consumer groups:

  • Consumers in a share group can read from the same partition, unlike in consumer groups where partitions are exclusively assigned.

  • You can have more consumers in a share group than there are partitions.

  • Records are acknowledged one by one, though Kafka still optimizes for batch processing.

  • Kafka tracks delivery attempts and allows consumers to return unprocessed messages to the queue, so other consumers can process them automatically.

Share groups add a new layer of flexibility while still supporting the publish/subscribe model. Consumers in different share groups can read from the same topic without affecting each other, which makes them more powerful than a simple queue.

In practice, consumers in a share group usually subscribe to the same topics, but each consumer can set its own list of topics if needed. This makes share groups a great alternative when you need finer control over how records are shared and processed in Kafka.

Figure 2: Share group compared to a consumer group
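For a feel of the consumer-side API, here is a hedged sketch based on the early-access KafkaShareConsumer interface proposed in KIP-932; the broker address, share group name, and topic are placeholders, and the API may still change before general availability:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ShareGroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-workers"); // names a share group here
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        process(r);
                        consumer.acknowledge(r, AcknowledgeType.ACCEPT);  // done, don't redeliver
                    } catch (Exception e) {
                        consumer.acknowledge(r, AcknowledgeType.RELEASE); // return for redelivery
                    }
                }
                consumer.commitSync(); // make the acknowledgements durable
            }
        }
    }

    private static void process(ConsumerRecord<String, String> r) {
        System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
    }
}
```

Releasing a record rather than accepting it is what lets another consumer in the share group pick it up, which is the behavior described in the last bullet above.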

Does the share group guarantee ordering?

Well, the answer is not yet. Here’s how it works: within a batch of records from a particular share-partition, the records are guaranteed to be in order by increasing offset. But when it comes to ordering between different batches, there’s no such guarantee.

For example, if two consumers in a share group are consuming from the same partition, one might fetch records 100 to 109 and crash before processing them. Meanwhile, the other consumer fetches and processes records 110 to 119. When the first set of records (100 to 109) is re-delivered, the delivery count will be updated, but they’ll arrive after records 110 to 119.

This is actually what you want—it ensures that no records are lost and everything gets processed—but it means the offsets won’t always strictly increase like they would in a traditional consumer group. So, while you get ordering within a batch, you may see out-of-order records when redelivery happens across batches.

Real-life use case – sales event

Let’s look at a real-world scenario where Kafka queues, via share groups, can shine. Picture a retailer running a huge sales event for a popular product. The app handling the checkout process—letting customers add items to their carts and place orders—needs to scale up quickly as more customers flood in during the event.

With share groups, the customer orders are treated as individual units of work that can be processed by multiple workers in parallel. This means the app can dynamically add more consumers to share the load when demand spikes, ensuring that all customer orders are processed efficiently without slowing down. During quieter periods, the app can reduce the number of consumers, saving resources while still making sure orders are processed smoothly. Here, the sequence in which customer orders are processed is not important, as long as every order is handled efficiently.

This kind of flexibility allows the system to handle the ups and downs of traffic with ease, all without the need to repartition Kafka topics. It's a perfect fit for scenarios like sales events, where workload can change drastically in a short period of time.
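As a hedged sketch of that elasticity, the worker pool below runs more share-group consumers than the topic has partitions, something a classic consumer group cannot do; all names and sizes are illustrative, and each worker follows the same early-access KIP-932 loop shown earlier:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.AcknowledgeType;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ElasticCheckoutWorkers {
    public static void main(String[] args) {
        // During the sales peak we might run 12 workers against, say, a
        // 6-partition topic; a classic consumer group would cap us at 6.
        int workers = 12; // illustrative; scale down in quiet periods
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(ElasticCheckoutWorkers::runWorker);
        }
    }

    private static void runWorker() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "checkout-workers"); // share group name
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("checkout-orders")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> order : consumer.poll(Duration.ofSeconds(1))) {
                    // Each order is an independent unit of work; ordering
                    // across orders does not matter for this workload.
                    System.out.println("processed " + order.value());
                    consumer.acknowledge(order, AcknowledgeType.ACCEPT);
                }
                consumer.commitSync();
            }
        }
    }
}
```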

Conclusion

Adding queue support to Kafka opens up a world of new possibilities for users, making Kafka even more versatile. By combining the strengths of traditional queue systems with Kafka’s robust log-based architecture, users now have a powerful solution that handles both streaming and queue processing in a single platform. This flexibility reduces complexity, allowing businesses to meet diverse data processing needs without being locked into multiple systems. With queue support, Kafka becomes a one-stop shop for real-time data movement, providing the reliability, scalability, and performance that organizations need to thrive in an increasingly data-driven world. This feature will be available as an Early Access release as part of Apache Kafka 4.0 (see the release plan).

Apache Kafka® and Kafka are trademarks of the Apache Software Foundation.

  • Arun Singhal is a Director of Engineering at Confluent, focused on open-source Apache Kafka. Previously, he led the Kafka Developer Platform organization at Confluent, driving initiatives to enhance Kafka availability, developer productivity, and Kafka clients.
