Apache Kafka: Practical Applications and Limitations
- Shaikh N
- Oct 18, 2023
- 3 min read
Apache Kafka has become one of the most popular open-source distributed streaming platforms over the last few years. But like any technology, Kafka has its strengths and weaknesses.

In this post, we'll walk through some of the most common uses for Kafka and where it excels, as well as some of its limitations to keep in mind.
Streaming Data Pipelines
One of the most common use cases for Kafka is building real-time streaming data pipelines. Kafka provides a durable and scalable publish-subscribe messaging system that you can use to ingest data from many different sources, land it into Kafka topics, and then stream that data to various downstream systems and applications.
This makes it easy to decouple your data producers from data consumers. For example, you could ingest log files or database changes into Kafka, and then stream that data to Hadoop for analytics, search indexes like Elasticsearch for full-text search, and other systems to trigger alerts or send notifications.
The publish-subscribe model works extremely well for streaming pipelines, as producers of data don't need to know anything about the consumers of that data. Kafka handles distributing the data to all subscribed consumers.
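To make the decoupling concrete, here is a minimal producer sketch in Java using the standard Kafka client library. The broker address and the "app-logs" topic name are assumptions for illustration; any number of consumers (Elasticsearch indexers, Hadoop loaders, alerting services) could subscribe to the same topic without the producer knowing about any of them.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LogLineProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The producer only needs to know the topic name. It has no idea
        // whether Elasticsearch, Hadoop, or an alerting service reads it.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("app-logs", "host-1", "GET /checkout 500"));
        }
    }
}
```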
Event Sourcing
Event sourcing is an architectural pattern where you store the full series of events that describe actions taken on data, rather than just the current state. For example, rather than storing just the current name and email address for a customer, you'd store the entire history of name and email changes.
Kafka's durable publish-subscribe messaging model makes it a great fit as an event store for event sourcing. Services can publish domain events describing state changes to Kafka topics. Other applications can subscribe to those topics and reconstruct the current state by replaying the events.
This makes it easy to go back in time and reconstruct past states by replaying the event history. And when you only care about the latest state per key rather than the full history, Kafka's log compaction feature keeps the topic bounded by retaining just the most recent value for each key.
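As a sketch of the replay side, the Java consumer below rebuilds a current-state view by reading a topic of domain events from the beginning. The "customer-events" topic and the shape of the events (customer ID as key, latest email as value) are assumptions for illustration; a production rebuilder would poll in a loop until it reaches the end of the log rather than taking a single batch.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CustomerStateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "customer-state-rebuilder"); // assumed consumer group
        props.put("auto.offset.reset", "earliest");        // replay from the first event
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, String> latestEmailByCustomer = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events")); // assumed topic of domain events
            // Replaying the log in order reconstructs current state: the last
            // event seen for each key wins. (A real rebuilder would loop on
            // poll() until it catches up to the end of the log.)
            ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> event : batch) {
                latestEmailByCustomer.put(event.key(), event.value());
            }
        }
        System.out.println(latestEmailByCustomer);
    }
}
```

If the topic's cleanup.policy is set to compact, Kafka itself keeps only the newest record per key, so a replay like this stays fast even as the event history grows.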
Stream Processing
In addition to landing streaming data into storage systems, Kafka is often used for real-time stream processing. The Kafka Streams API allows you to write stream processing applications that consume from Kafka topics, run computations and aggregations on the incoming data stream, and output results back to Kafka.
This makes it easy to filter, transform, and enrich real-time data as it arrives. And by leveraging Kafka for the underlying storage and message transport, these stream processing applications can access data from all topics within a Kafka cluster. Common examples include fraud detection, transaction monitoring, and real-time recommendation engines.
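Here is a minimal Kafka Streams sketch of the filter-and-forward pattern described above. The "payments" and "large-payments" topic names and the 10,000 threshold are assumptions for illustration, with amounts carried as plain strings to keep the example self-contained.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LargePaymentFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "large-payment-filter"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read payment events, keep only large ones, and write the matches
        // back to Kafka, where a fraud-review service could consume them.
        KStream<String, String> payments = builder.stream("payments");
        payments.filter((accountId, amount) -> Double.parseDouble(amount) > 10_000.0)
                .to("large-payments");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```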
Limitations to Keep in Mind
While Kafka is versatile and great for many streaming data use cases, it does have some limitations to keep in mind:
Not a full database - Kafka is sometimes described as a "distributed commit log". It works well for sequential, append-only event data but lacks the indexing and query capabilities of a full database. Kafka is better suited to landing and transporting streams of data.
No built-in application semantics - Kafka delivers messages at least once by default (exactly-once processing is possible with its transactional APIs) and tracks topic offsets and per-partition ordering, but it does not manage application-level notions of state. Your apps need to handle deduplication and state themselves.
Retention requires tuning - Kafka does not keep data forever out of the box; with standard broker settings, messages are deleted after seven days. If you need longer (or indefinite) retention, or have storage limits to respect, you'll need to set retention and compaction policies per topic yourself (see the sketch after this list).
Ordering guarantees are per-partition - Ordering is guaranteed only within a single partition, not across an entire topic. Keying related events so they land in the same partition covers many cases; if you need total order across the full stream, you'll need a single partition or additional coordination.
Not suitable for small data - Kafka has significant operational overhead and is built for high-volume streams of relatively small messages (brokers cap message size at roughly 1 MB by default). For small data volumes, a simple message queue may be sufficient.
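As a sketch of the retention point above, here is one way to set a per-topic retention window with Kafka's AdminClient. The broker address, topic name, and 30-day window are assumptions for illustration; the same settings can also be applied with the kafka-configs command-line tool.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionTuner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "app-logs");
            // Keep roughly 30 days of data; brokers delete older log segments.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(30L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}
```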
Conclusion
For many streaming data use cases, Kafka's benefits clearly outweigh these limitations. Keep its strengths and sweet spots in mind when deciding whether it fits your needs, and it can be an invaluable tool for building scalable, reliable streaming data architectures.