What are Pub Sub and Kafka
Pub Sub (publish-subscribe) is a well-known and popular messaging pattern that can be integrated into almost any application. It lets servers broadcast data in real time, which is useful for notifications, updates, and other exchanges of information. Kafka, on the other hand, is a distributed log system that records continuous streams of events. It is built around a partitioned, replicated, append-only commit log, which allows it to handle hundreds of thousands of messages per second. Because Kafka batches writes and replicates its log across brokers, it avoids many of the performance and reliability problems that have plagued traditional messaging systems.
Pub Sub (publish-subscribe) systems deliver each message to every party that has registered interest in it. This decoupling makes it easier to build reliable, scalable, elastic applications. Pub Sub systems let producers (publishers) and consumers (subscribers) communicate through an intermediary known as a message broker. The key difference between point-to-point messaging and a pub sub system is that publishers do not send messages directly to subscribers. Instead, they send messages to the broker, which forwards them to the interested subscribers. This lets subscribers be dynamic and flexible: they can subscribe to a channel (often called a topic) at any time and unsubscribe from it at any time. Twitter is a loose analogy: users can follow or unfollow each other within seconds, and each post reaches whoever is following the author at that moment.
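To make the broker idea concrete, here is a minimal in-memory sketch. It is not the API of Kafka, Google Pub Sub, or any other product. The `Broker` class and its method names are made up for illustration, and a real broker would add networking, persistence, and delivery guarantees.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy in-memory message broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic name -> callbacks of the current subscribers
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        if callback in self._subscribers[topic]:
            self._subscribers[topic].remove(callback)

    def publish(self, topic: str, message: str) -> int:
        """Forward the message to every current subscriber; return how many received it."""
        callbacks = list(self._subscribers[topic])  # copy, so callbacks may unsubscribe safely
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    alice_inbox, bob_inbox = [], []
    alice, bob = alice_inbox.append, bob_inbox.append

    broker.subscribe("news", alice)
    broker.subscribe("news", bob)
    broker.publish("news", "first post")   # reaches both subscribers

    broker.unsubscribe("news", bob)
    broker.publish("news", "second post")  # reaches only alice

    print(alice_inbox, bob_inbox)
```

Notice that the publisher never refers to a subscriber directly. It only knows the topic name, which is what allows subscribers to come and go freely.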
How do they compare
Kafka has traditionally relied on Apache ZooKeeper for cluster coordination (newer releases can run without it, using Kafka's built-in KRaft mode). It combines two things: a distributed log implementation and a distributed, real-time publish/subscribe messaging layer. The log is persisted to disk on the Kafka brokers and updated in real time. When you publish messages to Kafka, they are appended to an append-only log file for the target partition, and each record gets a sequential offset. The topic's replication factor determines how many brokers hold a copy of each partition, which gives you high availability and fault tolerance. Consumers read the messages shortly after they are written, so they can be processed in near real time. Kafka assigns each record to a partition, by default based on the record's key, and preserves order within a partition, so a consumer can pick up new records simply by continuing from its last offset. Kafka itself is written in Scala and Java.
Pub Sub uses the publish/subscribe messaging pattern, which lets applications (publishers) send events or messages to a group of subscribers. This means a developer can publish an event once and have it delivered to one or more consumers.
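The append-only log with offsets is the part that most distinguishes Kafka from a classic queue, so a small sketch may help. This is a toy model, not Kafka's client API. `PartitionLog` and `Consumer` are hypothetical names, and replication and disk storage are left out entirely.

```python
from typing import List


class PartitionLog:
    """Toy append-only log for a single partition (hypothetical, in memory only)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[str]:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer remembers its own position; reading never removes records."""

    def __init__(self, log: PartitionLog) -> None:
        self._log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> List[str]:
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch


if __name__ == "__main__":
    log = PartitionLog()
    for event in ("created", "paid", "shipped"):
        log.append(event)

    billing, analytics = Consumer(log), Consumer(log)
    print(billing.poll())     # billing sees all three events
    print(analytics.poll())   # analytics independently sees the same three
    log.append("delivered")
    print(billing.poll())     # only the new record, read from billing's saved offset
```

Because records stay in the log after being read, a second consumer (or the same one after a restart) can replay them from any offset it chooses.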
Kafka is used as a streaming data source and destination for stream processing engines and data platforms such as Hadoop, Spark, and Storm. A developer can use Kafka's topic partitioning, typically by choosing the record key, to control which data lands in which partition, so that it can be consumed by the appropriate stream processing or database engine.
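Key-based partitioning boils down to hashing the key and taking the result modulo the number of partitions. The sketch below uses CRC32 from Python's standard library as a stand-in. Kafka's own default partitioner uses a different hash function (murmur2), so the partition numbers here will not match a real cluster. The function names `choose_partition` and `route` are invented for this example.

```python
import zlib
from typing import Iterable, List, Tuple


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; the same key always maps to the same index."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is a stand-in hash chosen for this sketch, not what Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records: Iterable[Tuple[str, str]], num_partitions: int) -> List[List[str]]:
    """Distribute (key, value) records into per-partition lists, preserving arrival order."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append(value)
    return partitions


if __name__ == "__main__":
    events = [("user-1", "login"), ("user-2", "view"), ("user-1", "logout")]
    for index, partition in enumerate(route(events, num_partitions=3)):
        print(index, partition)
```

Since all records with the same key end up in the same partition, their relative order is preserved for whichever consumer reads that partition.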
Which one should you use for your business
There is no clear winner between Kafka and Pub Sub, so you should use the one that best fits your needs. Because they are designed for different purposes, you have to consider your application first. If you know what kind of real-time processing you need and how much throughput your application will require, you can compare them on those terms. You can also run them side by side to see which one works better for you.
When it comes to streaming data, Kafka is a publish-subscribe messaging system built around a distributed, partitioned commit log. It sits at the center of big data stream processing for many companies because it offers high throughput and scalability. It has been widely adopted for its real-time, fault-tolerant, and high-performance features.
On the other hand, Pub Sub is a messaging system that supports both one-to-one and one-to-many communication patterns. It is essentially a message bus that can act as a message store for distributed systems. Its main purpose is to standardize communication between different parts of an application.
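One way to picture how a single system can cover both one-to-one and one-to-many delivery is with groups of subscribers: every group receives each message, but within a group only one member handles it. The sketch below is a simplification with made-up names (`GroupedTopic`, `join`). It rotates through members per message, whereas Kafka, for instance, assigns whole partitions to the members of a consumer group.

```python
from collections import defaultdict
from typing import DefaultDict, List


class GroupedTopic:
    """Toy topic: one-to-many across groups, one-to-one within a group (hypothetical)."""

    def __init__(self) -> None:
        # group name -> inboxes of that group's members
        self._groups: DefaultDict[str, List[List[str]]] = defaultdict(list)
        self._sent = 0  # number of messages published so far

    def join(self, group: str, inbox: List[str]) -> None:
        self._groups[group].append(inbox)

    def publish(self, message: str) -> None:
        for members in self._groups.values():
            # every group gets the message, but only one member per group handles it
            members[self._sent % len(members)].append(message)
        self._sent += 1


if __name__ == "__main__":
    topic = GroupedTopic()
    worker_1, worker_2, audit = [], [], []
    topic.join("workers", worker_1)
    topic.join("workers", worker_2)
    topic.join("audit", audit)

    for i in range(4):
        topic.publish(f"job-{i}")

    print(worker_1, worker_2, audit)
```

With a single group this behaves like a work queue. With one member per group it behaves like a broadcast.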
Kafka: high throughput and scalability
Pros and cons of each
Kafka comes with built-in partitioning and can be configured to handle high volumes of messages. Consumers track their position in the log by offset, which makes it straightforward to monitor progress and pick up new messages quickly. Kafka runs on the Java Virtual Machine (JVM), so it can be deployed on any platform with a supported JVM. However, it requires a significant amount of memory (2 GB or more per broker, depending on the workload), which has to be planned for as the cluster scales. It stores messages in an append-only log, which makes sequential reads and writes fast but does not support searching messages by content: finding a specific record means knowing its offset or scanning the log. Because Kafka batches messages to maximize throughput, per-message latency can be higher than in systems that deliver each message individually, so it is less suited to workloads that need the lowest possible latency for every message.
I like both distributed systems, Kafka and Pub Sub. I prefer Kafka because it is more robust, scalable, and highly available, and it lets you process large numbers of messages in real time. However, Kafka's persistent log only pays off if you actually make use of the stored messages. You should plan your use cases before choosing a specific tool for your business. You can also try both to see which one works better for you.
Google Pub Sub and Kafka are both great tools for large-scale, distributed, real-time processing projects. They can be used to build powerful systems that handle large amounts of data, and you can use them to achieve very similar results. Kafka is open source and free to use, while Google Pub Sub is a managed cloud service billed by usage. One thing Google Pub Sub offers out of the box that Kafka does not is an HTTP interface, which makes it easier to integrate with existing systems such as web servers. Kafka clients instead speak Kafka's own binary protocol over TCP. If you are starting a project, there is no need to rush the choice. Start with one and switch to the other if it does not fit.
This article is a one-off comment on a blog post, so I will not keep it updated in the future.
Thanks for reading! 🙂