Pulsar vs. Kafka: A Modern Messaging Deep Dive

28/4/2024

In the world of real-time data processing, choosing the right messaging system is a critical architectural decision. For years, Apache Kafka has been the dominant force, but a powerful contender, Apache Pulsar, has emerged with a different architectural approach that offers compelling advantages. This deep dive compares Pulsar and Kafka across key categories to help you decide which platform is right for your needs.

Core Architectural Difference

The most fundamental difference between Kafka and Pulsar lies in their architecture. Kafka uses a unified, partition-centric model where brokers are responsible for both storing data and serving clients. This tight coupling is simple and highly performant for sequential workloads. In contrast, Pulsar decouples compute from storage, featuring a stateless broker layer for serving traffic and a separate, scalable storage layer using Apache BookKeeper. This separation allows for independent scaling and provides significant operational flexibility, especially in cloud-native environments.

Feature	Apache Kafka	Apache Pulsar
Architecture	Unified compute & storage	Decoupled compute & storage
Storage	Broker-local disk storage	Apache BookKeeper (distributed log)
Multi-Tenancy	Limited (usually via separate clusters)	Native, built-in with namespaces & quotas
Messaging Models	Publish-Subscribe (Pub-Sub); consumer groups share work per partition	Pub-Sub & Queuing (Exclusive, Failover, Shared, Key_Shared subscriptions)
Geo-Replication	Requires MirrorMaker2; can be complex	Built-in, configurable at namespace level
Data Retention	Limited by broker disk; tiered storage (KIP-405) is a recent addition	Effectively unbounded with tiered storage offload to S3 and similar object stores
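
The "Messaging Models" row is easier to picture with a toy model. The sketch below is not Pulsar client code. It is a hypothetical in-memory simulation (the class DispatchSketch and its methods are invented for illustration) of how a Shared subscription spreads messages across consumers round-robin, while a Failover subscription delivers everything to a single active consumer, here simply assumed to be the first in the list. A real broker also tracks consumer permits, acknowledgements and redelivery, all of which are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DispatchSketch {

    // Shared (queue-like): each message goes to exactly one consumer, round-robin.
    static Map<String, List<String>> shared(List<String> consumers, List<String> messages) {
        Map<String, List<String>> inboxes = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            String target = consumers.get(i % consumers.size());
            inboxes.get(target).add(messages.get(i));
        }
        return inboxes;
    }

    // Failover: one active consumer receives everything; the rest are idle standbys.
    // Assumption for this sketch: the first consumer in the list is the active one.
    static Map<String, List<String>> failover(List<String> consumers, List<String> messages) {
        Map<String, List<String>> inboxes = emptyInboxes(consumers);
        inboxes.get(consumers.get(0)).addAll(messages);
        return inboxes;
    }

    // Creates one empty inbox per consumer, preserving consumer order.
    private static Map<String, List<String>> emptyInboxes(List<String> consumers) {
        Map<String, List<String>> inboxes = new LinkedHashMap<>();
        for (String consumer : consumers) {
            inboxes.put(consumer, new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("c1", "c2");
        List<String> messages = Arrays.asList("m1", "m2", "m3", "m4");
        System.out.println("Shared:   " + shared(consumers, messages));
        System.out.println("Failover: " + failover(consumers, messages));
    }
}
```

With two consumers and four messages, the Shared case splits the work evenly, while the Failover case leaves the second consumer empty until the first one goes away.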

Performance and Scalability

Both systems are designed for high performance, but their profiles differ. Kafka excels at high-throughput sequential reads and writes, making it well suited to analytics pipelines. Pulsar is often reported to show more consistent, lower p99 latency, which suits latency-sensitive services. The biggest operational difference is in scalability. To scale Kafka, you add brokers and then reassign existing partitions to them, which copies data between brokers and can be a disruptive, I/O-intensive process. With Pulsar, you can add stateless brokers to handle more traffic, or add BookKeeper nodes to increase storage capacity. New data is written to the new nodes without existing data having to be rebalanced first.
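
To get a feel for why partition reassignment is I/O-intensive, here is a back-of-envelope sketch. It rests on a simplified model that is not taken from either project: partition replicas are spread evenly over brokers, every replica is the same size, and after adding brokers you move just enough replicas for each new broker to reach an even share. The class name, method names and cluster numbers are all hypothetical.

```java
public class RebalanceEstimate {

    // Minimum number of partition replicas that must move so that each newly
    // added broker ends up with an even (rounded-down) share of all replicas.
    static long replicasToMove(long totalReplicas, int oldBrokers, int addedBrokers) {
        if (totalReplicas < 0 || oldBrokers <= 0 || addedBrokers < 0) {
            throw new IllegalArgumentException("invalid cluster size");
        }
        long evenShare = totalReplicas / (oldBrokers + addedBrokers);
        return evenShare * addedBrokers;
    }

    // Bytes copied between brokers, assuming every replica has the same size.
    static long bytesToMove(long totalReplicas, int oldBrokers, int addedBrokers,
                            long bytesPerReplica) {
        return replicasToMove(totalReplicas, oldBrokers, addedBrokers) * bytesPerReplica;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Hypothetical cluster: 600 replicas of 50 GiB each on 5 brokers, growing to 6.
        long replicas = replicasToMove(600, 5, 1);
        long bytes = bytesToMove(600, 5, 1, 50 * gib);
        System.out.println(replicas + " replicas, " + (bytes / gib) + " GiB to copy");
    }
}
```

Under these assumptions, growing from 5 to 6 brokers means copying 100 replicas, or about 5000 GiB, before the new broker carries its share. Real numbers depend on replica sizes, throttling and the reassignment plan you choose.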

Use Case Suitability

Your choice should be driven by your specific workload and operational needs. Kafka is a proven and powerful choice for high-throughput streaming when its operational model is adequate at your current scale. Pulsar is a strong fit for multi-tenant environments, for applications that need both streaming and queuing, and for organizations that need to scale frequently without disruption. Below are simple producer examples for both.

Kafka Producer (Java):

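import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Connection and serialization settings for the producer.
Properties props = new Properties();
props.put("bootstrap.servers", "kafka-broker:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// send() is asynchronous; closing the producer waits for pending sends to complete.
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("my-topic", "key", "hello world"));
}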

Pulsar Producer (Java):

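import java.nio.charset.StandardCharsets;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

// Connect to the broker over Pulsar's binary protocol.
PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://pulsar-broker:6650")
        .build();

// Topic names follow persistent://tenant/namespace/topic.
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://public/default/my-topic")
        .create();

// send() blocks until the broker acknowledges the message.
producer.send("hello world".getBytes(StandardCharsets.UTF_8));
producer.close();
client.close();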
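
The topic URL in the Pulsar example carries the tenant and namespace that multi-tenancy hangs on: persistent://public/default/my-topic reads as domain "persistent", tenant "public", namespace "default" and topic "my-topic". A minimal sketch of pulling those parts out with plain string handling follows. TopicParts is a hypothetical helper written for this article, not part of the Pulsar client, and it ignores older cluster-scoped topic names.

```java
public class TopicParts {
    final String domain;     // e.g. "persistent" or "non-persistent"
    final String tenant;
    final String namespace;
    final String topic;

    private TopicParts(String domain, String tenant, String namespace, String topic) {
        this.domain = domain;
        this.tenant = tenant;
        this.namespace = namespace;
        this.topic = topic;
    }

    // Parses names of the form domain://tenant/namespace/topic.
    static TopicParts parse(String fullName) {
        int sep = fullName.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '://' in " + fullName);
        }
        String domain = fullName.substring(0, sep);
        // A limit of 3 keeps any further '/' characters inside the topic part.
        String[] rest = fullName.substring(sep + 3).split("/", 3);
        if (rest.length != 3 || rest[0].isEmpty() || rest[1].isEmpty() || rest[2].isEmpty()) {
            throw new IllegalArgumentException("expected tenant/namespace/topic in " + fullName);
        }
        return new TopicParts(domain, rest[0], rest[1], rest[2]);
    }

    public static void main(String[] args) {
        TopicParts parts = parse("persistent://public/default/my-topic");
        System.out.println(parts.tenant + " / " + parts.namespace + " / " + parts.topic);
    }
}
```

Because tenant and namespace are part of every topic name, policies such as quotas and geo-replication can be attached at the namespace level, as the comparison table notes.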

Who Uses Pulsar and Kafka?

Both platforms are trusted by major tech companies for mission-critical applications.

According to the Apache Kafka project, more than 80% of Fortune 100 companies use Kafka. Notable users include:

Netflix: For real-time data monitoring and event processing across its global platform.
LinkedIn: Where it was originally created, for activity stream data and operational metrics.
Uber: For logging, stream processing, and feeding its real-time marketplace.

Apache Pulsar has gained significant traction in companies requiring low latency and operational flexibility. Notable users include:

Yahoo!: Where it was created, to support applications like Yahoo! Mail and Flickr.
Tencent: For its massive billing and transaction platform.
Splunk: For its Data-in-Transit and real-time analytics products.

Conclusion

While Kafka remains a dominant force, Pulsar presents a compelling, modern alternative, especially for cloud-native and multi-tenant deployments. The choice is not about which is “better,” but which architecture best fits your specific use case. For maximum throughput in simpler streaming scenarios, Kafka is a proven choice. For operational flexibility, multi-tenancy, and low latency at scale, Pulsar’s decoupled architecture offers significant advantages.