Kafka Overview
Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant message processing. It combines messaging, storage, and stream processing.
Core Concepts
| Concept | Description |
|---|---|
| Broker | Kafka server node |
| Topic | Named stream of records |
| Partition | Ordered, immutable sequence within a topic |
| Offset | Position of a record in a partition |
| Producer | Publishes records to topics |
| Consumer | Reads records from topics |
| Consumer Group | Set of consumers that divide a topic's partitions among themselves to share load |
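The Consumer Group row can be illustrated with a minimal sketch of how a topic's partitions might be divided among group members. The class `GroupAssignmentSketch` and its plain round-robin rule are assumptions made for illustration; Kafka's real assignors are negotiated through the group coordinator and are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of kafka-clients: models how a consumer
// group shares the partitions of one topic.
final class GroupAssignmentSketch {

    // Assigns partitions 0..numPartitions-1 to consumers in round-robin order.
    // Each partition is owned by exactly one consumer in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("group has no consumers");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two consumers share three partitions.
        System.out.println(assign(List.of("c1", "c2"), 3));
        // With more consumers than partitions, the extra consumer gets nothing.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
    }
}
```

The second call shows why adding consumers beyond the partition count does not increase parallelism: a partition is never split between two members of the same group.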
Architecture
Producer → Topic (Partition 0, 1, 2) → Consumer Group
↕ replication
Broker 1, 2, 3
Each partition is replicated across multiple brokers for fault tolerance.
Key Properties
| Property | Value |
|---|---|
| Throughput | Up to millions of messages/sec, depending on cluster size, message size, and hardware |
| Retention | Configurable (time or size) |
| Ordering | Guaranteed within a partition |
| Durability | Replicated to multiple brokers |
| Scalability | Horizontal (add brokers/partitions) |
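The Retention row (time or size) can be sketched as follows. In Kafka these limits correspond to the topic-level settings `retention.ms` and `retention.bytes`, and deletion happens per log segment. The `RetentionSketch` class and its `Segment` type are simplified, hypothetical stand-ins and not the broker's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of time- and size-based retention on one partition log.
final class RetentionSketch {

    // A chunk of the partition log (simplified, illustrative type).
    static final class Segment {
        final long lastTimestampMs;
        final long sizeBytes;

        Segment(long lastTimestampMs, long sizeBytes) {
            this.lastTimestampMs = lastTimestampMs;
            this.sizeBytes = sizeBytes;
        }
    }

    // Removes the oldest segments while the oldest one is older than retentionMs
    // or the log is larger than retentionBytes. The newest segment is always kept.
    // Returns the number of segments removed.
    static int applyRetention(Deque<Segment> log, long nowMs, long retentionMs, long retentionBytes) {
        long totalBytes = 0;
        for (Segment s : log) {
            totalBytes += s.sizeBytes;
        }
        int removed = 0;
        while (log.size() > 1) {
            Segment oldest = log.peekFirst();
            boolean tooOld = nowMs - oldest.lastTimestampMs > retentionMs;
            boolean tooBig = totalBytes > retentionBytes;
            if (!tooOld && !tooBig) {
                break;
            }
            totalBytes -= oldest.sizeBytes;
            log.removeFirst();
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Deque<Segment> log = new ArrayDeque<>();
        log.addLast(new Segment(1_000L, 100L));
        log.addLast(new Segment(5_000L, 100L));
        log.addLast(new Segment(9_000L, 100L));
        int removed = applyRetention(log, 10_000L, 6_000L, 1_000L);
        System.out.println("removed=" + removed + ", remaining=" + log.size());
    }
}
```

Records are removed because of age or size, never because a consumer has read them, which is what makes replay possible.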
Topic Configuration
# Create topic with 3 partitions, replication factor 2
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 3 \
  --replication-factor 2
# Describe topic
kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092
Partition Strategy
Partition choice affects ordering and parallelism:
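- Key-based: records with the same key go to the same partition, so their relative order is preserved (as long as the partition count does not change)
- No key: records are spread across partitions, with no ordering guarantee between them. Since Kafka 2.4 the default partitioner fills a batch for one partition before moving on ("sticky" partitioning) rather than strict round-robin
- Custom: implement the Partitioner interface (org.apache.kafka.clients.producer.Partitioner) and set it via partitioner.class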
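The key-based strategy boils down to hashing the key and taking the result modulo the partition count. The sketch below illustrates that idea only: Kafka's built-in partitioner hashes the serialized key with murmur2, while this hypothetical `KeyPartitionSketch` uses `Arrays.hashCode` as a stand-in, so the partition numbers it produces will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of key-based partitioning (not Kafka's own hash).
final class KeyPartitionSketch {

    // Maps a key to a partition in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition for a fixed count.
        System.out.println(partitionFor("customer-17", 3));
        System.out.println(partitionFor("customer-17", 3));
        // Changing the partition count can move a key to a different partition.
        System.out.println(partitionFor("customer-17", 4));
    }
}
```

Because the mapping depends on the partition count, adding partitions to an existing topic can send a key's new records to a different partition than its old ones, which breaks per-key ordering across the change.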
Replication
Partition 0: Leader (Broker 1) → Follower (Broker 2), Follower (Broker 3)
Partition 1: Leader (Broker 2) → Follower (Broker 1), Follower (Broker 3)
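- Leader handles all writes and, by default, all reads (since Kafka 2.4 consumers can optionally fetch from followers)
- Followers replicate from the leader
- ISR (In-Sync Replicas): the replicas, leader included, that are caught up with the leader's log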
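The leader/follower/ISR relationship can be sketched as below. This is a simplified model under stated assumptions: a follower counts as in sync if it caught up within a maximum lag time (the broker setting `replica.lag.time.max.ms` plays this role), and a new leader is the first live, in-sync replica in assignment order. The `IsrSketch` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of ISR membership and leader failover for one partition.
final class IsrSketch {

    // Returns the in-sync replicas: the leader plus every follower that
    // caught up with the leader within maxLagMs of nowMs.
    static List<Integer> inSyncReplicas(int leader, Map<Integer, Long> lastCaughtUpMs,
                                        long nowMs, long maxLagMs) {
        List<Integer> isr = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            boolean isLeader = e.getKey() == leader;
            if (isLeader || nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey());
            }
        }
        return isr;
    }

    // Picks the first replica (in assignment order) that is both in the ISR
    // and on a live broker. Empty if none qualifies.
    static Optional<Integer> electLeader(List<Integer> replicas, List<Integer> isr,
                                         Set<Integer> liveBrokers) {
        for (Integer replica : replicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return Optional.of(replica);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Integer, Long> lastCaughtUp = new LinkedHashMap<>();
        lastCaughtUp.put(1, 10_000L); // leader
        lastCaughtUp.put(2, 9_500L);  // recently caught up
        lastCaughtUp.put(3, 2_000L);  // fell behind
        List<Integer> isr = inSyncReplicas(1, lastCaughtUp, 10_000L, 5_000L);
        System.out.println("ISR: " + isr);
        // Broker 1 fails; only brokers 2 and 3 are alive.
        System.out.println("New leader: " + electLeader(List.of(1, 2, 3), isr, Set.of(2, 3)));
    }
}
```

Restricting election to the ISR is what prevents a lagging follower from becoming leader and losing acknowledged records.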
Kafka vs Traditional MQ
| Feature | Kafka | RabbitMQ |
|---|---|---|
| Model | Log-based | Queue/exchange |
| Message retention | Configurable (time or size; 7 days by default) | Deleted after ack (classic queues) |
| Throughput | Very high | High |
| Ordering | Per partition | Per queue |
| Replay | Yes (by offset) | No (classic queues) |
| Best for | Event streaming, logs | Task queues, RPC |
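The Model and Replay rows come down to one difference: a log keeps records after they are read, and each reader tracks its own offset. A minimal in-memory sketch of that idea follows. `ReplayableLogSketch` is a hypothetical class for illustration and has nothing to do with how a broker stores data on disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single partition log.
final class ReplayableLogSketch {

    private final List<String> records = new ArrayList<>();

    // Appends a record and returns its offset (its position in the log).
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Returns up to maxRecords records starting at fromOffset.
    // Reading never removes anything, so any offset can be read again.
    List<String> read(long fromOffset, int maxRecords) {
        int start = (int) Math.min(Math.max(fromOffset, 0L), (long) records.size());
        int end = (int) Math.min((long) start + Math.max(maxRecords, 0), (long) records.size());
        return new ArrayList<>(records.subList(start, end));
    }

    public static void main(String[] args) {
        ReplayableLogSketch log = new ReplayableLogSketch();
        log.append("order-created");
        log.append("order-paid");
        log.append("order-shipped");
        // A consumer reads everything, then rewinds to offset 1 and replays.
        System.out.println(log.read(0, 10));
        System.out.println(log.read(1, 10));
    }
}
```

In a queue model the first read would typically have consumed the messages; here the second call simply starts from an earlier offset.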
Java Dependency
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.6.1</version>
</dependency>
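With the dependency in place, a producer is configured through a `java.util.Properties` object. The sketch below only builds that configuration, so it runs without a broker. The property names are standard producer settings; the `ProducerConfigSketch` class, the choice of String serializers, and the `acks=all` durability setting are assumptions for illustration.

```java
import java.util.Properties;

// Hypothetical helper that assembles a basic producer configuration.
final class ProducerConfigSketch {

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        // Initial brokers used to discover the rest of the cluster.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Assumption: keys and values are plain strings.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // Avoid duplicate records when the producer retries.
        props.setProperty("enable.idempotence", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In an application these properties would typically be passed to `new KafkaProducer<String, String>(props)`, and records sent with `producer.send(new ProducerRecord<>("orders", key, value))`.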
Best Practices
- Choose the partition count from target throughput; avoid over-partitioning, since every partition adds broker and client overhead
- Use replication factor ≥ 3 in production
- Choose partition keys to preserve ordering where needed
- Monitor consumer lag; steadily growing lag indicates a processing bottleneck
- Set appropriate retention based on replay requirements
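Consumer lag, mentioned above, is the gap between the newest offset in a partition and the offset the group has committed. A minimal sketch of the arithmetic follows. `ConsumerLagSketch` is a hypothetical helper, and treating a partition with no committed offset as starting from 0 is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that computes consumer lag from two offset maps.
final class ConsumerLagSketch {

    // Lag per partition = log end offset - committed offset, floored at 0.
    // Assumption: a partition without a committed offset counts from 0.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    // Sums the lag over all partitions.
    static long totalLag(Map<Integer, Long> lag) {
        long total = 0;
        for (long value : lag.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = Map.of(0, 100L, 1, 250L, 2, 40L);
        Map<Integer, Long> committed = Map.of(0, 100L, 1, 200L);
        Map<Integer, Long> lag = lagPerPartition(logEnd, committed);
        System.out.println("lag=" + lag + ", total=" + totalLag(lag));
    }
}
```

On a running cluster the same per-partition figures appear in the LAG column of `kafka-consumer-groups.sh --describe --group <group> --bootstrap-server localhost:9092`.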