Kafka Overview

Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant message processing. It combines messaging, storage, and stream processing.

Core Concepts

Concept	Description
Broker	Kafka server node
Topic	Named stream of records
Partition	Ordered, immutable sequence within a topic
Offset	Position of a record in a partition
Producer	Publishes records to topics
Consumer	Reads records from topics
Consumer Group	Coordinated consumers sharing load
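
To make the partition and offset rows concrete, here is a minimal in-memory sketch of a single partition as an append-only list. The class `PartitionLogSketch` and its methods are hypothetical names invented for illustration; they are not part of the Kafka client API, and a real broker persists records to disk segments rather than a Java list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition: an append-only sequence of records. */
class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns the offset it was assigned. */
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    /** Returns a copy of all records from the given offset onward; nothing is removed. */
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        PartitionLogSketch partition = new PartitionLogSketch();
        partition.append("order-created");               // offset 0
        partition.append("order-paid");                  // offset 1
        long offset = partition.append("order-shipped");
        System.out.println(offset);                      // prints 2
        System.out.println(partition.readFrom(1));       // prints [order-paid, order-shipped]
    }
}
```

Because reading does not delete anything, a consumer can call `readFrom` again with an earlier offset and see the same records, which is the idea behind replay by offset.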

Architecture

  Producer → Topic (Partition 0, 1, 2) → Consumer Group
              ↕ replication
           Broker 1, 2, 3

Each partition is replicated across multiple brokers for fault tolerance.
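
On the consuming side of the diagram, a consumer group divides a topic's partitions among its members so that each partition is read by one member of the group. The sketch below deals partitions out in turn. It is a simplified illustration only: `GroupAssignmentSketch` and `assign` are hypothetical names, and in a real cluster the assignment is negotiated through the group coordinator using a configurable assignor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAssignmentSketch {
    /** Deals partitions 0..partitionCount-1 out to the consumers in turn. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // prints {consumer-a=[0, 2], consumer-b=[1]}
        System.out.println(assign(List.of("consumer-a", "consumer-b"), 3));
        // With more consumers than partitions, the extra consumer gets nothing:
        // prints {c1=[0], c2=[1], c3=[2], c4=[]}
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
    }
}
```

The second call shows why the partition count caps the useful parallelism of a group: a fourth consumer on a three-partition topic sits idle.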

Key Properties

Property	Value
Throughput	Up to millions of messages/sec per cluster, depending on hardware and configuration
Retention	Configurable (time or size)
Ordering	Guaranteed within a partition
Durability	Replicated to multiple brokers
Scalability	Horizontal (add brokers/partitions)

Topic Configuration

  # Create topic with 3 partitions, replication factor 2
  kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --topic orders \
    --partitions 3 \
    --replication-factor 2

  # Describe topic
  kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092

Partition Strategy

Partition choice affects ordering and parallelism:

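  Key-based:   same key → same partition → ordering guaranteed for that key (as long as the partition count does not change)
  Round-robin: no key → even distribution, no ordering guarantee (clients since Kafka 2.4 default to sticky batching for keyless records)
  Custom:      implement the Partitioner interface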
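
The key-based rule can be sketched as "hash the key, then take the result modulo the partition count". The example below is a stand-alone approximation: `KeyPartitioningSketch` and `partitionFor` are hypothetical names, and `Arrays.hashCode` is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to the serialized key, so the partition numbers it prints will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioningSketch {
    /** Maps a record key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        // Assumption: Arrays.hashCode stands in for Kafka's murmur2 hash of the key bytes.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        // The two "customer-17" records always land on the same partition.
        for (String key : new String[] {"customer-17", "customer-42", "customer-17"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Since the result depends on `numPartitions`, adding partitions to an existing topic can move a key to a different partition, which breaks per-key ordering across the change.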

Replication

  Partition 0:  Leader (Broker 1) → Follower (Broker 2), Follower (Broker 3)
  Partition 1:  Leader (Broker 2) → Follower (Broker 1), Follower (Broker 3)

Leader handles all writes and, by default, all reads
Followers replicate from the leader
ISR (In-Sync Replicas): replicas, including the leader, that are caught up with the leader's log
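
One way to picture ISR membership is as a time check: a follower stays in the set while it has caught up with the leader recently enough. The sketch below assumes that simplified rule. `IsrSketch`, `inSyncReplicas`, and the timestamps are hypothetical; on a real broker the window is controlled by the `replica.lag.time.max.ms` setting, and the 30-second value used here is only an example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IsrSketch {
    /**
     * Returns the replicas considered in sync: the leader plus every follower
     * that was last fully caught up within maxLagMs of nowMs.
     */
    static List<String> inSyncReplicas(String leader, Map<String, Long> followerLastCaughtUpMs,
                                       long nowMs, long maxLagMs) {
        List<String> isr = new ArrayList<>();
        isr.add(leader); // the leader is always part of its own ISR
        for (Map.Entry<String, Long> follower : followerLastCaughtUpMs.entrySet()) {
            if (nowMs - follower.getValue() <= maxLagMs) {
                isr.add(follower.getKey());
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        Map<String, Long> followers = new LinkedHashMap<>();
        followers.put("broker-2", 99_000L); // caught up 1 s before "now"
        followers.put("broker-3", 60_000L); // caught up 40 s before "now"
        // prints [broker-1, broker-2]
        System.out.println(inSyncReplicas("broker-1", followers, 100_000L, 30_000L));
    }
}
```

Here broker-3 has fallen outside the window and drops out of the ISR until it catches up again.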

Kafka vs Traditional MQ

Feature	Kafka	RabbitMQ
Model	Log-based	Queue/exchange
Message retention	Configurable (time or size)	Deleted after ack
Throughput	Very high	High
Ordering	Per partition	Per queue
Replay	Yes (by offset)	No (classic queues)
Best for	Event streaming, logs	Task queues, RPC

Java Dependency

  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.6.1</version>
  </dependency>

Best Practices

Choose the partition count from target throughput; avoid over-partitioning, since every partition adds overhead
Use replication factor ≥ 3 in production
Choose partition keys to preserve ordering where needed
Monitor consumer lag; steadily growing lag indicates a processing bottleneck
Set appropriate retention based on replay requirements
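
Consumer lag, mentioned in the list above, is the gap between the newest offset in each partition and the offset the group has committed. The sketch below computes it from two maps of sample numbers. `ConsumerLagSketch`, its methods, and the offsets are hypothetical, and treating a partition with no committed offset as unread from offset 0 is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConsumerLagSketch {
    /** Lag per partition = log end offset minus the group's committed offset. */
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> end : logEndOffsets.entrySet()) {
            // Assumption: no committed offset means the partition is unread from offset 0.
            long committed = committedOffsets.getOrDefault(end.getKey(), 0L);
            lag.put(end.getKey(), Math.max(0L, end.getValue() - committed));
        }
        return lag;
    }

    /** Sums the per-partition lag into one number for the whole group. */
    static long totalLag(Map<Integer, Long> lag) {
        return lag.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new LinkedHashMap<>();
        end.put(0, 1_500L);
        end.put(1, 900L);
        end.put(2, 1_200L);
        Map<Integer, Long> committed = new LinkedHashMap<>();
        committed.put(0, 1_450L);
        committed.put(1, 900L);
        committed.put(2, 700L);
        Map<Integer, Long> lag = lagPerPartition(end, committed);
        System.out.println(lag);           // prints {0=50, 1=0, 2=500}
        System.out.println(totalLag(lag)); // prints 550
    }
}
```

In practice these numbers usually come from tooling rather than hand-built maps; for example, `kafka-consumer-groups.sh --describe --group <group>` reports a lag column per partition. Sampling the total over time shows whether lag is stable or growing.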
