A backpressure-aware, rebalance-safe Kafka consumer that decouples polling from processing and commits offsets only after downstream completion.
A production-grade Kafka consumer architecture designed to keep consumer lag bounded and to prevent rebalance storms and offset inconsistencies under high-throughput workloads.
This project demonstrates correct Kafka consumer protocol usage, explicit backpressure, and safe concurrency design, addressing the failure modes of naïve single-threaded consumers.
In high-throughput Kafka systems, consumer lag often grows even when brokers are healthy and multiple consumers are running. Common root causes include:
- Poll loop blocked by slow downstream processing
- Unbounded in-memory queues
- Unsafe offset commits
- Frequent rebalance storms
- In-flight records lost during partition reassignments
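Adding consumer instances or threads typically does not solve the problem and often makes it worse: instances beyond the partition count sit idle, a KafkaConsumer must not be shared across threads, and every member that misses its poll deadline triggers another rebalance.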
Design goals:
- Keep the Kafka poll loop non-blocking
- Apply explicit backpressure
- Commit offsets only after successful processing
- Prevent rebalance storms
- Handle rebalances safely and deterministically
- Preserve at-least-once delivery guarantees
Naïve design, with polling, processing, and committing in the same thread:
- Slow processing blocks poll()
- Consumer group instability (missed poll deadlines trigger rebalances)
- Lag accumulates on hot partitions
- Scaling yields diminishing returns
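The arithmetic below shows why slow processing destabilizes the group. When the thread that calls poll() also processes the whole batch, the gap between polls is roughly max.poll.records × per-record latency. If that gap exceeds max.poll.interval.ms, the consumer leaves the group and a rebalance follows. The sketch uses the Kafka client defaults (500 records, 300000 ms). The per-record latencies are illustrative assumptions, not measurements from this project.

```java
class PollBudget {
    // Kafka client defaults: max.poll.records = 500, max.poll.interval.ms = 300000 (5 minutes)
    static final int MAX_POLL_RECORDS = 500;
    static final long MAX_POLL_INTERVAL_MS = 300_000L;

    /** Worst-case gap between poll() calls when the poll thread also processes the whole batch. */
    static long pollGapMs(int recordsPerBatch, long perRecordMs) {
        return recordsPerBatch * perRecordMs;
    }

    /** True if a single-threaded consumer would exceed max.poll.interval.ms and be removed from the group. */
    static boolean exceedsBudget(long perRecordMs) {
        return pollGapMs(MAX_POLL_RECORDS, perRecordMs) > MAX_POLL_INTERVAL_MS;
    }

    public static void main(String[] args) {
        // Illustrative per-record latencies (assumptions)
        for (long perRecordMs : new long[] {10, 600, 1_000}) {
            System.out.printf("%4d ms/record -> %,7d ms between polls, exceeds budget: %b%n",
                    perRecordMs, pollGapMs(MAX_POLL_RECORDS, perRecordMs), exceedsBudget(perRecordMs));
        }
    }
}
```

At 1 s per record, a full batch takes 500 s, well past the 300 s budget. Lowering max.poll.records only moves the threshold. Decoupling processing from the poll thread removes the dependency entirely.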
Redesigned architecture:
- Polling decoupled from processing
- Bounded queue with backpressure
- Parallel worker pool
- Manual offset tracking and commit
- Partition pause / resume
- Rebalance-safe draining
This redesign restores predictable scaling and stability.
Core principles:
- Only one KafkaConsumer instance (KafkaConsumer is not thread-safe)
- Only the poll thread interacts with Kafka APIs
- Worker threads are Kafka-agnostic
- Offsets are committed after processing, not on poll
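Committing only after processing requires turning Kafka's auto-commit off. Otherwise the client commits polled offsets on its own schedule, before the workers have finished.

The sketch below builds the consumer configuration as plain java.util.Properties, so it compiles without the Kafka client on the classpath. The keys are standard Kafka consumer configs. The group id and the auto.offset.reset value are assumptions for illustration. In the application, these properties would be passed to the single KafkaConsumer owned by the poll thread.

```java
import java.util.Properties;

class ConsumerConfigSketch {
    static Properties consumerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-consumer");        // hypothetical group id
        props.put("enable.auto.commit", "false");        // offsets are committed manually, after processing
        props.put("auto.offset.reset", "earliest");      // assumption: read from the start when no offset is committed
        props.put("max.poll.records", "500");            // Kafka default, stated explicitly
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        consumerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```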
Poll thread:
- Polls records
- Applies pause/resume
- Commits offsets
- Handles rebalance callbacks
Bounded queue:
- Fixed capacity
- Enforces backpressure
- Protects poll loop from downstream slowness
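The sketch below shows the bounded handoff, assuming a java.util.concurrent.ArrayBlockingQueue between the poll thread and the workers. The poll thread uses the non-blocking offer() rather than put(), so a full queue never stalls the poll loop. Records that do not fit are parked in an overflow buffer owned by the poll thread, and the caller is told to apply backpressure. BoundedHandoff and WorkItem are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedHandoff {
    /** Hypothetical unit of work: partition, offset and payload of one polled record. */
    record WorkItem(int partition, long offset, String payload) {}

    private final BlockingQueue<WorkItem> queue;
    // Polled records not yet accepted by the queue; only the poll thread touches this.
    private final Deque<WorkItem> overflow = new ArrayDeque<>();

    BoundedHandoff(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Poll thread: hand over a freshly polled batch without blocking.
     * Returns true if everything reached the queue, false if records are parked
     * in the overflow buffer and intake should be paused.
     */
    boolean submit(List<WorkItem> batch) {
        overflow.addAll(batch);   // appended after anything already parked, preserving order
        return drainOverflow();
    }

    /** Poll thread: retry parked records in order; call once per poll-loop iteration. */
    boolean drainOverflow() {
        while (!overflow.isEmpty()) {
            if (!queue.offer(overflow.peekFirst())) {
                return false;     // queue full: keep the rest parked and signal backpressure
            }
            overflow.pollFirst();
        }
        return true;
    }

    BlockingQueue<WorkItem> queue() { return queue; }

    int parked() { return overflow.size(); }
}
```

Parking records instead of dropping them keeps per-partition order intact. The overflow stays small in practice, because intake is paused as soon as submit() reports pressure.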
Worker pool:
- Parallel processing
- CPU- and I/O-heavy work
- No Kafka access
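Here is a minimal, Kafka-agnostic worker pool built on a fixed ExecutorService. Each worker takes items from the shared queue, runs the business logic, and only then reports completion, so offsets can advance strictly after processing. WorkerPool, WorkItem and the two callbacks are hypothetical. Retries and dead-lettering are omitted: if the handler throws, that worker stops and the item is never reported as done.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class WorkerPool implements AutoCloseable {
    /** Hypothetical unit of work handed over by the poll thread. */
    record WorkItem(int partition, long offset, String payload) {}

    private final ExecutorService executor;
    private volatile boolean running = true;

    /**
     * Starts `threads` workers that take items from `queue`, run `handler`
     * (business logic with no Kafka access), then report the item via `onDone`.
     */
    WorkerPool(int threads, BlockingQueue<WorkItem> queue,
               Consumer<WorkItem> handler, Consumer<WorkItem> onDone) {
        executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                // Keep going after close() until the queue is empty, so queued work is drained.
                while (running || !queue.isEmpty()) {
                    WorkItem item;
                    try {
                        item = queue.poll(100, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (item == null) continue;
                    handler.accept(item);   // possibly slow CPU / IO work; exceptions are not handled here
                    onDone.accept(item);    // completion is reported only after the handler returns
                }
            });
        }
    }

    /** Tells workers to stop once the queue is empty and waits up to 30 s for them to finish. */
    @Override
    public void close() throws InterruptedException {
        running = false;
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```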
Offset tracker:
- Tracks, per partition, the highest offset below which every record has been processed (workers can finish out of order)
- Supports rebalance-safe commits
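With a parallel worker pool, records from one partition can finish out of order. "Highest processed offset" therefore has to mean the highest offset below which nothing is still in flight. For example, if offset 11 completes before 10 and 12 is committed, a crash loses offset 10.

The sketch below implements such a tracker, assuming the poll thread registers each record as it hands it to the queue. The class and method names are hypothetical. The committable value follows Kafka's convention that a committed offset is the next record to consume. It could therefore be passed to commitSync as new OffsetAndMetadata(committable) for the corresponding TopicPartition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class OffsetTracker {
    private static final class PartitionState {
        final TreeSet<Long> inFlight = new TreeSet<>();  // dispatched to workers, not yet processed
        long nextAfterDispatched = -1;                    // highest dispatched offset + 1
    }

    private final Map<Integer, PartitionState> partitions = new HashMap<>();

    /** Poll thread: a record was handed to the worker queue (offsets arrive in increasing order per partition). */
    synchronized void dispatched(int partition, long offset) {
        PartitionState s = partitions.computeIfAbsent(partition, p -> new PartitionState());
        s.inFlight.add(offset);
        s.nextAfterDispatched = Math.max(s.nextAfterDispatched, offset + 1);
    }

    /** Worker thread: a record finished processing. */
    synchronized void markDone(int partition, long offset) {
        PartitionState s = partitions.get(partition);
        if (s != null) s.inFlight.remove(offset);
    }

    /**
     * Safe commit position: the oldest unfinished offset, or one past the last
     * dispatched offset when nothing is in flight; -1 if nothing was dispatched.
     */
    synchronized long committable(int partition) {
        PartitionState s = partitions.get(partition);
        if (s == null) return -1;
        return s.inFlight.isEmpty() ? s.nextAfterDispatched : s.inFlight.first();
    }

    /** Rebalance: forget a revoked partition after its final commit. */
    synchronized void revoke(int partition) {
        partitions.remove(partition);
    }
}
```

Tracking dispatched offsets, rather than assuming the next offset is always offset + 1, also tolerates gaps in offsets. Such gaps occur in compacted and transactional topics.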
Rebalance listener:
- Pauses intake on revoke
- Drains in-flight records
- Commits offsets safely
- Resumes on assignment
When downstream pressure increases:
- Queue depth exceeds threshold
- Poll thread pauses assigned partitions
- Workers drain in-flight records
- Queue depth falls back to a safe level
- Poll thread resumes partitions
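This prevents poll starvation and rebalance storms: the poll thread keeps calling poll() while partitions are paused, so the consumer stays within max.poll.interval.ms and keeps its group membership.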
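The sketch below shows the pause/resume decision. It assumes separate high and low watermarks so intake does not flap around a single threshold; the text above describes one threshold, and the second is an assumption. The controller only decides. The poll thread would apply PAUSE with consumer.pause(consumer.assignment()) and RESUME with consumer.resume(consumer.paused()), and keep calling poll() in both states. BackpressureController is a hypothetical name.

```java
class BackpressureController {
    enum Action { PAUSE, RESUME, NONE }

    private final int highWatermark;   // pause intake when queue depth rises above this
    private final int lowWatermark;    // resume intake once depth falls to this or below (hysteresis)
    private boolean paused;

    BackpressureController(int highWatermark, int lowWatermark) {
        if (lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("lowWatermark must be below highWatermark");
        }
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Poll thread: called once per loop iteration with the current queue depth. */
    Action onQueueDepth(int depth) {
        if (!paused && depth > highWatermark) {
            paused = true;
            return Action.PAUSE;
        }
        if (paused && depth <= lowWatermark) {
            paused = false;
            return Action.RESUME;
        }
        return Action.NONE;
    }

    boolean isPaused() { return paused; }
}
```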
During rebalances:
- Intake is paused
- In-flight work is drained
- Offsets for revoked partitions are committed
- New partitions resume cleanly
This ensures:
- No offset loss
- No commit failures for revoked partitions (offsets are committed before ownership moves)
- Stable group membership
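The rebalance steps require the poll thread to wait until in-flight work has drained before it commits. Here is a minimal in-flight counter with a timed wait, built on ReentrantLock and Condition. InFlightGate is a hypothetical name. The poll thread would call enter() per dispatched record, and workers would call exit() after updating the offset tracker.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class InFlightGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private long inFlight;

    /** Poll thread: a record was handed to the worker queue. */
    void enter() {
        lock.lock();
        try { inFlight++; } finally { lock.unlock(); }
    }

    /** Worker: a record finished processing (call after the offset tracker is updated). */
    void exit() {
        lock.lock();
        try {
            if (--inFlight == 0) drained.signalAll();
        } finally { lock.unlock(); }
    }

    /** Poll thread, during revoke: wait until nothing is queued or running, or the timeout expires. */
    boolean awaitDrained(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (inFlight > 0) {
                if (nanos <= 0) return false;        // timed out with work still in flight
                nanos = drained.awaitNanos(nanos);   // handles spurious wake-ups via the loop
            }
            return true;
        } finally { lock.unlock(); }
    }
}
```

In a ConsumerRebalanceListener, onPartitionsRevoked would run the steps in this order:

- Call consumer.pause(partitions).
- Call awaitDrained with a timeout comfortably below max.poll.interval.ms.
- Call commitSync with the tracker's committable offsets for the revoked partitions.

Workers never touch Kafka, so waiting on the poll thread cannot deadlock. If the wait times out, committing the tracker's conservative offsets still preserves at-least-once delivery: unfinished records are redelivered to the new owner as duplicates.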
Delivery guarantees:
- At-least-once delivery
- No message loss
- Controlled duplicates (downstream idempotency expected)
- Stable consumer group behavior
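Under at-least-once delivery, a record whose offset was not yet committed is processed again, for example after a crash or a timed-out drain. The downstream side therefore has to tolerate duplicates. The sketch below is an idempotent handler, assuming each record carries a business key such as an order id. The class, the key choice and the in-memory set are illustrative assumptions, not part of this project.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class IdempotentHandler {
    // In-memory for illustration only: a real deployment needs a durable store updated
    // atomically with the side effect (e.g. a unique key in the same database transaction).
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sideEffect;

    IdempotentHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    /** Applies the side effect at most once per key; returns false for a duplicate. */
    boolean handle(String key, String payload) {
        if (!processedKeys.add(key)) {
            return false;                 // redelivered duplicate: already applied
        }
        try {
            sideEffect.accept(payload);
        } catch (RuntimeException e) {
            processedKeys.remove(key);    // failed: allow a later redelivery to retry
            throw e;
        }
        return true;
    }
}
```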
Tested scenarios:
- Slow processing
- Burst traffic
- Queue saturation
- Consumer restart mid-processing
- Rebalance during load
In all scenarios:
- Lag stabilized
- No rebalance storms
- Correct offset progression
Start Kafka
docker compose up -d
Run the Consumer
mvn clean spring-boot:run
Produce Messages
docker exec -it kafka kafka-console-producer \
--topic orders \
--bootstrap-server localhost:9092