Skip to content

Conversation

@lukashes
Copy link
Owner

image

Instead of blocking on flush() after every batch, now flush once per 10 seconds:
- Track pending LSN and last flush time in processor state
- poll() triggers network I/O without blocking
- flush() + LSN commit happen every 10 seconds
- Final flush on graceful shutdown ensures delivery

This should significantly improve throughput by letting librdkafka batch messages better.
Fixes PostgreSQL connection drops by sending LSN feedback every 10 seconds.
Matches Debezium's approach (status.update.interval.ms default).

Main changes:
- Background worker for flush/commit (every 10s)
- Atomic pending_lsn for thread-safe LSN tracking
- Remove sync flush from main processing loop
Simplified background flush logic with early returns.
Moved flush interval to constants.CDC for better configurability.
Skip sending feedback when LSN is uninitialized (0).
Async flush/commit improved throughput from 85k to 105k events/sec.
Now at 70% of Debezium's throughput with <10 MB memory footprint.
@lukashes lukashes marked this pull request as ready for review November 26, 2025 21:59
@lukashes lukashes merged commit c2ae6ff into main Nov 26, 2025
5 checks passed
@lukashes lukashes deleted the unblocking branch November 26, 2025 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants