Background
On Jan 9-10, 2026, the KOI pipeline generated 57,116 false "content changed" events in 24 hours due to a bug in the GitHub sensor (fixed in regen-network/koi-sensors@b2b547e). The flood overwhelmed downstream services (BGE Server, OpenAI API), causing 429 rate-limit errors.
Problem
The coordinator currently broadcasts all events immediately without any rate limiting. When a sensor malfunctions or a bulk re-index occurs, this can flood downstream services.
Proposed Solution
Add configurable rate limiting at the coordinator level:
```python
# Example config
RATE_LIMIT_EVENTS_PER_MINUTE = 100
RATE_LIMIT_BURST = 50
```

Options to consider:
- Token bucket - Allow bursts but enforce an average rate (see the sketch after this list)
- Sliding window - Hard limit per time window
- Per-source limits - Different limits per sensor type
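For illustration, a minimal token-bucket sketch in Python driven by the two config values above; the class and method names are placeholders, not existing coordinator code:

```python
import time

class TokenBucket:
    """Allow short bursts while enforcing an average event rate."""

    def __init__(self, rate_per_minute: float, burst: int):
        self.rate = rate_per_minute / 60.0  # tokens replenished per second
        self.capacity = float(burst)        # maximum burst size
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if one event may be broadcast right now."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill tokens for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller decides whether to queue or drop the event
```

The coordinator would construct one bucket from the config values (`TokenBucket(RATE_LIMIT_EVENTS_PER_MINUTE, RATE_LIMIT_BURST)`) and call `allow()` before each broadcast; anything more (queueing, per-source buckets) layers on top of this primitive.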
Implementation Considerations
- Should excess events be queued or dropped?
- Should rate limits be per-source or global?
- Is a backpressure mechanism needed to slow down sensors?
- Config via environment variables or a config file? (a sketch combining the per-source and env-var options follows this list)
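If per-source limits and environment-variable config are the chosen answers, they could combine roughly as follows. This is a hypothetical sketch reusing the `TokenBucket` class from above; the env var names and the `allow_event` helper are assumptions, not existing coordinator APIs:

```python
import os

# Defaults mirror the example config; the env var names are assumptions.
DEFAULT_RATE = float(os.environ.get("RATE_LIMIT_EVENTS_PER_MINUTE", "100"))
DEFAULT_BURST = int(os.environ.get("RATE_LIMIT_BURST", "50"))

_buckets: dict[str, TokenBucket] = {}  # TokenBucket from the sketch above

def allow_event(source: str) -> bool:
    """One bucket per sensor (e.g. 'github') so a single misbehaving
    sensor cannot exhaust the budget for every other source."""
    bucket = _buckets.setdefault(source, TokenBucket(DEFAULT_RATE, DEFAULT_BURST))
    return bucket.allow()
```

A per-source layout like this would have contained the Jan 9-10 incident to the GitHub sensor's own budget instead of letting it saturate the BGE Server and OpenAI API for all sources.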
Related
- Event flood detection added in c3c0366
- BGE Server retry logic added in 18b197c
- Root cause (GitHub sensor RID bug) fixed in koi-sensors@b2b547e
Labels
enhancement, resilience