feat: request deduplication / response coalescing #126
Open
Labels
enhancement (New feature or request)
Description
Problem
When Claude Code sends overlapping requests (e.g., multiple tool calls hitting the same model with identical payloads), each request independently hits the upstream provider. This wastes rate limit quota, tokens, and latency.
Proposal
Detect identical request payloads within a short time window and share the upstream response:
- Hash the request body (excluding the `model` field) to create a dedup key
- If a request with the same key is already in-flight, attach the new client as an additional subscriber
- When the upstream response arrives, stream it to all subscribers
- TTL-based cache for very recent identical requests (e.g., 1-2s window)
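The steps above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: names like `RequestCoalescer`, `dedupKey`, and the constructor parameters are hypothetical, and the upstream call is abstracted to a simple async function.

```typescript
import { createHash } from "node:crypto";

type Fetcher = () => Promise<string>;

// Hypothetical sketch: coalesce identical in-flight requests and keep a
// short-lived, bounded TTL cache of very recent responses.
export class RequestCoalescer {
  private inflight = new Map<string, Promise<string>>();
  private cache = new Map<string, { value: string; expires: number }>();

  constructor(private ttlMs = 1500, private maxCacheEntries = 1000) {}

  // Dedup key: hash of the request body with the `model` field removed.
  dedupKey(body: Record<string, unknown>): string {
    const { model, ...rest } = body;
    return createHash("sha256").update(JSON.stringify(rest)).digest("hex");
  }

  async request(body: Record<string, unknown>, fetch: Fetcher): Promise<string> {
    const key = this.dedupKey(body);

    // 1. Serve from the TTL cache if a very recent identical response exists.
    const cached = this.cache.get(key);
    if (cached && cached.expires > Date.now()) return cached.value;

    // 2. Piggyback on an identical in-flight request as a subscriber.
    const pending = this.inflight.get(key);
    if (pending) return pending;

    // 3. Otherwise go upstream, recording the promise so that later callers
    //    with the same key share this response.
    const p = fetch()
      .then((value) => {
        if (this.cache.size >= this.maxCacheEntries) {
          // Bounded cache: evict the oldest entry (Map preserves insertion order).
          const oldest = this.cache.keys().next().value;
          if (oldest !== undefined) this.cache.delete(oldest);
        }
        this.cache.set(key, { value, expires: Date.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inflight.delete(key));

    this.inflight.set(key, p);
    return p;
  }
}
```

Because the `model` field is excluded from the key, two requests that differ only in model would coalesce here; whether that is desirable depends on routing semantics and is worth deciding explicitly.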
Important considerations:
- Must handle streaming responses (clone the PassThrough stream for each subscriber)
- Should be opt-in via config to avoid surprises with non-idempotent side effects
- Cache size must be bounded
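For the streaming consideration, one way to give each subscriber its own copy of the upstream response is to fan the source stream out into per-subscriber `PassThrough` streams. A rough sketch (the `fanOut` helper is hypothetical, and backpressure handling is omitted for brevity):

```typescript
import { PassThrough, Readable } from "node:stream";

// Hypothetical sketch: write each upstream chunk to a PassThrough per
// subscriber, so every subscriber can consume the response independently.
// Note: write() return values (backpressure) are ignored here for brevity;
// a real implementation would need to handle slow subscribers.
export function fanOut(upstream: Readable, subscriberCount: number): PassThrough[] {
  const subscribers = Array.from({ length: subscriberCount }, () => new PassThrough());
  upstream.on("data", (chunk) => subscribers.forEach((s) => s.write(chunk)));
  upstream.on("end", () => subscribers.forEach((s) => s.end()));
  upstream.on("error", (err) => subscribers.forEach((s) => s.destroy(err)));
  return subscribers;
}
```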
Expected Benefit
- Reduced upstream API costs (fewer duplicate requests)
- Lower latency for duplicate requests (piggyback on in-flight response)
- Better rate limit utilization
Related
Part of the stability & performance exploration comparing this proxy against direct Claude Code CLI → LLM gateway connections.