feat: request deduplication / response coalescing #126

## Problem

When Claude Code sends overlapping requests (e.g., multiple tool calls hitting the same model with identical payloads), each request is forwarded to the upstream provider independently. This wastes rate-limit quota and tokens, and adds avoidable latency.

## Proposal

Detect identical request payloads within a short time window and share the upstream response:

1. Hash the request body (excluding the `model` field) to create a dedup key
2. If a request with the same key is already in-flight, attach the new client as an additional subscriber
3. When the upstream response arrives, stream it to all subscribers
4. Keep a TTL-based cache for very recent identical requests (e.g., a 1-2 s window)
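
A minimal sketch of steps 1, 2, and 4 for buffered (non-streaming) responses, assuming a Node.js/TypeScript gateway and JSON request bodies. The names `canonicalize`, `dedupKey`, and `Coalescer`, the SHA-256 choice, and the `ttlMs` / `maxEntries` defaults are all hypothetical and not taken from the existing codebase:

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so that key order does not change the hash.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const fields = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`);
  return `{${fields.join(",")}}`;
}

// Step 1: hash the body with the top-level `model` field removed.
function dedupKey(body: { [key: string]: Json }): string {
  const rest = { ...body };
  delete rest["model"];
  return createHash("sha256").update(canonicalize(rest)).digest("hex");
}

interface Entry<T> {
  result: Promise<T>;
  expiresAt: number; // Infinity while the upstream call is still in flight
}

class Coalescer<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  // Both defaults are illustrative assumptions.
  constructor(ttlMs = 1500, maxEntries = 1000) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  run(key: string, fetchUpstream: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    // Steps 2 and 4: share an in-flight or still-fresh result.
    if (hit && hit.expiresAt > Date.now()) return hit.result;
    this.entries.delete(key); // drop an expired entry, if any

    // Keep the map bounded: evict the oldest entry (Map keeps insertion order).
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }

    const entry: Entry<T> = { result: fetchUpstream(), expiresAt: Infinity };
    this.entries.set(key, entry);
    entry.result.then(
      () => { entry.expiresAt = Date.now() + this.ttlMs; }, // start the TTL window
      () => { if (this.entries.get(key) === entry) this.entries.delete(key); }, // never cache failures
    );
    return entry.result;
  }
}
```

Callers would compute `dedupKey(body)` and pass it to `run` together with a function that performs the real upstream call. Every caller that arrives while the call is in flight, or within `ttlMs` of its success, receives the same promise. Failed calls are dropped immediately so that an error is not replayed to later clients.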

Important considerations:
- Must handle streaming responses (give each subscriber its own PassThrough stream fed from the single upstream response)
- Should be opt-in via config to avoid surprises with non-idempotent side effects
- Cache size must be bounded
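
For the streaming consideration, one possible shape is a small broadcaster that feeds every subscriber its own `PassThrough` and replays already-received chunks to late joiners. This is a sketch under assumptions: `StreamBroadcast` is a hypothetical name, the whole response is buffered in memory for replay, and backpressure from slow subscribers is ignored.

```typescript
import { PassThrough, Readable } from "node:stream";

class StreamBroadcast {
  private readonly chunks: Buffer[] = []; // replay buffer for late subscribers
  private readonly subscribers = new Set<PassThrough>();
  private ended = false;
  private failure: Error | undefined;

  constructor(upstream: Readable) {
    upstream.on("data", (chunk: Buffer) => {
      this.chunks.push(chunk);
      for (const s of this.subscribers) s.write(chunk);
    });
    upstream.on("end", () => {
      this.ended = true;
      for (const s of this.subscribers) s.end();
      this.subscribers.clear();
    });
    upstream.on("error", (err: Error) => {
      this.failure = err;
      for (const s of this.subscribers) s.destroy(err);
      this.subscribers.clear();
    });
  }

  // Each subscriber gets an independent stream carrying the full response.
  subscribe(): PassThrough {
    const out = new PassThrough();
    for (const chunk of this.chunks) out.write(chunk);
    if (this.failure) out.destroy(this.failure);
    else if (this.ended) out.end();
    else this.subscribers.add(out);
    return out;
  }
}
```

A real implementation would likely cap the replay buffer and respect the return value of `write()` so that one slow client cannot grow memory without bound.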

## Expected Benefit

- Reduced upstream API costs (fewer duplicate requests)
- Lower latency for duplicate requests (piggyback on in-flight response)
- Better rate limit utilization

## Related

Part of stability & performance exploration vs direct Claude Code CLI → LLM gateway connections.
