Hypothesis: CLAUDE_CODE_ATTRIBUTION_HEADER=false restores cross-session prompt cache sharing by removing the per-session dynamic hash from system prompt prefix.
Method:
- Round A (control): 4 sessions with default settings (billing header ON)
- Round B (treatment): 4 sessions with
CLAUDE_CODE_ATTRIBUTION_HEADER=false - Each session sends a DIFFERENT prompt (simulating real usage)
- All sessions use
claude --printmode (single-shot, non-interactive) - Sessions run sequentially with 8s gap (within 5-min cache TTL)
- Metric:
cache_read_input_tokenson the FIRST API request of each session
Key expectation: In Round A, each session should show partial cache_read (tools block cached, system prompt rebuilt due to header hash change). In Round B, sessions 2+ should show full cache_read (both tools and system prompt served from cache).
| Session | Prompt | cache_read | cache_creation | input_tokens | hit_ratio |
|---|---|---|---|---|---|
| 1 | What is the capital of France? Answer in... | 11272 | 12206 | 3 | .4800 |
| 2 | List three prime numbers under 20.... | 11272 | 12202 | 3 | .4801 |
| 3 | Explain what a goroutine is in one sente... | 11374 | 12753 | 3 | .4713 |
| 4 | Name one benefit of TypeScript over Java... | 11272 | 12203 | 3 | .4801 |
| Session | Prompt | cache_read | cache_creation | input_tokens | hit_ratio |
|---|---|---|---|---|---|
| 1 | What is the capital of France? Answer in... | 23478 | 0 | 3 | .9998 |
| 2 | List three prime numbers under 20.... | 23474 | 0 | 3 | .9998 |
| 3 | Explain what a goroutine is in one sente... | 11272 | 12197 | 3 | .4802 |
| 4 | Name one benefit of TypeScript over Java... | 23475 | 0 | 3 | .9998 |
| Session | Round A cache_read | Round A cache_creation | Round B cache_read | Round B cache_creation | Delta cache_read |
|---|---|---|---|---|---|
| 1 | 11,272 | 12,206 | 23,478 | 0 | +12,206 (+108%) |
| 2 | 11,272 | 12,202 | 23,474 | 0 | +12,202 (+108%) |
| 3 | 11,374 | 12,753 | 11,272 | 12,197 | -102 (anomaly) |
| 4 | 11,272 | 12,203 | 23,475 | 0 | +12,203 (+108%) |
Total prompt size (tools + system): ~23,478 tokens (confirmed by Round B full-cache sessions).
1. Hypothesis CONFIRMED — billing header prevents ~52% of prompt cache sharing
The data reveals a two-tier caching pattern, consistent with the Anthropic API's cache hierarchy (tools → system → messages):
- Tools block (~11,272 tokens): Tool definitions are stable across sessions — always cached regardless of header setting
- System prompt block (~12,200 tokens): Contains the billing header with a per-session hash — re-created every session when header is ON, because the hash changes the cache key for this block
With CLAUDE_CODE_ATTRIBUTION_HEADER=false, both blocks are fully cacheable → 99.98% hit ratio.
2. Round A (header ON): Consistent ~48% cache hit ratio
Every session shows the same pattern: ~11,272 cache_read (tools block) + ~12,200 cache_creation (system prompt block). The billing header hash changes per session, preventing the system prompt from being served from cache.
3. Round B (header OFF): Near-perfect cache sharing (3/4 sessions)
Sessions 1, 2, 4 achieved ~23,475 cache_read with 0 cache_creation — the entire prompt (tools + system) is served from cache.
4. Anomalous Session B-3
Session B-3 showed the Round A pattern (11,272 read + 12,197 creation) instead of full cache hits. Possible causes:
- Server-side cache eviction between B-2 and B-3 (8s gap, within 5-min TTL but eviction is unpredictable)
- Session B-3's JSONL resolved to a different project directory than the other sessions
- Note: B-4 achieved full cache again (23,475), confirming B-3's cache_creation was successfully reused
5. Cost Impact Estimate
Using Anthropic pricing (cache_read = 0.1x base, cache_creation = 1.25x base):
| Metric | Round A (per session) | Round B (per session, typical) |
|---|---|---|
| Effective input cost | 11,272×0.1 + 12,200×1.25 + 3 = ~16,380 equiv tokens | 23,475×0.1 + 0 + 3 = ~2,351 equiv tokens |
| Savings | — | ~85% reduction (~7x cheaper) |
Note: This comparison measures system prompt cost only. For long conversations, user messages and tool results dominate the total token count, so the relative savings decrease as conversations grow.
The billing header (CLAUDE_CODE_ATTRIBUTION_HEADER) prevents cross-session prompt cache sharing for the system prompt block (~12K tokens). Setting CLAUDE_CODE_ATTRIBUTION_HEADER=false restores full cache sharing, reducing per-session system prompt costs by ~85%.
This is particularly impactful for:
- Multi-session workflows — running several Claude Code instances on the same project in parallel
- Rapid iteration — frequent short sessions that repeatedly pay cache_creation cost
- CI/CD and automation — multiple
claude --printcalls in sequence (pipelines, batch processing)
Recommendation: Set CLAUDE_CODE_ATTRIBUTION_HEADER=false in environments where cross-session cache efficiency matters.
- Tested with
claude --printmode only. Interactive mode uses a different (larger) system prompt — the absolute numbers will differ, but the mechanism is the same. - Sample size is n=4 per round. The pattern is clear (3/4 full cache hit vs 0/4), but larger samples would strengthen the statistical confidence.
- Server-side cache eviction is non-deterministic — individual runs may show variance (as seen in session B-3).
- Claude Code version: 2.1.88 (Claude Code)
- Test mode:
claude --print(single-shot, non-interactive) - Date: 2026-03-31 08:23:21 UTC
- Platform: Darwin arm64
- Node: v22.14.0
See results/round-a/ and results/round-b/ for per-session metrics JSON files.