⚡ Claude Token Optimization 2026-04-02 - smoke-claude #1627
Description
Target Workflow: smoke-claude
Source report: #1606
Estimated cost per run: ~$0.24 (range: $0.18–$0.30)
Total tokens per run: ~203K
Cache read rate: 76.2% (excellent)
Cache write rate: 22.9% ← dominant cost at 73% of spend
LLM turns: ~6 (4 × Sonnet + 2 × Haiku routing)
Run frequency: 21 runs over the last 24 hours (every PR + schedule)
Current Configuration
| Setting | Value |
|---|---|
| Tools loaded | 5 (cache-memory, github, playwright, edit, bash) |
| Tools actually used | 4 (github, playwright, bash, safe-outputs) |
| Network groups | defaults, github, playwright |
| Shared imports | shared/mcp-pagination.md (3,225 bytes, ~810 tokens) |
| Pre-agent steps | No |
| Prompt size | smoke-claude.md: 3,531 bytes + mcp-pagination.md: 3,225 bytes |
| max-turns | 15 (actual runs use ~6) |
edit: is loaded but never called — the file-creation test uses bash echo, not the edit tool.
cache-memory: is loaded but not needed — cache memory is designed for multi-session persistent memory in long-running agents. A 6-turn smoke test does not benefit from it.
Recommendations
1. Remove edit: tool — not used by the smoke test
The smoke test creates a file using bash echo. The edit: tool is never called.
Estimated savings: ~600 tokens/run from cache writes (~1.3%)
Change in .github/workflows/smoke-claude.md:
  tools:
    cache-memory: true
    github:
      toolsets: [repos, pull_requests]
    playwright:
      allowed_domains:
        - github.com
-   edit:
    bash:
      - "*"

Cost savings: ~$0.00225/run × 630 runs/month ≈ $1.42/month
2. Remove cache-memory: true — irrelevant for a short smoke test
Cache memory persists learned facts across agent sessions. A 6-turn smoke test that runs then exits has no use for cross-session memory. Removing it eliminates:
- The `cache_memory_prompt.md` framework injection from the system prompt (~2,000–3,000 tokens)
- The cache-memory tool schema (~600 tokens)
Estimated savings: ~2,500 tokens/run from cache writes (~5.4%)
Change in .github/workflows/smoke-claude.md:
  tools:
-   cache-memory: true
    github:
      toolsets: [repos, pull_requests]

Cost savings: ~$0.009/run × 630 runs/month ≈ $5.91/month
3. Remove imports: shared/mcp-pagination.md (or inline a one-liner)
mcp-pagination.md is 3,225 bytes (~810 tokens) of detailed pagination guidance: retry loops, common perPage values, error message examples, etc. This guidance targets workflows that fetch large result sets (75K-token PR diffs, full code searches). The smoke test only calls list_pull_requests to get 2 recent merged PRs — no pagination risk.
Option A (recommended): Remove the import entirely. Add a single inline note to the prompt:
Change in .github/workflows/smoke-claude.md:
-imports:
-  - shared/mcp-pagination.md

And in the prompt body:
+> Use `perPage: 2` when listing PRs.
+
 ## Test Requirements

Option B (conservative): Keep the import but accept the overhead.
Estimated savings (Option A): ~810 tokens/run (~1.8%)
Cost savings: ~$0.003/run × 630 runs/month ≈ $1.91/month
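The savings quoted for Recommendations 1–3 all follow one formula: tokens removed × cache-write price × runs per month. The sketch below reproduces that arithmetic, assuming the $3.75/M cache-write price and the 630 runs/month figure used in this report. The function and variable names are illustrative and do not come from the workflow.

```python
# Assumed constants, taken from figures quoted in this report.
CACHE_WRITE_USD_PER_M = 3.75    # Sonnet cache-write price per million tokens
RUNS_PER_MONTH = 630            # 21 runs/day x 30 days
BASELINE_CACHE_WRITE = 46_400   # cache-write tokens per run

def savings(tokens_removed: int) -> dict:
    """Estimate savings from removing tokens from the cached system prompt."""
    usd_per_run = tokens_removed * CACHE_WRITE_USD_PER_M / 1_000_000
    return {
        "pct_of_cache_write": 100 * tokens_removed / BASELINE_CACHE_WRITE,
        "usd_per_run": usd_per_run,
        "usd_per_month": usd_per_run * RUNS_PER_MONTH,
    }

# Token estimates per recommendation, as stated above.
recommendations = {
    "1. remove edit tool": 600,
    "2. remove cache-memory": 2_500,
    "3. remove mcp-pagination import": 810,
}

for name, tokens in recommendations.items():
    s = savings(tokens)
    print(f"{name}: {tokens} tokens, "
          f"{s['pct_of_cache_write']:.1f}% of cache writes, "
          f"${s['usd_per_run']:.5f}/run, ${s['usd_per_month']:.2f}/month")

total_tokens = sum(recommendations.values())
print(f"total: {total_tokens} tokens, ${savings(total_tokens)['usd_per_month']:.2f}/month")
```

This counts cache-write savings only. The smaller cached block also reduces cache reads on later turns, which the Expected Impact table accounts for separately.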
4. Reduce max-turns: 15 to 8
Actual runs consistently use 6 turns. A ceiling of 8 gives a 33% safety buffer while preventing cost runaway if the agent loops unexpectedly.
At 15 turns instead of 8, a runaway session wastes 7 extra Sonnet turns × ~39K cache reads × $0.30/M = $0.082 extra per runaway run.
Change in .github/workflows/smoke-claude.md:
  engine:
    id: claude
-   max-turns: 15
+   max-turns: 8

No baseline savings, but meaningful cost-runaway protection.
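The runaway estimate and the safety buffer can be checked with a few lines of Python. This is a rough sketch that, like the estimate above, counts only cache reads on the extra turns (about 39K tokens per turn at $0.30/M). Output tokens and any additional cache writes are ignored, so the real cost of a runaway run would likely be higher. The function names are illustrative.

```python
CACHE_READ_USD_PER_M = 0.30          # Sonnet cache-read price quoted in this report
CACHE_READ_TOKENS_PER_TURN = 39_000  # approximate cache reads per Sonnet turn

def runaway_cost(old_cap: int, new_cap: int) -> float:
    """Extra cache-read cost if a session runs to old_cap instead of stopping at new_cap."""
    extra_turns = old_cap - new_cap
    return extra_turns * CACHE_READ_TOKENS_PER_TURN * CACHE_READ_USD_PER_M / 1_000_000

def safety_buffer_pct(cap: int, typical_turns: int) -> float:
    """Headroom of the turn cap over typical usage, as a percentage."""
    return 100 * (cap - typical_turns) / typical_turns

print(f"extra cost per runaway run: ${runaway_cost(15, 8):.3f}")
print(f"buffer at max-turns 8: {safety_buffer_pct(8, 6):.0f}%")
```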
Cache Analysis (Anthropic-Specific)
Aggregated across 5 representative runs (31 total requests):
| Turn (typical) | Model | Input | Output | Cache Read | Cache Write | Net New |
|---|---|---|---|---|---|---|
| 1 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 2 | Sonnet | ~6 | ~1,300 | 0 | ~46,400 | ~47,700 |
| 3 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 4 | Sonnet | ~6 | ~1,300 | ~46,400 | ~small | ~1,306 |
| 5 | Sonnet | ~6 | ~2,000 | ~46,400 | ~small | ~2,006 |
| 6 | Sonnet | ~6 | ~935 | ~46,400 | ~small | ~941 |
(Turns estimated from report aggregates: 21 Sonnet, 10 Haiku across 5 runs; direct Sonnet input was only 31 tokens total — essentially all context comes from cache reads)
Cache write amortization: Turn 2 writes the full system prompt + tool schemas (~46K tokens). Turns 4–6 each read the same ~46K block (3× reuse per session). This is healthy reuse — write cost is justified.
Cache write cost vs benefit:
- Write: 46,400 tokens × $3.75/M = $0.174/run
- Reads: 154,472 tokens × $0.30/M = $0.046/run
- Without caching, Turns 4–6 would each pay full Sonnet input price ($3/M) for ~46K tokens → $0.416 extra. Caching saves ~$0.37/run, so the $0.174 write cost is well justified.
Haiku zero-cache observation: 10 Haiku calls across 5 runs all have cache_read_tokens = 0. These are small framework routing calls (~400 tokens input, 532ms avg). Haiku is too cheap and the calls too small for caching to be worthwhile here — no action needed.
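The write-versus-read comparison above can be reproduced approximately as follows. This sketch assumes the Sonnet prices quoted in this section ($3.75/M cache write, $0.30/M cache read, $3/M uncached input) and three re-reads of the cached block per run. Because the report rounds its inputs, the results land near the quoted figures rather than matching them exactly.

```python
# Prices in USD per million tokens, as quoted in this report.
CACHE_WRITE = 3.75
CACHE_READ = 0.30
UNCACHED_INPUT = 3.00

write_tokens = 46_400   # Turn 2 writes the system prompt and tool schemas
read_tokens = 154_472   # cache reads per run (report aggregate)
reuse_turns = 3         # Turns 4-6 each re-read the cached block

def usd(tokens: int, price_per_m: float) -> float:
    return tokens * price_per_m / 1_000_000

write_cost = usd(write_tokens, CACHE_WRITE)
read_cost = usd(read_tokens, CACHE_READ)
# What Turns 4-6 would pay if the same block were sent as uncached input.
uncached_cost = usd(reuse_turns * write_tokens, UNCACHED_INPUT)
net_saving = uncached_cost - read_cost

print(f"cache write cost: ${write_cost:.3f}/run")
print(f"cache read cost:  ${read_cost:.3f}/run")
print(f"uncached cost for reuse turns: ${uncached_cost:.3f}/run")
print(f"saving from caching: ${net_saving:.3f}/run")
print("write cost justified:", net_saving > write_cost)
```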
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Cache write tokens/run | ~46,400 | ~42,490 | ~3,910 (−8.4%) |
| Cache read tokens/run | ~154,472 | ~141,600 | ~12,872 (−8.3%) |
| Cost/run | ~$0.240 | ~$0.221 | ~$0.019 (−7.9%) |
| Monthly cost (630 runs) | ~$151 | ~$139 | ~$12/month |
| LLM turns | 6 | 6 | 0 |
| Max runaway turns | 15 | 8 | −7 |
Projections assume proportional reduction in cache write and read tokens from smaller system prompt.
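The cost rows of the table can be derived from its token rows. The sketch below prices the cache-write and cache-read token deltas at the rates used in this report and extrapolates to 630 runs/month. It assumes no other cost component changes, and the variable names are illustrative. The results come out close to the table's rounded figures.

```python
CACHE_WRITE_USD_PER_M = 3.75
CACHE_READ_USD_PER_M = 0.30
RUNS_PER_MONTH = 630
CURRENT_COST_PER_RUN = 0.240

# Token deltas from the Expected Impact table.
write_tokens_saved = 46_400 - 42_490
read_tokens_saved = 154_472 - 141_600

saving_per_run = (
    write_tokens_saved * CACHE_WRITE_USD_PER_M
    + read_tokens_saved * CACHE_READ_USD_PER_M
) / 1_000_000

projected_cost_per_run = CURRENT_COST_PER_RUN - saving_per_run
monthly_saving = saving_per_run * RUNS_PER_MONTH

print(f"saving per run: ${saving_per_run:.4f}")
print(f"projected cost per run: ${projected_cost_per_run:.4f}")
print(f"monthly saving: ${monthly_saving:.2f}")
```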
Implementation Checklist
- Remove `edit:` from `tools:` section
- Remove `cache-memory: true` from `tools:` section
- Remove `imports: - shared/mcp-pagination.md` and add inline `> Use perPage: 2 when listing PRs.`
- Change `max-turns: 15` to `max-turns: 8`
- Recompile: `gh aw compile .github/workflows/smoke-claude.md`
- Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- Open a PR and verify CI passes on the new lock file
- After 3+ runs, compare `token-usage.jsonl` from `agent-artifacts` to confirm cache write reduction
- Update this issue with observed vs projected savings
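For the verification step, a small script along these lines could compare cache writes before and after the change. The schema of `token-usage.jsonl` is not shown in this issue, so the sketch assumes one JSON object per line with a hypothetical `cache_write_tokens` field, named by analogy with the `cache_read_tokens` field mentioned in the cache analysis. Adjust the field name to whatever the artifact actually contains.

```python
import json
from typing import Iterable

def sum_field(lines: Iterable[str], field: str = "cache_write_tokens") -> int:
    """Sum a numeric field across JSONL records; the field name is an assumption."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += int(record.get(field, 0))
    return total

def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from before to after."""
    return 100 * (before - after) / before

# Synthetic records standing in for one run before and one run after the change.
before_lines = ['{"model": "sonnet", "cache_write_tokens": 46400}']
after_lines = ['{"model": "sonnet", "cache_write_tokens": 42490}', ""]

before = sum_field(before_lines)
after = sum_field(after_lines)
print(f"cache write reduction: {reduction_pct(before, after):.1f}%")
```

With real artifacts, pass an open file handle instead of the synthetic lists, for example `sum_field(open("token-usage.jsonl"))`, and average over the three or more runs the checklist calls for.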
Generated by Daily Claude Token Optimization Advisor