⚡ Claude Token Optimization 2026-04-02 - smoke-claude #1627

@github-actions

Target Workflow: smoke-claude

Source report: #1606
Estimated cost per run: ~$0.24 (range: $0.18–$0.30)
Total tokens per run: ~203K
Cache read rate: 76.2% (excellent)
Cache write rate: 22.9% ← dominant cost at 73% of spend
LLM turns: ~6 (4 × Sonnet + 2 × Haiku routing)
Run frequency: 21 runs over the last 24 hours (every PR + schedule)


Current Configuration

| Setting | Value |
| --- | --- |
| Tools loaded | 5 (cache-memory, github, playwright, edit, bash) |
| Tools actually used | 4 (github, playwright, bash, safe-outputs) |
| Network groups | defaults, github, playwright |
| Shared imports | shared/mcp-pagination.md (3,225 bytes, ~810 tokens) |
| Pre-agent steps | No |
| Prompt size | smoke-claude.md: 3,531 bytes + mcp-pagination.md: 3,225 bytes |
| max-turns | 15 (actual runs use ~6) |

edit: is loaded but never called — the file-creation test uses bash echo, not the edit tool.

cache-memory: is loaded but not needed — cache memory is designed for multi-session persistent memory in long-running agents. A 6-turn smoke test does not benefit from it.


Recommendations

1. Remove edit: tool — not used by the smoke test

The smoke test creates a file using bash echo. The edit: tool is never called.

Estimated savings: ~600 tokens/run from cache writes (~1.3%)

Change in .github/workflows/smoke-claude.md:

 tools:
   cache-memory: true
   github:
     toolsets: [repos, pull_requests]
   playwright:
     allowed_domains:
       - github.com
-  edit:
   bash:
     - "*"

Cost savings: ~$0.00225/run × 630 runs/month ≈ $1.42/month
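
The cost figure above follows from pricing the removed tokens at the $3.75/M Sonnet cache-write rate used in the cache analysis of this report. A minimal sketch of that arithmetic, assuming 630 runs/month means 21 runs/day over a 30-day month; the function and constant names are illustrative, not part of any tooling:

```python
CACHE_WRITE_USD_PER_MTOK = 3.75  # Sonnet cache-write rate quoted in this report
RUNS_PER_MONTH = 21 * 30         # assumption: 21 runs/day over a 30-day month


def cache_write_savings(tokens_removed: int) -> tuple[float, float]:
    """Return (USD per run, USD per month) saved by removing cache-write tokens."""
    per_run = tokens_removed * CACHE_WRITE_USD_PER_MTOK / 1_000_000
    return per_run, per_run * RUNS_PER_MONTH


# edit: tool schema, estimated at ~600 tokens in this report
per_run, per_month = cache_write_savings(600)
print(f"${per_run:.5f}/run, ${per_month:.2f}/month")
```

The same function applies to the other token estimates in this report; only the `tokens_removed` argument changes.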


2. Remove cache-memory: true — irrelevant for a short smoke test

Cache memory persists learned facts across agent sessions. A 6-turn smoke test that runs then exits has no use for cross-session memory. Removing it eliminates:

  • The cache_memory_prompt.md framework injection from the system prompt (~2,000–3,000 tokens)
  • The cache-memory tool schema (~600 tokens)

Estimated savings: ~2,500 tokens/run from cache writes (~5.4%)

Change in .github/workflows/smoke-claude.md:

 tools:
-  cache-memory: true
   github:
     toolsets: [repos, pull_requests]

Cost savings: ~$0.0094/run × 630 runs/month ≈ $5.91/month


3. Remove imports: shared/mcp-pagination.md (or inline a one-liner)

mcp-pagination.md is 3,225 bytes (~810 tokens) of detailed pagination guidance: retry loops, common perPage values, error message examples, etc. This guidance targets workflows that fetch large result sets (75K-token PR diffs, full code searches). The smoke test only calls list_pull_requests to get 2 recent merged PRs — no pagination risk.

Option A (recommended): Remove the import entirely. Add a single inline note to the prompt:

Change in .github/workflows/smoke-claude.md:

-imports:
-  - shared/mcp-pagination.md

And in the prompt body:

+> Use `perPage: 2` when listing PRs.
+
 ## Test Requirements

Option B (conservative): Keep the import but accept the overhead.

Estimated savings (Option A): ~810 tokens/run (~1.8%)

Cost savings: ~$0.00304/run × 630 runs/month ≈ $1.91/month


4. Reduce max-turns: 15 to 8

Actual runs consistently use 6 turns. A ceiling of 8 gives a 33% safety buffer while preventing cost runaway if the agent loops unexpectedly.

At 15 turns instead of 8, a runaway session wastes 7 extra Sonnet turns × ~39K cache-read tokens × $0.30/M ≈ $0.082 per runaway run in cache reads alone.
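
As a rough check of the runaway estimate, the sketch below compares the worst-case extra cache-read spend under both ceilings. It assumes every extra turn is a Sonnet turn reading ~39K cached tokens and ignores output tokens; the names are illustrative:

```python
CACHE_READ_USD_PER_MTOK = 0.30       # Sonnet cache-read rate quoted in this report
CACHE_READ_TOKENS_PER_TURN = 39_000  # ~39K cache reads per Sonnet turn (report estimate)


def runaway_cost(max_turns: int, typical_turns: int = 6) -> float:
    """Worst-case extra cache-read spend (USD) if a run loops up to max_turns."""
    extra_turns = max(max_turns - typical_turns, 0)
    return extra_turns * CACHE_READ_TOKENS_PER_TURN * CACHE_READ_USD_PER_MTOK / 1_000_000


difference = runaway_cost(15) - runaway_cost(8)
print(f"ceiling 15: ${runaway_cost(15):.3f}  ceiling 8: ${runaway_cost(8):.3f}  "
      f"difference: ${difference:.3f}")
```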

Change in .github/workflows/smoke-claude.md:

 engine:
   id: claude
-  max-turns: 15
+  max-turns: 8

No baseline savings, but meaningful cost-runaway protection.


Cache Analysis (Anthropic-Specific)

Aggregated across 5 representative runs (31 total requests):

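| Turn (typical) | Model | Input | Output | Cache Read | Cache Write | Net New |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 2 | Sonnet | ~6 | ~1,300 | 0 | ~46,400 | ~47,700 |
| 3 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 4 | Sonnet | ~6 | ~1,300 | ~46,400 | ~small | ~1,306 |
| 5 | Sonnet | ~6 | ~2,000 | ~46,400 | ~small | ~2,006 |
| 6 | Sonnet | ~6 | ~935 | ~46,400 | ~small | ~941 |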

(Turns estimated from report aggregates: 21 Sonnet, 10 Haiku across 5 runs; direct Sonnet input was only 31 tokens total — essentially all context comes from cache reads)

Cache write amortization: Turn 2 writes the full system prompt + tool schemas (~46K tokens). Turns 4–6 each read the same ~46K block (3× reuse per session). This is healthy reuse — write cost is justified.

Cache write cost vs benefit:

  • Write: 46,400 tokens × $3.75/M = $0.174/run
  • Reads: 154,472 tokens × $0.30/M = $0.046/run
  • Without caching, Turns 4–6 would each pay the full Sonnet input price ($3/M) for ~46K tokens, about $0.42 in total, versus $0.046 for the cached reads. Caching therefore saves ~$0.37/run on reads, so the $0.174 write cost is well justified.
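
The three bullets above can be reproduced with the following sketch. The rates and token counts are the ones quoted in this report; the variable names are illustrative, and the uncached figure assumes exactly three reuse turns of the full ~46,400-token block:

```python
MTOK = 1_000_000
SONNET_INPUT_USD = 3.00        # uncached input, USD per million tokens
SONNET_CACHE_WRITE_USD = 3.75  # cache write, USD per million tokens
SONNET_CACHE_READ_USD = 0.30   # cache read, USD per million tokens

cached_block = 46_400    # tokens written to cache on turn 2
reads_per_run = 154_472  # cache-read tokens per run
reuse_turns = 3          # turns 4-6 re-read the cached block

write_cost = cached_block * SONNET_CACHE_WRITE_USD / MTOK
read_cost = reads_per_run * SONNET_CACHE_READ_USD / MTOK
# What turns 4-6 would cost if the block were re-sent as plain input each time.
uncached_cost = reuse_turns * cached_block * SONNET_INPUT_USD / MTOK
read_savings = uncached_cost - read_cost

print(f"write ${write_cost:.3f}  reads ${read_cost:.3f}  "
      f"uncached ${uncached_cost:.3f}  saved ${read_savings:.2f}")
```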

Haiku zero-cache observation: 10 Haiku calls across 5 runs all have cache_read_tokens = 0. These are small framework routing calls (~400 tokens input, 532ms avg). Haiku is too cheap and the calls too small for caching to be worthwhile here — no action needed.


Expected Impact

| Metric | Current | Projected | Savings |
| --- | --- | --- | --- |
| Cache write tokens/run | ~46,400 | ~42,490 | ~3,910 (−8.4%) |
| Cache read tokens/run | ~154,472 | ~141,600 | ~12,872 (−8.3%) |
| Cost/run | ~$0.240 | ~$0.221 | ~$0.019 (−7.9%) |
| Monthly cost (630 runs) | ~$151 | ~$139 | ~$12/month |
| LLM turns | 6 | 6 | 0 |
| Max runaway turns | 15 | 8 | −7 |

Projections assume proportional reduction in cache write and read tokens from smaller system prompt.
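
A minimal sketch of the proportional projection, using the per-recommendation token estimates from this report. It applies the cache-write reduction ratio directly to cache reads, so its cache-read figure lands within rounding of the table rather than matching it exactly; all names are illustrative:

```python
removed_tokens = {
    "edit tool": 600,
    "cache-memory": 2_500,
    "mcp-pagination import": 810,
}
current_write, current_read = 46_400, 154_472
WRITE_USD_PER_MTOK, READ_USD_PER_MTOK = 3.75, 0.30
RUNS_PER_MONTH = 630

total_removed = sum(removed_tokens.values())
reduction = total_removed / current_write  # fraction of the cached block removed
projected_write = current_write - total_removed
# Assumption: cache reads shrink by the same fraction as cache writes.
projected_read = current_read * (1 - reduction)

saving_per_run = (
    total_removed * WRITE_USD_PER_MTOK
    + (current_read - projected_read) * READ_USD_PER_MTOK
) / 1_000_000

print(f"write {projected_write:,}  read {projected_read:,.0f}  "
      f"saving ${saving_per_run:.3f}/run, ${saving_per_run * RUNS_PER_MONTH:.0f}/month")
```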


Implementation Checklist

  • Remove edit: from tools: section
  • Remove cache-memory: true from tools: section
  • Remove the imports: entry for shared/mcp-pagination.md and add the inline prompt note "Use `perPage: 2` when listing PRs."
  • Change max-turns: 15 to max-turns: 8
  • Recompile: gh aw compile .github/workflows/smoke-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Open a PR and verify CI passes on the new lock file
  • After 3+ runs, compare token-usage.jsonl from agent-artifacts to confirm cache write reduction
  • Update this issue with observed vs projected savings
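
For the token-usage.jsonl comparison step, a sketch like the following could total cache writes per run and report the change. The file schema is an assumption: it presumes one JSON object per request with a `cache_write_tokens` field, named by analogy with the `cache_read_tokens` field mentioned in the cache analysis. Adjust the key to whatever the real artifact contains. The sample records are hypothetical placeholders built from the projected figures, not measurements:

```python
import json
from typing import Iterable

# Assumption: hypothetical field name; verify against the real token-usage.jsonl.
FIELD = "cache_write_tokens"


def total_cache_writes(lines: Iterable[str], field: str = FIELD) -> int:
    """Sum one token field over the JSONL lines of a single run."""
    total = 0
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            total += int(json.loads(line).get(field, 0))
    return total


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100


# Hypothetical sample records standing in for two runs' artifacts.
before = ['{"model": "sonnet", "cache_write_tokens": 46400}',
          '{"model": "haiku", "cache_write_tokens": 0}']
after = ['{"model": "sonnet", "cache_write_tokens": 42490}',
         '{"model": "haiku", "cache_write_tokens": 0}']

change = percent_change(total_cache_writes(before), total_cache_writes(after))
print(f"cache write change: {change:.1f}%")
```

With real artifacts, an open file handle can be passed in place of the sample lists, since iterating a text file yields one line at a time.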

Generated by Daily Claude Token Optimization Advisor
