# ⚡ Claude Token Optimization 2026-04-02: smoke-claude (#1627)

## Target Workflow: `smoke-claude`

**Source report:** #1606
**Estimated cost per run:** ~$0.24 (range: $0.18–$0.30)
**Total tokens per run:** ~203K
**Cache read rate:** 76.2% (excellent)
**Cache write rate:** 22.9% (the dominant cost, at ~73% of spend)
**LLM turns:** ~6 (4 × Sonnet + 2 × Haiku routing)
**Run frequency:** 21 runs over the last 24 hours (every PR + schedule)

---

## Current Configuration

| Setting | Value |
|---------|-------|
| Tools loaded | 5 (`cache-memory`, `github`, `playwright`, `edit`, `bash`) |
| Tools actually used | 4 (`github`, `playwright`, `bash`, `safe-outputs`) |
| Network groups | `defaults`, `github`, `playwright` |
| Shared imports | `shared/mcp-pagination.md` (3,225 bytes, ~810 tokens) |
| Pre-agent steps | No |
| Prompt size | smoke-claude.md: 3,531 bytes + mcp-pagination.md: 3,225 bytes |
| `max-turns` | 15 (actual runs use ~6) |

**`edit:` is loaded but never called** — the file-creation test uses `bash echo`, not the edit tool.

**`cache-memory:` is loaded but not needed** — cache memory is designed for multi-session persistent memory in long-running agents. A 6-turn smoke test does not benefit from it.

---

## Recommendations

### 1. Remove `edit:` tool — not used by the smoke test

The smoke test creates a file using `bash echo`. The `edit:` tool is never called.

**Estimated savings:** ~600 tokens/run from cache writes (~1.3%)

**Change in `.github/workflows/smoke-claude.md`:**
```diff
 tools:
   cache-memory: true
   github:
     toolsets: [repos, pull_requests]
   playwright:
     allowed_domains:
       - github.com
-  edit:
   bash:
     - "*"
```

**Cost savings:** ~$0.00225/run × 630 runs/month ≈ **$1.42/month**
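The per-run and monthly figures follow from simple arithmetic. Here is a minimal sketch, assuming the $3.75/M Sonnet cache-write price used in the Cache Analysis section and 21 runs/day held constant over a 30-day month. The function name is illustrative and not part of any tool.

```python
# Illustrative arithmetic only; prices and run counts are taken from this report.
CACHE_WRITE_USD_PER_MTOK = 3.75   # Sonnet cache-write price used in this report
RUNS_PER_MONTH = 21 * 30          # 21 runs/day observed, assumed constant for 30 days


def monthly_cache_write_savings(tokens_removed: int) -> tuple[float, float]:
    """Return (USD saved per run, USD saved per month) for tokens no longer cache-written."""
    per_run = tokens_removed * CACHE_WRITE_USD_PER_MTOK / 1_000_000
    return per_run, per_run * RUNS_PER_MONTH


# ~600 tokens is this report's estimate for the edit tool schema.
per_run, per_month = monthly_cache_write_savings(600)
print(f"${per_run:.5f}/run, ${per_month:.2f}/month")
```

The same helper applies to any other token reduction that only affects the cached prefix.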

---

### 2. Remove `cache-memory: true` — irrelevant for a short smoke test

Cache memory persists learned facts across agent sessions. A 6-turn smoke test that runs then exits has no use for cross-session memory. Removing it eliminates:
- The `cache_memory_prompt.md` framework injection from the system prompt (~2,000–3,000 tokens)
- The cache-memory tool schema (~600 tokens)

**Estimated savings:** ~2,500 tokens/run from cache writes (~5.4%). This is conservative: the two items above sum to roughly 2,600–3,600 tokens.

**Change in `.github/workflows/smoke-claude.md`:**
```diff
 tools:
-  cache-memory: true
   github:
     toolsets: [repos, pull_requests]
```

**Cost savings:** ~$0.0094/run × 630 runs/month ≈ **$5.91/month**

---

### 3. Remove `imports: shared/mcp-pagination.md` (or inline a one-liner)

`mcp-pagination.md` is 3,225 bytes (~810 tokens) of detailed pagination guidance: retry loops, common perPage values, error message examples, etc. This guidance targets workflows that fetch large result sets (75K-token PR diffs, full code searches). The smoke test only calls `list_pull_requests` to get 2 recent merged PRs — no pagination risk.

**Option A (recommended):** Remove the import entirely and add a single inline note to the prompt body.

**Change in `.github/workflows/smoke-claude.md`:**
```diff
-imports:
-  - shared/mcp-pagination.md
```

And in the prompt body:
```diff
+> Use `perPage: 2` when listing PRs.
+
 ## Test Requirements
```

**Option B (conservative):** Keep the import but accept the overhead.

**Estimated savings (Option A):** ~810 tokens/run (~1.8%)

**Cost savings:** ~$0.003/run × 630 runs/month ≈ **$1.91/month**

---

### 4. Reduce `max-turns: 15` to `8`

Actual runs consistently use 6 turns. A ceiling of 8 gives a 33% safety buffer while preventing cost runaway if the agent loops unexpectedly.

With a cap of 15 instead of 8, a runaway session can burn 7 extra Sonnet turns × ~39K cache-read tokens × $0.30/M = **$0.082 extra per runaway run**, counting cache reads alone.

**Change in `.github/workflows/smoke-claude.md`:**
```diff
 engine:
   id: claude
-  max-turns: 15
+  max-turns: 8
```

No baseline savings, but meaningful cost-runaway protection.
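The runaway estimate above can be reproduced with a short calculation. This sketch counts cache reads only, as the report does, so it is a lower bound: output tokens and any incremental cache writes on the extra turns are ignored. The function name is hypothetical.

```python
# Illustrative lower-bound estimate; only cache reads are counted.
CACHE_READ_USD_PER_MTOK = 0.30        # Sonnet cache-read price used in this report
CACHE_READ_TOKENS_PER_TURN = 39_000   # ~39K per Sonnet turn, from the estimate above


def extra_runaway_cost(old_cap: int, new_cap: int) -> float:
    """USD spent on cache reads for the turns between new_cap and old_cap."""
    extra_turns = max(old_cap - new_cap, 0)
    return extra_turns * CACHE_READ_TOKENS_PER_TURN * CACHE_READ_USD_PER_MTOK / 1_000_000


print(f"${extra_runaway_cost(15, 8):.3f} extra per runaway run")
```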

---

## Cache Analysis (Anthropic-Specific)

Aggregated across 5 representative runs (31 total requests):

| Turn (typical) | Model | Input | Output | Cache Read | Cache Write | Net New |
|---|---|---:|---:|---:|---:|---:|
| 1 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 2 | Sonnet | ~6 | ~1,300 | 0 | ~46,400 | ~47,700 |
| 3 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 4 | Sonnet | ~6 | ~1,300 | ~46,400 | ~small | ~1,306 |
| 5 | Sonnet | ~6 | ~2,000 | ~46,400 | ~small | ~2,006 |
| 6 | Sonnet | ~6 | ~935 | ~46,400 | ~small | ~941 |

*(Turns estimated from report aggregates: 21 Sonnet, 10 Haiku across 5 runs; direct Sonnet input was only 31 tokens total — essentially all context comes from cache reads)*

**Cache write amortization:** Turn 2 writes the full system prompt + tool schemas (~46K tokens). Turns 4–6 each read the same ~46K block (3× reuse per session). This is healthy reuse — write cost is justified.

**Cache write cost vs benefit:**
- Write: 46,400 tokens × $3.75/M = **$0.174/run**
- Reads: 154,472 tokens × $0.30/M = **$0.046/run**
- Without caching, Turns 4–6 would each pay the full Sonnet input price ($3/M) for ~46K tokens, about **$0.416** in total versus $0.046 for cached reads. Caching therefore saves ~$0.37/run, so the $0.174 write cost is well justified.
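The comparison above can be checked with a few lines of arithmetic. This is a rough sketch using the report's own prices and token counts. It also subtracts the premium a cache write carries over a plain input token ($3.75/M vs $3/M), which the bullets above do not itemize, so its net figure comes out slightly lower than the ~$0.37 quoted.

```python
# Prices in USD per million tokens, as used in this report for Sonnet.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

WRITE_TOKENS = 46_400     # cached once, on Turn 2
READ_TOKENS = 154_472     # total cache reads per run
REUSE_TURNS = 3           # Turns 4-6 each re-read the cached block


def usd(tokens: int, price_per_mtok: float) -> float:
    """Convert a token count to USD at a per-million-token price."""
    return tokens * price_per_mtok / 1_000_000


write_cost = usd(WRITE_TOKENS, CACHE_WRITE)
read_cost = usd(READ_TOKENS, CACHE_READ)
# Hypothetical uncached run: Turns 4-6 resend the ~46K block at the plain input price.
uncached_reads = usd(REUSE_TURNS * WRITE_TOKENS, INPUT)
read_saving = uncached_reads - read_cost
# Premium paid for writing to the cache instead of sending plain input on Turn 2.
write_premium = usd(WRITE_TOKENS, CACHE_WRITE - INPUT)
net_saving = read_saving - write_premium

print(f"write ${write_cost:.3f}, reads ${read_cost:.3f}")
print(f"saving on reads ${read_saving:.2f}, net of write premium ${net_saving:.2f}")
```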

**Haiku zero-cache observation:** 10 Haiku calls across 5 runs all have `cache_read_tokens = 0`. These are small framework routing calls (~400 tokens input, 532 ms avg). At that size they fall below Anthropic's minimum cacheable prompt length (at least 1,024 tokens, depending on the model), and Haiku is cheap enough that caching would not matter here. No action needed.

---

## Expected Impact

| Metric | Current | Projected | Savings |
|--------|---------|-----------|---------|
| Cache write tokens/run | ~46,400 | ~42,490 | ~3,910 (−8.4%) |
| Cache read tokens/run | ~154,472 | ~141,600 | ~12,872 (−8.3%) |
| Cost/run | ~$0.240 | ~$0.221 | ~$0.019 (−7.9%) |
| Monthly cost (630 runs) | ~$151 | ~$139 | **~$12/month** |
| LLM turns | 6 | 6 | 0 |
| Max runaway turns | 15 | 8 | −7 |

*Projections assume proportional reduction in cache write and read tokens from smaller system prompt.*
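As a cross-check, the table's totals can be rebuilt from the per-recommendation estimates. This sketch sums the tokens removed by Recommendations 1–3 and prices them at the rates used in this report. The cache-read saving is taken from the table as an input rather than derived here.

```python
CACHE_WRITE_USD_PER_MTOK = 3.75
CACHE_READ_USD_PER_MTOK = 0.30
RUNS_PER_MONTH = 630
CURRENT_WRITE_TOKENS = 46_400

# Tokens removed from the cached prefix by Recommendations 1-3 (report estimates).
removed = {"edit tool": 600, "cache-memory": 2_500, "mcp-pagination import": 810}
write_tokens_saved = sum(removed.values())
read_tokens_saved = 12_872  # copied from the table above, not derived

write_pct = write_tokens_saved / CURRENT_WRITE_TOKENS * 100
per_run = (write_tokens_saved * CACHE_WRITE_USD_PER_MTOK
           + read_tokens_saved * CACHE_READ_USD_PER_MTOK) / 1_000_000
per_month = per_run * RUNS_PER_MONTH

print(f"cache writes saved: {write_tokens_saved} tokens ({write_pct:.1f}%)")
print(f"cost saved: ${per_run:.3f}/run, ${per_month:.2f}/month")
```

The monthly figure lands just under $12, consistent with the rounded value in the table.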

---

## Implementation Checklist

- [ ] Remove `edit:` from `tools:` section
- [ ] Remove `cache-memory: true` from `tools:` section
- [ ] Remove `imports: - shared/mcp-pagination.md` and add inline `> Use perPage: 2 when listing PRs.`
- [ ] Change `max-turns: 15` to `max-turns: 8`
- [ ] Recompile: `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Open a PR and verify CI passes on the new lock file
- [ ] After 3+ runs, compare `token-usage.jsonl` from `agent-artifacts` to confirm cache write reduction
- [ ] Update this issue with observed vs projected savings
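For the before/after comparison step, a small script along these lines could total cache writes from each `token-usage.jsonl`. It is a sketch under assumptions: the `cache_write_tokens` field name is hypothetical (only `cache_read_tokens` is named in this report) and should be adjusted to the actual schema. Both function names are illustrative.

```python
import json
from typing import Iterable


def sum_field(lines: Iterable[str], field: str = "cache_write_tokens") -> int:
    """Sum one numeric field over JSONL records; missing fields count as 0."""
    total = 0
    for line in lines:
        if line.strip():  # skip blank lines
            total += int(json.loads(line).get(field, 0))
    return total


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative means a reduction)."""
    return (after - before) / before * 100


# In-memory stand-ins for two token-usage.jsonl files (values from this report).
before = ['{"model": "sonnet", "cache_write_tokens": 46400}', '{"model": "haiku"}']
after = ['{"model": "sonnet", "cache_write_tokens": 42490}', '{"model": "haiku"}']

change = percent_change(sum_field(before), sum_field(after))
print(f"cache write change: {change:.1f}%")
```

With real artifacts, pass an open file handle instead of a list, for example `sum_field(open("token-usage.jsonl", encoding="utf-8"))`, and average over the 3+ runs before comparing against the projection.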




> Generated by [Daily Claude Token Optimization Advisor](https://github.com/github/gh-aw-firewall/actions/runs/23919449375/agentic_workflow) · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw-firewall+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw-firewall%2Fclaude-token-optimizer%22&type=issues)
