feat: support token usage diff and multiple comparison runs in audit diff command#24544
…diff command Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c2cbe169-434c-4f94-8c73-7af8797b7c5d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot use console pkg helpers
… labels in audit diff pretty render Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2e45f51d-dc34-4f35-a127-3c42a1b0ea44 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Updated in dceb448. All pretty section renderers now use
@copilot collect all diff data in tables and use console table rendering helpers
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3ec821db-7712-419a-ba3e-3bbeda023d5e Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 453a36b. All diff sections in the pretty console renderer now use
Pull request overview
Enhances the audit diff command to provide a richer token-usage breakdown (from firewall proxy token-usage.jsonl summaries) and to support diffing multiple comparison runs against a single base run in one invocation.
Changes:
- Added a detailed `TokenUsageDiff` and integrated it into `RunMetricsDiff` computation.
- Updated CLI command parsing/execution to accept `base-run-id` + 1..N comparison run IDs and reuse the base summary across comparisons.
- Updated pretty/markdown/JSON renderers to support rendering multiple diffs and to display token-usage detail sections; added tests covering the new diff logic.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/audit_diff.go | Adds TokenUsageDiff and computes detailed token usage diffs as part of run metrics. |
| pkg/cli/audit_diff_command.go | Changes command signature and loads the base run once for multiple comparisons. |
| pkg/cli/audit_diff_render.go | Updates JSON/markdown/pretty output to handle multiple diffs and render token usage details. |
| pkg/cli/audit_diff_test.go | Adds unit tests for token usage diffing and for metrics diff integration. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 2
pkg/cli/audit_diff.go
    EffectiveTokensChange string `json:"effective_tokens_change,omitempty"`
    Run1TotalRequests     int    `json:"run1_total_requests"`
    Run2TotalRequests     int    `json:"run2_total_requests"`
    RequestsChange        string `json:"requests_change,omitempty"`
RequestsChange is formatted via formatCountChange (absolute delta like "+4") while the other "Change" fields in TokenUsageDiff are percentage strings (e.g. "+50%"). This mixes units in the same diff/table output and can be misleading; consider either formatting requests as a percentage (consistent with token fields) or renaming the field/output to make it explicit that it's an absolute request delta.
Suggested change:

    - RequestsChange string `json:"requests_change,omitempty"`
    + RequestsDelta string `json:"requests_delta,omitempty"` // Absolute request-count delta, e.g. "+4"
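To make the unit mismatch in this comment concrete, here is a minimal sketch of the two formatting styles side by side. The helper names `percentChange` and `countDelta` are hypothetical and are not the PR's `formatCountChange` or its percentage formatter; the zero-base handling is an assumption as well.

```go
package main

import "fmt"

// percentChange formats the relative change from before to after, e.g. "+50%".
// Hypothetical helper for illustration only.
func percentChange(before, after int) string {
	if before == 0 {
		if after == 0 {
			return "0%"
		}
		return "n/a" // relative change is undefined when the base is zero (assumed convention)
	}
	pct := float64(after-before) / float64(before) * 100
	return fmt.Sprintf("%+.0f%%", pct)
}

// countDelta formats the absolute change, e.g. "+4".
// Hypothetical helper for illustration only.
func countDelta(before, after int) string {
	return fmt.Sprintf("%+d", after-before)
}

func main() {
	// The same pair of request counts reads very differently in each unit.
	fmt.Println(percentChange(8, 12)) // +50%
	fmt.Println(countDelta(8, 12))    // +4
}
```

Because "+50%" and "+4" describe the same change, a column that mixes both styles is hard to scan, which is why the field name or the format should state the unit.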
    Run1CacheEfficiency float64 `json:"run1_cache_efficiency"`
    Run2CacheEfficiency float64 `json:"run2_cache_efficiency"`
    }
TokenUsageDiff includes per-run cache efficiency values, but there is no corresponding change field, so renderers currently leave the "Change" column blank for that row. If cache efficiency is intended to be part of the diff (per PR description), consider adding a CacheEfficiencyChange (percentage-point delta or relative %) or rendering it in a way that doesn't imply a missing value.
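One way to fill that "Change" column is a percentage-point delta. The sketch below assumes the two efficiency values are fractions in the range 0 to 1, which the thread does not confirm, and the function name `cacheEfficiencyChange` and the "pp" suffix are illustrative choices rather than what the PR shipped.

```go
package main

import "fmt"

// cacheEfficiencyChange returns the percentage-point difference between two
// cache-efficiency ratios, e.g. 0.25 -> 0.50 yields "+25.0pp".
// Assumption: both inputs are fractions in [0, 1].
func cacheEfficiencyChange(run1, run2 float64) string {
	return fmt.Sprintf("%+.1fpp", (run2-run1)*100)
}

func main() {
	fmt.Println(cacheEfficiencyChange(0.25, 0.50)) // +25.0pp
	fmt.Println(cacheEfficiencyChange(0.50, 0.25)) // -25.0pp
}
```

A percentage-point delta avoids the ambiguity of a relative percentage applied to a value that is already a ratio.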
@copilot review comments
…ange Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2a4a1640-5be5-4d7d-9dbe-61f73600db6b Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Both review issues addressed in 3f36c01:
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/edc57949-d44b-4550-b62b-31a85eab8f5c Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 87402be.
- Update `gh aw audit diff` signature to reflect multi-run support: accepts one base run and one or more comparison runs (from #24544)
- Document token usage diff breakdown (input/output/cache tokens, API requests, cache efficiency) added in #24544
- Document JSON array output and divider behavior for multiple diffs
- Add lock file header section to workflow-structure.md documenting the gh-aw-metadata first-line placement and the Secrets/Custom actions manifest sections added in #24495

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Enhances the `audit diff` command with two improvements:

1. Detailed token usage diff

Adds a `TokenUsageDiff` struct populated from `RunSummary.TokenUsage` (the firewall proxy `token-usage.jsonl` data) with a per-field breakdown. This is surfaced in both markdown (table) and pretty console output. The existing `Run.TokenUsage` (int) field in `RunMetricsDiff` is preserved for backward compatibility.

2. Multiple comparison runs from a single base

Changes the command signature from `diff <run-id-1> <run-id-2>` to `diff <base-run-id> <compare-run-id-1> [<compare-run-id-2>...]`.

Changes

- `pkg/cli/audit_diff.go`: new `TokenUsageDiff` struct and field on `RunMetricsDiff`; `computeTokenUsageDiff`; updated `computeRunMetricsDiff`
- `pkg/cli/audit_diff_command.go`: `MinimumNArgs(2)`, multi-run parsing with validation, base run loaded once
- `pkg/cli/audit_diff_render.go`: multi-diff render functions, token usage detail sections in markdown/pretty
- `pkg/cli/audit_diff_test.go`: tests for `computeTokenUsageDiff`, `RunMetricsDiff` with token details, and multi-run diff

Examples
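The multi-run flow described in this summary (validate a base run ID plus one or more comparison IDs, load the base once, then diff each comparison against it) might look roughly like the sketch below. This is not the PR's code: `runSummary`, `parseRunIDs`, `diffAgainstBase`, and the injected `load` function are all hypothetical stand-ins, and the validation rules shown (positive integer IDs) are an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// runSummary is a hypothetical stand-in for the real run summary type.
type runSummary struct {
	ID     int64
	Tokens int
}

// parseRunIDs checks that args holds a base run ID plus at least one
// comparison run ID, and that every ID is a positive integer (assumed rule).
func parseRunIDs(args []string) (base int64, compare []int64, err error) {
	if len(args) < 2 {
		return 0, nil, fmt.Errorf("need a base run ID and at least one comparison run ID")
	}
	ids := make([]int64, 0, len(args))
	for _, a := range args {
		id, perr := strconv.ParseInt(a, 10, 64)
		if perr != nil || id <= 0 {
			return 0, nil, fmt.Errorf("invalid run ID %q", a)
		}
		ids = append(ids, id)
	}
	return ids[0], ids[1:], nil
}

// diffAgainstBase loads the base summary once and reuses it for every
// comparison run, returning one token delta per comparison.
func diffAgainstBase(base int64, compare []int64, load func(int64) (runSummary, error)) ([]int, error) {
	baseSummary, err := load(base)
	if err != nil {
		return nil, err
	}
	deltas := make([]int, 0, len(compare))
	for _, id := range compare {
		s, err := load(id)
		if err != nil {
			return nil, err
		}
		deltas = append(deltas, s.Tokens-baseSummary.Tokens)
	}
	return deltas, nil
}

func main() {
	loads := 0
	// Fake loader that counts calls; a real one would fetch run artifacts.
	load := func(id int64) (runSummary, error) {
		loads++
		return runSummary{ID: id, Tokens: int(id) * 10}, nil
	}
	base, compare, err := parseRunIDs([]string{"100", "101", "103"})
	if err != nil {
		panic(err)
	}
	deltas, err := diffAgainstBase(base, compare, load)
	if err != nil {
		panic(err)
	}
	fmt.Println(deltas, loads) // [10 30] 3
}
```

With two comparison runs the loader is called three times rather than four, which is the saving from reusing the base summary across comparisons.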