diff --git a/docs/src/content/docs/reference/audit.md b/docs/src/content/docs/reference/audit.md new file mode 100644 index 00000000000..06f36d47b59 --- /dev/null +++ b/docs/src/content/docs/reference/audit.md @@ -0,0 +1,126 @@ +--- +title: Audit Commands +description: Reference for the gh aw audit commands — single-run analysis, behavioral diff, and cross-run security reports. +sidebar: + order: 297 +--- + +The `gh aw audit` commands download workflow run artifacts and logs, analyze MCP tool usage and network behavior, and produce structured reports suited for security reviews, debugging, and feeding to AI agents. + +## `gh aw audit ` + +Audit a single workflow run and generate a detailed Markdown report. + +**Arguments:** + +| Argument | Description | +|----------|-------------| +| `` | A numeric run ID, GitHub Actions run URL, job URL, or job URL with step anchor | + +**Accepted input formats:** + +- Numeric run ID: `1234567890` +- Run URL: `https://github.com/owner/repo/actions/runs/1234567890` +- Job URL: `https://github.com/owner/repo/actions/runs/1234567890/job/9876543210` +- Job URL with step: `https://github.com/owner/repo/actions/runs/1234567890/job/9876543210#step:7:1` +- Short run URL: `https://github.com/owner/repo/runs/1234567890` +- GitHub Enterprise URLs using the same formats above + +When a job URL is provided without a step anchor, the command extracts the output of the first failing step. When a step anchor is included, it extracts that specific step. 
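The accepted input formats listed above can be illustrated with a minimal Python sketch. This is not the `gh aw` implementation: the names `AuditTarget` and `parse_audit_target` are hypothetical, and the pattern assumes every URL follows the `owner/repo/[actions/]runs/<id>[/job/<id>][#step:<n>[:<m>]]` shapes shown in the list. The number after the second colon in the step anchor is ignored here, on the assumption that it is an offset within the step.

```python
import re
from typing import NamedTuple, Optional


class AuditTarget(NamedTuple):
    """Hypothetical container for the pieces of an audit argument."""
    run_id: int
    job_id: Optional[int] = None
    step: Optional[int] = None
    repo: Optional[str] = None


# Assumed URL shape; the host is left open so GitHub Enterprise hosts also match.
_URL_RE = re.compile(
    r"^https://[^/]+/(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?:actions/)?runs/(?P<run>[0-9]+)"
    r"(?:/job/(?P<job>[0-9]+))?"
    r"(?:#step:(?P<step>[0-9]+)(?::[0-9]+)?)?$"
)


def parse_audit_target(value: str) -> AuditTarget:
    """Split a numeric run ID or a run/job URL into its components."""
    if re.fullmatch(r"[0-9]+", value):
        # Bare run ID: the repository must come from elsewhere (e.g. --repo).
        return AuditTarget(run_id=int(value))
    match = _URL_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized audit target: {value!r}")
    return AuditTarget(
        run_id=int(match["run"]),
        job_id=int(match["job"]) if match["job"] else None,
        step=int(match["step"]) if match["step"] else None,
        repo=f"{match['owner']}/{match['repo']}",
    )
```

In this sketch a bare run ID carries no repository, which mirrors why the documented `--repo` flag exists for run IDs that do not come from a URL.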
+ +**Flags:** + +| Flag | Default | Description | +|------|---------|-------------| +| `-o, --output ` | `./logs` | Directory to write downloaded artifacts and report files | +| `--json` | off | Output report as JSON to stdout | +| `--parse` | off | Run JavaScript parsers on agent and firewall logs, writing `log.md` and `firewall.md` | +| `--repo ` | auto | Specify repository when the run ID is not from a URL | +| `--verbose` | off | Print detailed progress information | + +**Examples:** + +```bash +gh aw audit 1234567890 +gh aw audit https://github.com/owner/repo/actions/runs/1234567890 +gh aw audit 1234567890 --parse +gh aw audit 1234567890 --json +gh aw audit 1234567890 -o ./audit-reports +gh aw audit 1234567890 --repo owner/repo +``` + +**Report sections** (rendered in Markdown or JSON): Overview, Comparison, Task/Domain, Behavior Fingerprint, Agentic Assessments, Metrics, Key Findings, Recommendations, Observability Insights, Performance Metrics, Engine Config, Prompt Analysis, Session Analysis, Safe Output Summary, MCP Server Health, Jobs, Downloaded Files, Missing Tools, Missing Data, Noops, MCP Failures, Firewall Analysis, Policy Analysis, Redacted Domains, Errors, Warnings, Tool Usage, MCP Tool Usage, Created Items. + +## `gh aw audit diff ` + +Compare behavior between two workflow runs. Detects policy regressions, new unauthorized domains, behavioral drift, and changes in MCP tool usage or run metrics. 
+ +**Arguments:** + +| Argument | Description | +|----------|-------------| +| `` | Numeric run ID for the baseline run | +| `` | Numeric run ID for the comparison run | + +**Flags:** + +| Flag | Default | Description | +|------|---------|-------------| +| `--format ` | `pretty` | Output format: `pretty` or `markdown` | +| `--json` | off | Output diff as JSON | +| `--repo ` | auto | Specify repository | +| `-o, --output ` | `./logs` | Directory for downloaded artifacts | +| `--verbose` | off | Print detailed progress | + +The diff output includes: +- New and removed network domains +- Domain status changes (allowed ↔ denied) +- Volume changes (request count changes above a 100% threshold) +- Anomaly flags (new denied domains, previously-denied domains now allowed) +- MCP tool invocation changes (new/removed tools, call count and error count diffs) +- Run metrics comparison (token usage, duration, turns) + +**Examples:** + +```bash +gh aw audit diff 12345 12346 +gh aw audit diff 12345 12346 --format markdown +gh aw audit diff 12345 12346 --json +gh aw audit diff 12345 12346 --repo owner/repo +``` + +## `gh aw audit report` + +Generate a cross-run security and performance audit report across multiple recent workflow runs. + +**Flags:** + +| Flag | Default | Description | +|------|---------|-------------| +| `-w, --workflow ` | all | Filter by workflow name or filename | +| `--last ` | 20 | Number of recent runs to analyze (max 50) | +| `--format ` | `markdown` | Output format: `markdown` or `pretty` | +| `--json` | off | Output report as JSON | +| `--repo ` | auto | Specify repository | +| `-o, --output ` | `./logs` | Directory for downloaded artifacts | +| `--verbose` | off | Print detailed progress | + +The report output includes an executive summary, domain inventory, metrics trends, MCP server health, and per-run breakdown. It detects cross-run anomalies such as domain access spikes, elevated MCP error rates, and connection rate changes. 
+ +**Examples:** + +```bash +gh aw audit report +gh aw audit report --workflow "daily-repo-status" --last 10 +gh aw audit report --workflow "agent-task" --last 5 --json +gh aw audit report --format pretty +gh aw audit report --repo owner/repo --last 10 +``` + +## Related Documentation + +- [Cost Management](/gh-aw/reference/cost-management/) — Track token usage and inference spend +- [Effective Tokens Specification](/gh-aw/reference/effective-tokens-specification/) — How effective tokens are computed +- [Network](/gh-aw/reference/network/) — Firewall and domain allow/deny configuration +- [MCP Gateway](/gh-aw/reference/mcp-gateway/) — MCP server health and debugging +- [CLI Commands](/gh-aw/setup/cli/) — Full CLI reference diff --git a/docs/src/content/docs/reference/cost-management.md b/docs/src/content/docs/reference/cost-management.md index 54c9b7be646..49d0426320e 100644 --- a/docs/src/content/docs/reference/cost-management.md +++ b/docs/src/content/docs/reference/cost-management.md @@ -218,10 +218,12 @@ These are rough estimates to help with budgeting. Actual costs vary by prompt si | On-demand via slash command | User-controlled | Varies | Varies | > [!TIP] -> Use `gh aw audit ` to deep-dive into token usage and cost for a single run. Create separate `COPILOT_GITHUB_TOKEN` service accounts per repository or team to attribute spend by workflow. +> Use `gh aw audit ` to deep-dive into token usage and cost for a single run. Use `gh aw audit report --workflow ` to analyze cost trends across multiple runs. Create separate `COPILOT_GITHUB_TOKEN` service accounts per repository or team to attribute spend by workflow. 
## Related Documentation +- [Audit Commands](/gh-aw/reference/audit/) - Single-run analysis, diff, and cross-run reporting +- [Effective Tokens Specification](/gh-aw/reference/effective-tokens-specification/) - How effective token counts are computed - [Triggers](/gh-aw/reference/triggers/) - Configuring workflow triggers and skip conditions - [Rate Limiting Controls](/gh-aw/reference/rate-limiting-controls/) - Preventing runaway workflows - [Concurrency](/gh-aw/reference/concurrency/) - Serializing workflow execution diff --git a/docs/src/content/docs/reference/engines.md b/docs/src/content/docs/reference/engines.md index 2084de4b458..82f840470fa 100644 --- a/docs/src/content/docs/reference/engines.md +++ b/docs/src/content/docs/reference/engines.md @@ -252,6 +252,26 @@ engine: args: ["--verbose"] ``` +### Custom Token Weights (`token-weights`) + +Override the built-in token cost multipliers used when computing [Effective Tokens](/gh-aw/reference/effective-tokens-specification/). Useful when your workflow uses a custom model not in the built-in list, or when you want to adjust the relative cost ratios for your use case. + +```yaml wrap +engine: + id: claude + token-weights: + multipliers: + my-custom-model: 2.5 # 2.5x the cost of claude-sonnet-4.5 + experimental-llm: 0.8 # Override an existing model's multiplier + token-class-weights: + output: 6.0 # Override output token weight (default: 4.0) + cached-input: 0.05 # Override cached input weight (default: 0.1) +``` + +`multipliers` is a map of model names to numeric multipliers relative to `claude-sonnet-4.5` (= 1.0). Keys are case-insensitive and support prefix matching. `token-class-weights` overrides the per-class weights applied before the model multiplier; the defaults are `input: 1.0`, `cached-input: 0.1`, `output: 4.0`, `reasoning: 4.0`, `cache-write: 1.0`. + +Custom weights are embedded in the compiled workflow YAML and read by `gh aw logs` and `gh aw audit` when analyzing runs. 
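How the two override maps could combine is sketched below in Python. This is an illustration under stated assumptions, not the formula from the Effective Tokens Specification: it assumes effective tokens are the sum of per-class token counts times their class weights, scaled by the model multiplier. It also assumes that "prefix matching" means a configured key matches any model name that starts with it, that the longest matching key wins, and that an unmatched model falls back to 1.0. The function names are hypothetical.

```python
from typing import Dict, Optional

# Default per-class weights as listed in the documentation above.
DEFAULT_CLASS_WEIGHTS: Dict[str, float] = {
    "input": 1.0,
    "cached-input": 0.1,
    "output": 4.0,
    "reasoning": 4.0,
    "cache-write": 1.0,
}


def resolve_multiplier(model: str, multipliers: Dict[str, float],
                       default: float = 1.0) -> float:
    """Case-insensitive prefix lookup; the longest matching key wins (assumed)."""
    name = model.lower()
    best_key, best_value = "", default
    for key, value in multipliers.items():
        lowered = key.lower()
        if name.startswith(lowered) and len(lowered) > len(best_key):
            best_key, best_value = lowered, value
    return best_value


def effective_tokens(usage: Dict[str, int], model: str,
                     multipliers: Dict[str, float],
                     class_weight_overrides: Optional[Dict[str, float]] = None) -> float:
    """Weight each token class, then scale by the model multiplier (assumed formula)."""
    weights = {**DEFAULT_CLASS_WEIGHTS, **(class_weight_overrides or {})}
    # An unknown token class raises KeyError rather than being silently ignored.
    weighted = sum(weights[token_class] * count for token_class, count in usage.items())
    return weighted * resolve_multiplier(model, multipliers)


if __name__ == "__main__":
    usage = {"input": 1000, "cached-input": 2000, "output": 100}
    print(effective_tokens(
        usage,
        model="my-custom-model-v2",
        multipliers={"my-custom-model": 2.5},
        class_weight_overrides={"output": 6.0, "cached-input": 0.05},
    ))
```

Merging the overrides on top of the defaults means a workflow only has to list the classes it wants to change, which matches the partial `token-class-weights` block in the YAML example.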
+ ## Related Documentation - [Frontmatter](/gh-aw/reference/frontmatter/) - Complete configuration reference diff --git a/docs/src/content/docs/reference/frontmatter-full.md b/docs/src/content/docs/reference/frontmatter-full.md index 47bd5011d6c..aa4c5451a2d 100644 --- a/docs/src/content/docs/reference/frontmatter-full.md +++ b/docs/src/content/docs/reference/frontmatter-full.md @@ -1631,7 +1631,7 @@ sandbox: # Array of Mount specification in format 'source:destination:mode' # Memory limit for the AWF container (e.g., '4g', '8g'). Passed as --memory-limit - # to AWF. If not specified, AWF's default memory limit is used. + # to AWF. If not specified, AWF's default memory limit of 6g is used. # (optional) memory: "example-value" @@ -1873,6 +1873,29 @@ engine: args: [] # Array of strings + # Custom model token weights for effective token computation. Overrides or + # extends the built-in model multipliers from model_multipliers.json. Useful + # for custom models or adjusted cost ratios. + # (optional) + token-weights: + # Per-model cost multipliers relative to the reference model + # (claude-sonnet-4.5 = 1.0). Keys are model names (case-insensitive, + # prefix matching supported). + # (optional) + multipliers: + my-custom-model: 2.5 + + # Per-token-class weights applied before the model multiplier. 
Defaults: + # input: 1.0, cached-input: 0.1, output: 4.0, reasoning: 4.0, + # cache-write: 1.0 + # (optional) + token-class-weights: + input: 1.0 + cached-input: 0.1 + output: 4.0 + reasoning: 4.0 + cache-write: 1.0 + # Option 3: Inline engine definition: specifies a runtime adapter and optional # provider settings directly in the workflow frontmatter, without requiring a # named catalog entry diff --git a/docs/src/content/docs/setup/quick-start.mdx b/docs/src/content/docs/setup/quick-start.mdx index e0dcd12e602..08824e86f6c 100644 --- a/docs/src/content/docs/setup/quick-start.mdx +++ b/docs/src/content/docs/setup/quick-start.mdx @@ -65,7 +65,7 @@ This will take you through an interactive process to: 1. **Check prerequisites** - Verify repository permissions. 2. **Select an AI Engine** - Choose between Copilot, Claude, or Codex. -3. **Set up the required secret** - [`COPILOT_GITHUB_TOKEN`](/gh-aw/reference/auth/#copilot_github_token), [`ANTHROPIC_API_KEY`](/gh-aw/reference/auth/#anthropic_api_key) or [`OPENAI_API_KEY`](/gh-aw/reference/auth/#openai_api_key). +3. **Set up the required secret** - [`COPILOT_GITHUB_TOKEN`](/gh-aw/reference/auth/#copilot_github_token) (a separate GitHub token with Copilot access, distinct from the default `GITHUB_TOKEN`), [`ANTHROPIC_API_KEY`](/gh-aw/reference/auth/#anthropic_api_key), or [`OPENAI_API_KEY`](/gh-aw/reference/auth/#openai_api_key). See [Authentication](/gh-aw/reference/auth/) for setup instructions. 4. **Add the workflow** - Adds the workflow and lock file to `.github/workflows/`. 5. **Optionally trigger an initial run** - Starts the workflow immediately. @@ -100,7 +100,7 @@ To customize it now: 2. Edit the section "What to include" to list things you are having trouble with regularly in your repository: your issue backlog, your CI setup, your testing, the performance of your software, your roadmap. Any or all of these, or anything else you want to improve. 
You can also customize the style and process sections to guide the coding agent's behavior. -3. If you have changed the frontmatter, regenerate the workflow YAML from the frontmatter of your workflow by running: +3. If you have changed the [frontmatter](/gh-aw/reference/frontmatter/) (the YAML configuration block between `---` markers at the top of the file), regenerate the workflow YAML by running: ```text gh aw compile diff --git a/docs/src/content/docs/troubleshooting/debugging.md b/docs/src/content/docs/troubleshooting/debugging.md index e6cebe8542e..5401e569f05 100644 --- a/docs/src/content/docs/troubleshooting/debugging.md +++ b/docs/src/content/docs/troubleshooting/debugging.md @@ -103,6 +103,21 @@ Audit output includes: - **Firewall analysis** — blocked domains and allowed traffic - **Safe-outputs** — structured outputs the agent produced +To compare behavior between two runs and detect regressions, use `audit diff`: + +```bash +gh aw audit diff 12345678 12345679 +gh aw audit diff 12345678 12345679 --format markdown +``` + +For trends across multiple runs, use `audit report`: + +```bash +gh aw audit report --workflow "my-workflow" --last 10 +``` + +See [Audit Commands](/gh-aw/reference/audit/) for complete flag documentation. + ### Analyzing Workflow Logs `gh aw logs` downloads and analyzes logs across multiple runs with tool usage, network patterns, errors, and warnings: