Merged
126 changes: 126 additions & 0 deletions docs/src/content/docs/reference/audit.md
@@ -0,0 +1,126 @@
---
title: Audit Commands
description: Reference for the gh aw audit commands — single-run analysis, behavioral diff, and cross-run security reports.
sidebar:
order: 297
---

The `gh aw audit` commands download workflow run artifacts and logs, analyze MCP tool usage and network behavior, and produce structured reports suited for security reviews, debugging, and feeding to AI agents.

## `gh aw audit <run-id-or-url>`

Audit a single workflow run and generate a detailed Markdown report.

**Arguments:**

| Argument | Description |
|----------|-------------|
| `<run-id-or-url>` | A numeric run ID, GitHub Actions run URL, job URL, or job URL with step anchor |

**Accepted input formats:**

- Numeric run ID: `1234567890`
- Run URL: `https://github.com/owner/repo/actions/runs/1234567890`
- Job URL: `https://github.com/owner/repo/actions/runs/1234567890/job/9876543210`
- Job URL with step: `https://github.com/owner/repo/actions/runs/1234567890/job/9876543210#step:7:1`
- Short run URL: `https://github.com/owner/repo/runs/1234567890`
- GitHub Enterprise URLs using the same formats above

When a job URL is provided without a step anchor, the command extracts the output of the first failing step. When a step anchor is included, it extracts that specific step.
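As a rough illustration of the accepted input formats, the following Python sketch normalizes a run reference into its parts. The `RunRef` type, the `parse_run_ref` function, and the regular expression are hypothetical names introduced here; this is not how the CLI itself parses its argument, only one way the documented formats could be recognized.

```python
import re
from typing import NamedTuple, Optional


class RunRef(NamedTuple):
    """Hypothetical container for the pieces of a run reference."""
    host: Optional[str]    # e.g. "github.com" or a GitHub Enterprise host
    repo: Optional[str]    # "owner/repo", or None for a bare run ID
    run_id: int
    job_id: Optional[int]
    step: Optional[int]    # step number from a "#step:N:M" anchor


# Assumed pattern covering run URLs, short run URLs, job URLs,
# and job URLs with a step anchor.
_RUN_URL = re.compile(
    r"^https://(?P<host>[^/]+)/(?P<owner>[^/]+)/(?P<repo>[^/]+)/"
    r"(?:actions/)?runs/(?P<run>\d+)"
    r"(?:/job/(?P<job>\d+))?"
    r"(?:#step:(?P<step>\d+)(?::\d+)?)?$"
)


def parse_run_ref(text: str) -> RunRef:
    """Parse a numeric run ID or a run/job URL into a RunRef."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        # Bare run ID: the repository has to come from elsewhere (e.g. --repo).
        return RunRef(None, None, int(text), None, None)
    match = _RUN_URL.match(text)
    if match is None:
        raise ValueError(f"unrecognized run reference: {text!r}")
    return RunRef(
        host=match["host"],
        repo=f"{match['owner']}/{match['repo']}",
        run_id=int(match["run"]),
        job_id=int(match["job"]) if match["job"] else None,
        step=int(match["step"]) if match["step"] else None,
    )
```

A bare run ID carries no repository information, which is presumably why the `--repo` flag exists for that case.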

**Flags:**

| Flag | Default | Description |
|------|---------|-------------|
| `-o, --output <dir>` | `./logs` | Directory to write downloaded artifacts and report files |
| `--json` | off | Output report as JSON to stdout |
| `--parse` | off | Run JavaScript parsers on agent and firewall logs, writing `log.md` and `firewall.md` |
| `--repo <owner/repo>` | auto | Specify repository when the run ID is not from a URL |
| `--verbose` | off | Print detailed progress information |

**Examples:**

```bash
gh aw audit 1234567890
gh aw audit https://github.com/owner/repo/actions/runs/1234567890
gh aw audit 1234567890 --parse
gh aw audit 1234567890 --json
gh aw audit 1234567890 -o ./audit-reports
gh aw audit 1234567890 --repo owner/repo
```

**Report sections** (rendered in Markdown or JSON): Overview, Comparison, Task/Domain, Behavior Fingerprint, Agentic Assessments, Metrics, Key Findings, Recommendations, Observability Insights, Performance Metrics, Engine Config, Prompt Analysis, Session Analysis, Safe Output Summary, MCP Server Health, Jobs, Downloaded Files, Missing Tools, Missing Data, Noops, MCP Failures, Firewall Analysis, Policy Analysis, Redacted Domains, Errors, Warnings, Tool Usage, MCP Tool Usage, Created Items.

## `gh aw audit diff <run-id-1> <run-id-2>`

Compare behavior between two workflow runs. Detects policy regressions, new unauthorized domains, behavioral drift, and changes in MCP tool usage or run metrics.

**Arguments:**

| Argument | Description |
|----------|-------------|
| `<run-id-1>` | Numeric run ID for the baseline run |
| `<run-id-2>` | Numeric run ID for the comparison run |

**Flags:**

| Flag | Default | Description |
|------|---------|-------------|
| `--format <fmt>` | `pretty` | Output format: `pretty` or `markdown` |
| `--json` | off | Output diff as JSON |
| `--repo <owner/repo>` | auto | Specify repository |
| `-o, --output <dir>` | `./logs` | Directory for downloaded artifacts |
| `--verbose` | off | Print detailed progress |

The diff output includes:
- New and removed network domains
- Domain status changes (allowed ↔ denied)
- Volume changes (request count changes above a 100% threshold)
- Anomaly flags (new denied domains, previously-denied domains now allowed)
- MCP tool invocation changes (new/removed tools, call count and error count diffs)
- Run metrics comparison (token usage, duration, turns)
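
The network-related items in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the command's implementation: the `Snapshot` shape, the `diff_domains` function, the status strings, and the reading of the "100% threshold" as percentage change relative to the baseline request count are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed shape: domain -> (status, request_count), status "allowed" or "denied".
Snapshot = Dict[str, Tuple[str, int]]


def diff_domains(
    base: Snapshot, new: Snapshot, threshold_pct: float = 100.0
) -> Dict[str, List[str]]:
    """Compare two per-domain snapshots and classify the differences."""
    result: Dict[str, List[str]] = {
        "new": [],
        "removed": sorted(base.keys() - new.keys()),
        "status_changed": [],
        "volume_changed": [],
        "anomalies": [],
    }

    # Domains seen only in the comparison run.
    for domain in sorted(new.keys() - base.keys()):
        result["new"].append(domain)
        if new[domain][0] == "denied":
            result["anomalies"].append(f"new denied domain: {domain}")

    # Domains present in both runs.
    for domain in sorted(base.keys() & new.keys()):
        old_status, old_count = base[domain]
        new_status, new_count = new[domain]
        if old_status != new_status:
            result["status_changed"].append(domain)
            if old_status == "denied" and new_status == "allowed":
                result["anomalies"].append(
                    f"previously denied domain now allowed: {domain}"
                )
        if old_count > 0:
            # Assumed definition: percentage change relative to the baseline.
            change_pct = abs(new_count - old_count) / old_count * 100.0
            if change_pct > threshold_pct:
                result["volume_changed"].append(domain)

    return result
```

Under this definition a drop in requests can never exceed 100%, so only increases of more than double the baseline count are reported as volume changes.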

**Examples:**

```bash
gh aw audit diff 12345 12346
gh aw audit diff 12345 12346 --format markdown
gh aw audit diff 12345 12346 --json
gh aw audit diff 12345 12346 --repo owner/repo
```

## `gh aw audit report`

Generate a cross-run security and performance audit report across multiple recent workflow runs.

**Flags:**

| Flag | Default | Description |
|------|---------|-------------|
| `-w, --workflow <name>` | all | Filter by workflow name or filename |
| `--last <n>` | 20 | Number of recent runs to analyze (max 50) |
| `--format <fmt>` | `markdown` | Output format: `markdown` or `pretty` |
| `--json` | off | Output report as JSON |
| `--repo <owner/repo>` | auto | Specify repository |
| `-o, --output <dir>` | `./logs` | Directory for downloaded artifacts |
| `--verbose` | off | Print detailed progress |

The report output includes an executive summary, domain inventory, metrics trends, MCP server health, and per-run breakdown. It detects cross-run anomalies such as domain access spikes, elevated MCP error rates, and connection rate changes.

**Examples:**

```bash
gh aw audit report
gh aw audit report --workflow "daily-repo-status" --last 10
gh aw audit report --workflow "agent-task" --last 5 --json
gh aw audit report --format pretty
gh aw audit report --repo owner/repo --last 10
```

## Related Documentation

- [Cost Management](/gh-aw/reference/cost-management/) — Track token usage and inference spend
- [Effective Tokens Specification](/gh-aw/reference/effective-tokens-specification/) — How effective tokens are computed
- [Network](/gh-aw/reference/network/) — Firewall and domain allow/deny configuration
- [MCP Gateway](/gh-aw/reference/mcp-gateway/) — MCP server health and debugging
- [CLI Commands](/gh-aw/setup/cli/) — Full CLI reference
4 changes: 3 additions & 1 deletion docs/src/content/docs/reference/cost-management.md
@@ -218,10 +218,12 @@ These are rough estimates to help with budgeting. Actual costs vary by prompt si
| On-demand via slash command | User-controlled | Varies | Varies |

> [!TIP]
- > Use `gh aw audit <run-id>` to deep-dive into token usage and cost for a single run. Create separate `COPILOT_GITHUB_TOKEN` service accounts per repository or team to attribute spend by workflow.
+ > Use `gh aw audit <run-id>` to deep-dive into token usage and cost for a single run. Use `gh aw audit report --workflow <name>` to analyze cost trends across multiple runs. Create separate `COPILOT_GITHUB_TOKEN` service accounts per repository or team to attribute spend by workflow.

## Related Documentation

- [Audit Commands](/gh-aw/reference/audit/) - Single-run analysis, diff, and cross-run reporting
- [Effective Tokens Specification](/gh-aw/reference/effective-tokens-specification/) - How effective token counts are computed
- [Triggers](/gh-aw/reference/triggers/) - Configuring workflow triggers and skip conditions
- [Rate Limiting Controls](/gh-aw/reference/rate-limiting-controls/) - Preventing runaway workflows
- [Concurrency](/gh-aw/reference/concurrency/) - Serializing workflow execution
20 changes: 20 additions & 0 deletions docs/src/content/docs/reference/engines.md
@@ -252,6 +252,26 @@ engine:
```yaml wrap
args: ["--verbose"]
```

### Custom Token Weights (`token-weights`)

Override the built-in token cost multipliers used when computing [Effective Tokens](/gh-aw/reference/effective-tokens-specification/). Useful when your workflow uses a custom model not in the built-in list, or when you want to adjust the relative cost ratios for your use case.

```yaml wrap
engine:
  id: claude
  token-weights:
    multipliers:
      my-custom-model: 2.5    # 2.5x the cost of claude-sonnet-4.5
      experimental-llm: 0.8   # Override an existing model's multiplier
    token-class-weights:
      output: 6.0             # Override output token weight (default: 4.0)
      cached-input: 0.05      # Override cached input weight (default: 0.1)
```

`multipliers` is a map of model names to numeric multipliers relative to `claude-sonnet-4.5` (= 1.0). Keys are case-insensitive and support prefix matching. `token-class-weights` overrides the per-class weights applied before the model multiplier; the defaults are `input: 1.0`, `cached-input: 0.1`, `output: 4.0`, `reasoning: 4.0`, `cache-write: 1.0`.

Custom weights are embedded in the compiled workflow YAML and read by `gh aw logs` and `gh aw audit` when analyzing runs.
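
To make the interaction between the two settings concrete, here is a minimal Python sketch of how class weights and a model multiplier could combine. It assumes effective tokens are the class-weighted sum of token counts scaled by the model multiplier, that prefix matching means a configured key is a prefix of the model name with the longest match winning, and that unmatched models fall back to 1.0. The function names are hypothetical; the authoritative formula is in the Effective Tokens Specification.

```python
from typing import Dict, Optional

# Default per-class weights as listed in the documentation above.
DEFAULT_CLASS_WEIGHTS: Dict[str, float] = {
    "input": 1.0,
    "cached-input": 0.1,
    "output": 4.0,
    "reasoning": 4.0,
    "cache-write": 1.0,
}


def resolve_multiplier(
    model: str, multipliers: Dict[str, float], default: float = 1.0
) -> float:
    """Case-insensitive lookup; assumes the longest matching key prefix wins."""
    name = model.lower()
    best_key, best_value = "", default
    for key, value in multipliers.items():
        candidate = key.lower()
        if name.startswith(candidate) and len(candidate) > len(best_key):
            best_key, best_value = candidate, value
    return best_value


def effective_tokens(
    usage: Dict[str, int],
    model: str,
    multipliers: Dict[str, float],
    class_weight_overrides: Optional[Dict[str, float]] = None,
) -> float:
    """Apply class weights first, then the model multiplier (assumed formula)."""
    weights = {**DEFAULT_CLASS_WEIGHTS, **(class_weight_overrides or {})}
    # An unknown token class raises KeyError rather than being silently ignored.
    weighted = sum(weights[token_class] * count for token_class, count in usage.items())
    return weighted * resolve_multiplier(model, multipliers)
```

With the overrides from the YAML example, a call such as `effective_tokens(usage, "my-custom-model", {"my-custom-model": 2.5}, {"output": 6.0, "cached-input": 0.05})` would weight output tokens at 6.0 before applying the 2.5 multiplier.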

## Related Documentation

- [Frontmatter](/gh-aw/reference/frontmatter/) - Complete configuration reference
25 changes: 24 additions & 1 deletion docs/src/content/docs/reference/frontmatter-full.md
@@ -1631,7 +1631,7 @@ sandbox:
# Array of Mount specification in format 'source:destination:mode'

# Memory limit for the AWF container (e.g., '4g', '8g'). Passed as --memory-limit
- # to AWF. If not specified, AWF's default memory limit is used.
+ # to AWF. If not specified, AWF's default memory limit of 6g is used.
# (optional)
memory: "example-value"

@@ -1873,6 +1873,29 @@ engine:
args: []
# Array of strings

# Custom model token weights for effective token computation. Overrides or
# extends the built-in model multipliers from model_multipliers.json. Useful
# for custom models or adjusted cost ratios.
# (optional)
token-weights:
  # Per-model cost multipliers relative to the reference model
  # (claude-sonnet-4.5 = 1.0). Keys are model names (case-insensitive,
  # prefix matching supported).
  # (optional)
  multipliers:
    my-custom-model: 2.5

  # Per-token-class weights applied before the model multiplier. Defaults:
  # input: 1.0, cached-input: 0.1, output: 4.0, reasoning: 4.0,
  # cache-write: 1.0
  # (optional)
  token-class-weights:
    input: 1.0
    cached-input: 0.1
    output: 4.0
    reasoning: 4.0
    cache-write: 1.0

# Option 3: Inline engine definition: specifies a runtime adapter and optional
# provider settings directly in the workflow frontmatter, without requiring a
# named catalog entry
4 changes: 2 additions & 2 deletions docs/src/content/docs/setup/quick-start.mdx
@@ -65,7 +65,7 @@ This will take you through an interactive process to:

1. **Check prerequisites** - Verify repository permissions.
2. **Select an AI Engine** - Choose between Copilot, Claude, or Codex.
- 3. **Set up the required secret** - [`COPILOT_GITHUB_TOKEN`](/gh-aw/reference/auth/#copilot_github_token), [`ANTHROPIC_API_KEY`](/gh-aw/reference/auth/#anthropic_api_key) or [`OPENAI_API_KEY`](/gh-aw/reference/auth/#openai_api_key).
+ 3. **Set up the required secret** - [`COPILOT_GITHUB_TOKEN`](/gh-aw/reference/auth/#copilot_github_token) (a separate GitHub token with Copilot access — distinct from the default `GITHUB_TOKEN`), [`ANTHROPIC_API_KEY`](/gh-aw/reference/auth/#anthropic_api_key), or [`OPENAI_API_KEY`](/gh-aw/reference/auth/#openai_api_key). See [Authentication](/gh-aw/reference/auth/) for setup instructions.
4. **Add the workflow** - Adds the workflow and lock file to `.github/workflows/`.
5. **Optionally trigger an initial run** - Starts the workflow immediately.

@@ -100,7 +100,7 @@ To customize it now:

2. Edit the section "What to include" to list things you are having trouble with regularly in your repository: your issue backlog, your CI setup, your testing, the performance of your software, your roadmap. Any or all of these, or anything else you want to improve. You can also customize the style and process sections to guide the coding agent's behavior.

- 3. If you have changed the frontmatter, regenerate the workflow YAML from the frontmatter of your workflow by running:
+ 3. If you have changed the [frontmatter](/gh-aw/reference/frontmatter/) (the YAML configuration block between `---` markers at the top of the file), regenerate the workflow YAML by running:

```text
gh aw compile
```
15 changes: 15 additions & 0 deletions docs/src/content/docs/troubleshooting/debugging.md
@@ -103,6 +103,21 @@ Audit output includes:
- **Firewall analysis** — blocked domains and allowed traffic
- **Safe-outputs** — structured outputs the agent produced

To compare behavior between two runs and detect regressions, use `audit diff`:

```bash
gh aw audit diff 12345678 12345679
gh aw audit diff 12345678 12345679 --format markdown
```

For trends across multiple runs, use `audit report`:

```bash
gh aw audit report --workflow "my-workflow" --last 10
```

See [Audit Commands](/gh-aw/reference/audit/) for complete flag documentation.

### Analyzing Workflow Logs

`gh aw logs` downloads and analyzes logs across multiple runs with tool usage, network patterns, errors, and warnings: