Commit b8e9542

feat: add agentic bug sweep workflow (#7)
* feat: scaffold agentic bug sweep workflow
* feat: implement agentic bug sweep runner
* feat: add remote repo support to bug sweep
1 parent 6f2e2be commit b8e9542

7 files changed

Lines changed: 1891 additions & 0 deletions

README.md

Lines changed: 28 additions & 0 deletions
@@ -15,6 +15,34 @@ Template repository for Rust workspace projects in the tensor4all organization.
2. Add crates to `Cargo.toml` `members`
3. Adjust `coverage-thresholds.json` as needed

## Agentic Bug Sweep

Use `bash scripts/agentic-bug-sweep.sh` to run a bounded headless Codex bug sweep that can create, update, or consolidate GitHub issues.

Requirements:

- `codex`
- `gh`

Primary mode is remote-repository analysis:

```bash
bash scripts/agentic-bug-sweep.sh \
  --repo-url https://github.com/tensor4all/tenferro-rs \
  --ref main \
  --iterations 20 \
  --max-consecutive-none 3
```

`--workdir` remains available as a local override when you already have a checked-out repository.

The workflow always stops after the configured `--iterations` limit or after `--max-consecutive-none` dry runs in a row.

Artifacts:

- durable reports: `docs/test-reports/agentic-bug-sweep/`
- ephemeral execution state: `target/agentic-bug-sweep/`

## Coverage

Coverage is checked per-file against thresholds in `coverage-thresholds.json`.

ai/agentic-bug-sweep.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
You are running one iteration of an automated bug sweep for this repository.

Your job on each iteration is to:

1. Inspect open bug issues in the target GitHub repository.
2. Inspect prior bug-sweep reports from `docs/test-reports/agentic-bug-sweep/`.
3. Choose the next unexplored or highest-yield area to investigate.
4. Use the installed `test-feature` skill from `agentic-tests` to investigate that area.
5. Decide whether the result means:
   - `create`: a new issue should be created
   - `update`: an existing issue should be updated
   - `merge`: the finding is the same bug as an existing canonical issue and duplicates should be closed
   - `none`: no actionable bug was found

Relationship rules:

- If the finding is the same bug as an existing issue, use `merge`.
- If the finding is not the same bug, but it likely shares the same root cause as an existing issue, keep the primary action as `create` or `update` and populate `related_issue_numbers`.
- Only use `duplicates_to_close` for true duplicates of the same bug.

Output rules:

- Return only JSON that matches the provided schema.
- Always include a short `summary` and the generated `report_path`.
- The schema requires every top-level field to be present. Use `null` for fields that are irrelevant to the chosen action.
- For new issues, provide `issue.title`, `issue.body`, and `issue.labels`.
- For issue updates, provide `canonical_issue_number` and `issue_comment`.
- For duplicate consolidation, provide `canonical_issue_number`, `issue_comment`, `duplicates_to_close`, and `duplicate_comment`.
- If you provide `related_issue_numbers`, also provide `related_comment`.

Do not run raw GitHub issue mutations yourself. The shell script will apply any create, update, merge, or related-issue actions.

ai/agentic-bug-sweep.schema.json

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "summary": {
      "type": "string"
    },
    "report_path": {
      "type": "string"
    },
    "action": {
      "type": "string",
      "enum": ["create", "update", "merge", "none"]
    },
    "issue": {
      "type": ["object", "null"],
      "additionalProperties": false,
      "properties": {
        "title": {
          "type": "string"
        },
        "body": {
          "type": "string"
        },
        "labels": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": ["title", "body", "labels"]
    },
    "canonical_issue_number": {
      "type": ["integer", "null"]
    },
    "related_issue_numbers": {
      "type": ["array", "null"],
      "items": {
        "type": "integer"
      }
    },
    "duplicates_to_close": {
      "type": ["array", "null"],
      "items": {
        "type": "integer"
      }
    },
    "duplicate_comment": {
      "type": ["string", "null"]
    },
    "issue_comment": {
      "type": ["string", "null"]
    },
    "related_comment": {
      "type": ["string", "null"]
    }
  },
  "required": [
    "summary",
    "report_path",
    "action",
    "issue",
    "canonical_issue_number",
    "related_issue_numbers",
    "duplicates_to_close",
    "duplicate_comment",
    "issue_comment",
    "related_comment"
  ]
}
Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
# Agentic Bug Sweep Design

## Goal

Add a headless Codex-driven bug sweep workflow that repeatedly runs `agentic-tests`, triages findings against existing GitHub issues, creates or consolidates issues, records related same-root-cause issues, and stops after either a fixed iteration budget or too many consecutive dry runs.

## Scope

This workflow belongs in the repository as a reusable automation surface. It is responsible for:

- running a bounded number of headless Codex iterations
- letting Codex choose the next exploration target autonomously
- persisting reports and machine-readable iteration results
- creating new GitHub issues when a new bug is found
- consolidating findings into existing issues when the bug is already tracked
- recording links to related issues when a finding appears to share the same root cause but is still a distinct symptom
- closing duplicates when a canonical issue is selected

It is not responsible for:

- fixing bugs automatically
- pushing branches or creating pull requests
- continuing forever without an explicit iteration bound

## Recommended Approach

Use a shell script as the orchestration layer and use headless Codex only for the judgment-heavy parts.

The shell script should own deterministic operations:

- iteration counting
- lock handling
- environment checks
- report and log storage
- GitHub mutations through `gh`

Codex should own non-deterministic reasoning:

- choosing the next high-yield exploration area
- interpreting `agentic-tests` results
- deciding whether a finding should create a new issue, update an existing issue, merge into a canonical issue, or produce no action
- identifying existing issues that are probably related because they appear to share the same root cause

This split keeps side effects auditable and makes failures easier to recover from than a single fully autonomous prompt.

## Architecture

The workflow should consist of three checked-in files plus artifact directories:

- `scripts/agentic-bug-sweep.sh`
- `ai/agentic-bug-sweep.md`
- `ai/agentic-bug-sweep.schema.json`
- durable artifacts in `docs/test-reports/agentic-bug-sweep/`
- ephemeral state in `target/agentic-bug-sweep/`

`scripts/agentic-bug-sweep.sh` is the entrypoint. It gathers repository context, invokes `codex exec`, validates the returned JSON, and applies GitHub side effects.

`ai/agentic-bug-sweep.md` is the fixed prompt that tells Codex to inspect prior reports, inspect existing issues, choose the next target, run the installed `agentic-tests` flow, and emit only schema-valid JSON.

`ai/agentic-bug-sweep.schema.json` defines the exact contract that the shell script accepts from Codex.

## Iteration Flow

Each iteration should run in this order:

1. Acquire a lock so only one sweep runs at a time.
2. Snapshot relevant GitHub state such as open bug issues and recent issue metadata.
3. Gather prior sweep reports from `docs/test-reports/agentic-bug-sweep/`.
4. Invoke `codex exec` headlessly with:
   - the target repository path
   - the fixed prompt file
   - the JSON schema
   - a durable output file for the last message
5. Validate the returned JSON strictly.
6. Apply the requested GitHub action:
   - `create`: create a new issue
   - `update`: comment on an existing issue
   - `merge`: update the canonical issue, comment on duplicates, and close duplicates
   - `none`: record the dry run and do not mutate GitHub
7. Preserve any same-root-cause relationships returned through `related_issue_numbers`.
8. Persist iteration metadata and report references.
9. Update counters and decide whether to continue.

The shell should never let Codex perform raw GitHub mutations directly. Codex returns intent and payload; the shell applies the side effects.
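
As a rough illustration of step 2, the issue snapshot could be taken with `gh issue list` and written into the ephemeral context directory. This is only a sketch under stated assumptions, not the script's actual code: the function name `snapshot_open_bugs`, the `GH_CMD` override, the `bug` label, the selected JSON fields, and the limit of 200 are all hypothetical choices.

```bash
# Hypothetical helper: write a JSON snapshot of open bug issues to a file.
# GH_CMD is an assumed override so tests can substitute a stub for `gh`.
snapshot_open_bugs() {
  local gh_cmd="${GH_CMD:-gh}"
  local repo="$1"      # repository in owner/name form
  local out_file="$2"  # e.g. a file under target/agentic-bug-sweep/context/
  mkdir -p "$(dirname "$out_file")" || return 1
  # The function returns the exit status of the `gh` call, so a failed
  # snapshot can be treated as a hard failure by the caller.
  "$gh_cmd" issue list --repo "$repo" --state open --label bug \
    --json number,title,labels --limit 200 > "$out_file"
}
```

Routing the call through an overridable command name keeps the function testable without network access.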

## Stop Policy

The workflow should use two explicit limits:

- `--iterations N`: hard upper bound for total iterations
- `--max-consecutive-none M`: early-stop threshold for consecutive dry runs

Behavior:

- every iteration consumes one unit from `N`
- `action=none` increments the dry-run counter
- any of `create`, `update`, or `merge` resets the dry-run counter to zero
- the workflow stops as soon as either `N` iterations are reached or `M` consecutive `none` results occur

Stop reasons should be recorded explicitly, for example:

- `completed_max_iterations`
- `completed_consecutive_none_threshold`
- `failed_codex_exec`
- `failed_invalid_json`
- `failed_github_mutation`
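
The counter bookkeeping described above can be sketched as a small pure function that replays a sequence of actions. The name `simulate_sweep`, the `still_running` marker, and the choice to check the dry-run threshold before the iteration limit when both are hit on the same iteration are assumptions; the design does not specify a tie-break.

```bash
# Hypothetical helper: replay a list of actions against the two limits and
# print "<stop_reason> <iterations_consumed>".
simulate_sweep() {
  local max_iterations="$1" max_consecutive_none="$2"
  shift 2
  local iterations=0 consecutive_none=0 action
  for action in "$@"; do
    iterations=$((iterations + 1))          # every iteration consumes one unit
    if [ "$action" = "none" ]; then
      consecutive_none=$((consecutive_none + 1))
    else
      consecutive_none=0                    # create/update/merge reset the counter
    fi
    # Assumption: the dry-run threshold is checked before the iteration limit.
    if [ "$consecutive_none" -ge "$max_consecutive_none" ]; then
      echo "completed_consecutive_none_threshold $iterations"
      return 0
    fi
    if [ "$iterations" -ge "$max_iterations" ]; then
      echo "completed_max_iterations $iterations"
      return 0
    fi
  done
  echo "still_running $iterations"          # neither limit reached yet
}
```

For example, `simulate_sweep 20 3 create none none none` stops on the fourth iteration with `completed_consecutive_none_threshold`.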
107+
108+
## JSON Contract
109+
110+
The Codex output should be minimal and action-oriented. Required top-level fields:
111+
112+
```json
113+
{
114+
"summary": "short human-readable iteration summary",
115+
"report_path": "docs/test-reports/bug-sweep-20260308-123456.md",
116+
"action": "create",
117+
"issue": {
118+
"title": "Bug: ...",
119+
"body": "Markdown body",
120+
"labels": ["bug", "prio/p1", "area/einsum"]
121+
},
122+
"canonical_issue_number": 123,
123+
"related_issue_numbers": [140, 141],
124+
"duplicates_to_close": [124, 130],
125+
"duplicate_comment": "Closing in favor of #123 because ...",
126+
"issue_comment": "New evidence from automated sweep: ...",
127+
"related_comment": "This newly discovered bug likely shares the same root cause as #123."
128+
}
129+
```
130+
131+
Contract rules:
132+
133+
- `create`
134+
- requires `issue`
135+
- `update`
136+
- requires `canonical_issue_number`
137+
- requires `issue_comment`
138+
- `merge`
139+
- requires `canonical_issue_number`
140+
- requires `issue_comment`
141+
- requires `duplicates_to_close`
142+
- requires `duplicate_comment`
143+
- `none`
144+
- requires only `summary` and `report_path`
145+
- any non-`none` action may include `related_issue_numbers`
146+
- if `related_issue_numbers` is present and the workflow should notify those issues directly, require `related_comment`
147+
148+
The schema should reject any missing fields for the selected action.
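
The per-action rules above can be expressed as a lookup plus a check, shown here as a minimal sketch. The function names `required_fields_for_action` and `missing_fields` are hypothetical, and the sketch assumes the caller has already extracted the names of the non-null payload fields (for instance with a JSON tool); `summary` and `report_path` are always required and are left out of the per-action table.

```bash
# Hypothetical helper: list the fields that must be non-null for an action.
required_fields_for_action() {
  case "$1" in
    create) echo "issue" ;;
    update) echo "canonical_issue_number issue_comment" ;;
    merge)  echo "canonical_issue_number issue_comment duplicates_to_close duplicate_comment" ;;
    none)   echo "" ;;
    *)      return 1 ;;  # unknown action
  esac
}

# Print each required field that is not among the remaining arguments,
# which are the names of the fields the payload actually populated.
missing_fields() {
  local action="$1" required field present found
  shift
  required="$(required_fields_for_action "$action")" || return 1
  for field in $required; do
    found=0
    for present in "$@"; do
      if [ "$present" = "$field" ]; then
        found=1
      fi
    done
    [ "$found" -eq 1 ] || echo "$field"
  done
}
```

An empty result means the payload satisfies the rules for that action; any printed name would map to a `failed_invalid_json` stop.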

## Issue Consolidation Policy

Consolidation should mean operational unification, not a Git merge.

The workflow should distinguish two cases:

- duplicate or same bug
  - use `merge`
- same likely root cause but distinct user-visible bug
  - keep the primary action as `create` or `update`
  - record the relationship through `related_issue_numbers`

When Codex selects `merge`:

1. Comment on the canonical issue with the new evidence.
2. Comment on each duplicate issue with a pointer to the canonical issue.
3. Close each duplicate issue.

This ordering preserves information even if a later GitHub command fails.
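
The three-step ordering could look roughly like the following, using the `gh issue comment` and `gh issue close` subcommands. The function name `apply_merge`, its positional arguments, and the `GH_CMD` override are assumptions made for illustration; the sketch also assumes `gh` is already pointed at the target repository.

```bash
# Hypothetical helper: apply a `merge` result in the documented order.
# Usage: apply_merge <canonical> <issue_comment> <duplicate_comment> <dup>...
apply_merge() {
  local gh_cmd="${GH_CMD:-gh}"
  local canonical="$1" issue_comment="$2" duplicate_comment="$3"
  shift 3
  local dup
  # 1. Record the new evidence on the canonical issue first.
  "$gh_cmd" issue comment "$canonical" --body "$issue_comment" || return 1
  # 2. Point every duplicate at the canonical issue.
  for dup in "$@"; do
    "$gh_cmd" issue comment "$dup" --body "$duplicate_comment" || return 1
  done
  # 3. Only then close the duplicates; stop on the first failure.
  for dup in "$@"; do
    "$gh_cmd" issue close "$dup" || return 1
  done
}
```

Because closing happens last, a failure partway through leaves every comment already posted and no issue closed without a pointer.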
169+
170+
When Codex returns `related_issue_numbers`, the workflow should preserve that relationship in the primary issue body or comment, and may also comment on the related issues when `related_comment` is provided.
171+
172+
## Failure Policy
173+
174+
The workflow should stop on the first hard failure.
175+
176+
- If `codex exec` fails, stop and preserve logs.
177+
- If Codex returns invalid JSON, stop and preserve the raw response.
178+
- If a GitHub mutation fails, stop immediately after recording which mutation failed.
179+
- If report generation succeeds but issue mutation fails, keep the report path and iteration payload so the run can be inspected and resumed manually.
180+
181+
The script should prefer conservative failure over silent continuation after partial side effects.
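
One way to get "stop and record why" behavior is to wrap each fallible step so that a failure writes its stop reason before returning. This is a sketch only: the wrapper name `run_step` and the `stop_reason` file name are hypothetical and do not come from the design.

```bash
# Hypothetical wrapper: run a command; on failure, persist the given stop
# reason under the state directory and return non-zero so the loop can exit.
# Usage: run_step <stop_reason> <state_dir> <command> [args...]
run_step() {
  local stop_reason="$1" state_dir="$2"
  shift 2
  if "$@"; then
    return 0
  fi
  mkdir -p "$state_dir"
  printf '%s\n' "$stop_reason" > "$state_dir/stop_reason"
  return 1
}
```

A caller might write `run_step failed_codex_exec target/agentic-bug-sweep <command> || exit 1`, leaving earlier logs and payloads untouched for manual inspection.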
182+
183+
## File Layout
184+
185+
Durable files:
186+
187+
- `docs/test-reports/agentic-bug-sweep/` for reports, iteration summaries, and any audit trail worth keeping
188+
189+
Ephemeral files:
190+
191+
- `target/agentic-bug-sweep/lock`
192+
- `target/agentic-bug-sweep/context/`
193+
- `target/agentic-bug-sweep/output/`
194+
195+
This split keeps long-lived artifacts in versioned paths and temporary execution state out of the main tree.
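
For the lock path listed above, a common portable technique is to rely on `mkdir` failing when the directory already exists. The sketch below assumes the lock is a directory and uses hypothetical function names; the design does not say whether the lock is a file or a directory.

```bash
# Hypothetical helpers: directory-based lock. `mkdir` without -p fails if the
# directory already exists, so only one caller can acquire it.
acquire_lock() {
  local lock_dir="$1"
  mkdir -p "$(dirname "$lock_dir")" || return 1
  mkdir "$lock_dir" 2>/dev/null
}

release_lock() {
  rmdir "$1" 2>/dev/null
}
```

A script would typically pair `acquire_lock target/agentic-bug-sweep/lock` with a `trap` that calls `release_lock` on exit.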
196+
197+
## Testing Strategy
198+
199+
Verification should focus on deterministic shell behavior and schema enforcement.
200+
201+
- unit-style tests for argument parsing and stop-condition bookkeeping
202+
- tests that stub `codex exec` output and verify `create`, `update`, `merge`, and `none` branches
203+
- tests that verify related-issue handling for same-root-cause findings
204+
- tests that verify duplicate close ordering
205+
- tests that verify early stop on consecutive `none`
206+
- tests that verify failure on invalid JSON or failed `gh` commands
207+
208+
The headless Codex behavior itself should be validated by schema conformance and by preserving raw iteration outputs for inspection.
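
Stubbing `codex` and `gh` can be done by placing fake executables earlier on `PATH`. The helper below is a minimal sketch with a hypothetical name (`make_stub`) and log format; it only records the arguments each stub receives and always exits successfully.

```bash
# Hypothetical test helper: create an executable stub that logs its arguments
# instead of doing real work, so `codex` or `gh` calls can be asserted on.
make_stub() {
  local bin_dir="$1" name="$2" log_file="$3"
  mkdir -p "$bin_dir" || return 1
  # The generated script appends "<name> <args>" to the log file.
  printf '#!/bin/sh\necho "%s $*" >> "%s"\n' "$name" "$log_file" > "$bin_dir/$name"
  chmod +x "$bin_dir/$name"
}
```

A test could then run the sweep script with `PATH="$bin_dir:$PATH"` and compare the log against the expected sequence of calls, for example to verify duplicate close ordering.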
209+
210+
## Non-Goals
211+
212+
- automatic bug fixing
213+
- automatic branch creation or PR creation
214+
- unbounded autonomous exploration
215+
- opaque direct GitHub mutations from inside Codex prompts
216+
217+
## Recommended Next Step
218+
219+
Implement the shell orchestrator, the fixed prompt, and the schema together. Then add tests that stub `codex exec` and `gh` so the control flow can be validated without making live network mutations.
