Commit bda5c8f

feat(cli): harden review and execution gates
Make the review pipeline config-driven with ordered built-in passes and Review Artifact v3.

Also harden acceptance execution with nested result resume handling, per-criterion timeout support, generic pass expectations, refreshed docs, and expanded smoke coverage.
1 parent dfc7da9 commit bda5c8f

11 files changed, +1421 -198 lines changed

.ai/OPERATORS.md

Lines changed: 3 additions & 3 deletions
@@ -70,7 +70,7 @@ trellis exec my-task -p phase1 # run criteria for one phase only
 trellis audit my-task # compare spec files vs git diff
 trellis audit my-task -b main # audit against specific base ref
 trellis diff my-task # show git history for spec
-trellis review my-task # run automated passes + adversarial review prompt
+trellis review my-task # run configured automated passes + scaffold Review Artifact v3
 trellis complete my-task # read review, record verdict, archive (requires review)
 trellis complete my-task --human-reviewed --reason "manual audit" # exceptional audited override when the review gate is blocked
 trellis fail my-task # active/ -> archive/ (failed)
@@ -108,14 +108,14 @@ After execution, before completing:

 ```bash
 trellis review my-task # runs automated passes, scaffolds adversarial review
-# reviewer fills in findings + review provenance in .ai/reviews/my-task.md
+# reviewer fills in findings + Review Artifact v3 metadata in .ai/reviews/my-task.md
 trellis complete my-task # reads review, records verdict, archives
 # refuses if the latest review round is missing, malformed, incomplete, or failed
 trellis complete my-task --human-reviewed --reason "manual audit"
 # exceptional audited override; requires interactive confirmation
 ```

-Review rounds accumulate — each `trellis review` appends a numbered section with a fixed metadata block. Prior rounds provide context for subsequent reviewers and make review provenance visible.
+Review rounds accumulate — each `trellis review` appends a numbered Review Artifact v3 section with per-pass `pass_results`. The default five-layer pipeline is `spec_compliance`, `scope_drift`, `regression_hunt`, `convention_check`, and `dark_patterns`, ordered by explicit `order` fields in `.ai/config.yaml`. Prior rounds provide context for subsequent reviewers and make review provenance visible.

 ---

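Because each round is appended as a numbered section, the next round number can be derived from the headings already present in the review file. The following is a minimal Python sketch, assuming round headings begin with `## Review N` as in the template in `.ai/prompts/review.md`. `next_round_number` is a hypothetical helper for illustration, not part of the Trellis CLI.

```python
import re

# Assumed heading shape: "## Review N" followed by a timestamp.
# The literal "## Review N" placeholder in a template does not match,
# because N must be a run of digits here.
ROUND_HEADING = re.compile(r"^## Review (\d+)\b", re.MULTILINE)


def next_round_number(review_text: str) -> int:
    """Return the number the next appended review round would receive."""
    rounds = [int(number) for number in ROUND_HEADING.findall(review_text)]
    # An empty or missing review file starts at round 1.
    return max(rounds, default=0) + 1
```

Taking the maximum rather than counting headings keeps the numbering monotonic even if an earlier round was removed by hand.
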
.ai/README.md

Lines changed: 3 additions & 1 deletion
@@ -11,11 +11,13 @@ Trellis is a spec-driven framework for AI agent task planning and execution. Eve
 1. **Plan:** AI generates a task spec in `.ai/specs/drafts/` via conversational ReAct loop
 2. **Approve:** Developer reviews and moves spec to `.ai/specs/approved/`
 3. **Execute:** AI picks up the approved spec, executes phases, validates at each checkpoint
-4. **Review:** Adversarial review finds what execution missed — `trellis review` runs automated passes, scaffolds a machine-validated review artifact, and records review provenance in the latest round
+4. **Review:** Adversarial review finds what execution missed — `trellis review` runs the configured `spec_compliance` and `scope_drift` checks, scaffolds Review Artifact v3, and prepares the adversarial `regression_hunt`, `convention_check`, and `dark_patterns` passes in the latest round
 5. **Archive:** Completed specs move to `.ai/specs/archive/YYYY-MM/` with truthful review results recorded, or a human-reviewed override audited explicitly when the gate is blocked

 The approval gate is the human oversight boundary. The review gate is the quality boundary. During execution, the agent operates autonomously through all phases, pausing only when blocked or deviating from the spec. A normal completion path still stays agent-driven; the human-reviewed override is an exceptional audited escape hatch, not the default workflow.

+The default review topology lives in `config.yaml` and uses five ordered built-in passes: `spec_compliance`, `scope_drift`, `regression_hunt`, `convention_check`, and `dark_patterns`. Review Artifact v3 stores per-pass `pass_results`, reviewer provenance, and round status for that configured topology.
+
 ---

 ## Directory Structure

.ai/config.yaml

Lines changed: 25 additions & 19 deletions
@@ -180,25 +180,31 @@ rubric:
 # Every spec gets the same review — no profiles.
 # Recommended: run the agent review in a fresh context/session.
 review:
-  # Automated passes (run by CLI during trellis review)
-  passes:
-    - id: spec_compliance
-      type: command
-      description: "Re-run acceptance criteria to verify code satisfies spec"
-      command: "trellis exec {task_id} --resume"
-
-    - id: scope_drift
-      type: command
-      description: "Compare spec scope vs actual git diff — flag undeclared changes"
-      command: "trellis audit {task_id}"
-
-  # Agent review — single adversarial pass covering three attack vectors:
-  # 1. Regression hunt: check callers/imports of modified files for breakage
-  # 2. Convention check: verify new code follows CONVENTIONS.md and AGENTS.md
-  # 3. Dark patterns: hunt for hardcoded values, off-by-ones, race conditions, copy-paste errors
-  # Prompt template: .ai/prompts/review.md
-  # Findings written to: .ai/reviews/{task-id}.md
-  agent_review: true
+  # Review pipeline is built from named built-in passes only.
+  # Ordering is explicit; Trellis sorts by `order`, not mapping insertion luck.
+  automated_passes:
+    spec_compliance:
+      order: 10
+      title: "Spec Compliance"
+      description: "Re-run acceptance criteria to verify code satisfies the spec"
+    scope_drift:
+      order: 20
+      title: "Scope Drift"
+      description: "Compare spec scope vs actual git diff and flag undeclared changes"
+
+  adversarial_passes:
+    regression_hunt:
+      order: 30
+      title: "Regression Hunt"
+      description: "Trace callers, importers, and downstream consumers for regressions"
+    convention_check:
+      order: 40
+      title: "Convention Check"
+      description: "Check changed code against CONVENTIONS.md and AGENTS.md"
+    dark_patterns:
+      order: 50
+      title: "Dark Patterns"
+      description: "Hunt for subtle bugs, hardcodes, races, and safety gaps"

 # =============================================================================
 # REACT PATTERN (reasoning + acting)

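The new config comment says passes are sorted by `order` rather than by mapping insertion order. A minimal Python sketch of that idea follows, assuming the `review` block has already been parsed into a dictionary. `DEFAULT_REVIEW` and `ordered_pass_ids` are illustrative names, not Trellis internals, and the entries are deliberately listed out of order.

```python
from typing import Any, Dict, List, Tuple

# Mirrors the default `review` block from the diff above, with entries
# shuffled on purpose so the result cannot depend on insertion order.
DEFAULT_REVIEW: Dict[str, Dict[str, Dict[str, Any]]] = {
    "adversarial_passes": {
        "dark_patterns": {"order": 50, "title": "Dark Patterns"},
        "regression_hunt": {"order": 30, "title": "Regression Hunt"},
        "convention_check": {"order": 40, "title": "Convention Check"},
    },
    "automated_passes": {
        "scope_drift": {"order": 20, "title": "Scope Drift"},
        "spec_compliance": {"order": 10, "title": "Spec Compliance"},
    },
}


def ordered_pass_ids(review: Dict[str, Dict[str, Dict[str, Any]]]) -> List[str]:
    """Return pass ids from both groups sorted by their explicit `order`."""
    keyed: List[Tuple[int, str]] = []
    for group in ("automated_passes", "adversarial_passes"):
        for pass_id, settings in review.get(group, {}).items():
            keyed.append((settings["order"], pass_id))
    # Sorting tuples breaks ties on pass id, which keeps output deterministic.
    return [pass_id for _, pass_id in sorted(keyed)]
```

Spacing the defaults by ten (10, 20, 30, ...) leaves room to renumber a pass without touching its neighbors.
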
.ai/prompts/exec.md

Lines changed: 2 additions & 0 deletions
@@ -167,6 +167,8 @@ After all phases complete and before `trellis complete`:
 5. Fix any blocking findings if needed
 6. Run `trellis complete <task-id>` — reads the review, records verdict, archives

+The default Review Artifact v3 pipeline is `spec_compliance`, `scope_drift`, `regression_hunt`, `convention_check`, and `dark_patterns`. `trellis review` scaffolds the adversarial sections in configured order and expects the reviewer to update `round_status` plus per-pass `pass_results` before completion.
+
 `trellis complete` will **refuse to archive** if the latest review round is missing, malformed, incomplete, or failed. The only bypass is the exceptional human path: `trellis complete <task-id> --human-reviewed --reason "<why>"`, which requires interactive confirmation and records an audited override.

 ---

.ai/prompts/review.md

Lines changed: 47 additions & 16 deletions
@@ -8,7 +8,7 @@

 ## Mission

-Find what's wrong. Not what's right — what's wrong.
+Find what is wrong. Not what is right.

 You are reviewing changes made during spec execution. A separate agent built this, or you did in a prior session. Either way, your job is to attack it.

@@ -31,14 +31,31 @@ A review that finds zero issues is suspicious. Look harder.
 2. Read the git diff of all changes
 3. Read `CONVENTIONS.md` and `AGENTS.md`
 4. Read `.ai/reviews/{task-id}.md` — if prior review rounds exist, read what was found before. Don't re-report fixed issues. Note if a prior finding persists.
-5. Attack the diff through the three vectors below — **all three are required**
-6. Write findings into the latest review section in `.ai/reviews/{task-id}.md` — each adversarial section must have content or `trellis complete` will reject. Update the review provenance metadata for the reviewer who actually performed the review.
+5. Attack the diff through the configured adversarial passes — by default: `regression_hunt`, `convention_check`, and `dark_patterns`
+6. Write findings into the latest review section in `.ai/reviews/{task-id}.md`
+7. Update the Review Artifact v3 metadata so the latest round is truthful and complete
+
+---
+
+## Default Review Pipeline
+
+The default built-in five-pass pipeline in `.ai/config.yaml` is:
+
+- `spec_compliance`
+- `scope_drift`
+- `regression_hunt`
+- `convention_check`
+- `dark_patterns`
+
+`trellis review` already runs `spec_compliance` and `scope_drift` and scaffolds the adversarial sections in configured order. Your job is to complete the adversarial passes and finalize the metadata for Review Artifact v3.
+
+If the project has changed pass titles in `.ai/config.yaml`, follow the headings already scaffolded by `trellis review`. The built-in pass ids stay the same even if the visible section title changes.

 ---

 ## Attack Vectors

-### 1. Regression Hunt
+### 1. Regression Hunt (`regression_hunt`)

 For each modified file, find every caller, importer, and downstream consumer. What assumptions do they make that this change violates?

@@ -48,15 +65,15 @@ For each modified file, find every caller, importer, and downstream consumer. Wh
 - Verify event listeners and subscribers still match event shapes
 - Check if removed or renamed exports are still referenced elsewhere

-### 2. Convention Violations
+### 2. Convention Check (`convention_check`)

 Read `CONVENTIONS.md` and `AGENTS.md`. For each changed file, check whether the new code violates a documented rule.

 - Cite the specific convention and the specific violating line
 - Don't flag style preferences — only documented, stated conventions
 - Check naming patterns, layer boundaries, import rules, test patterns

-### 3. Defect Scan
+### 3. Dark Patterns (`dark_patterns`)

 For each change, actively hunt for:

@@ -81,30 +98,36 @@ For each change, actively hunt for:

 ## Output

-`trellis review` scaffolds the review file at `.ai/reviews/{task-id}.md` with numbered review sections. Fill in the latest section using the fixed Review Artifact v2 contract:
+`trellis review` scaffolds the review file at `.ai/reviews/{task-id}.md` with numbered review sections. Fill in the latest section using the Review Artifact v3 contract:

 ````markdown
 ## Review N — {timestamp}

 ### Metadata
 ```json
 {
-  "schema_version": 2,
+  "schema_version": 3,
   "round_status": "completed",
   "reviewer_mode": "fresh_agent",
   "reviewer_session": "session-id-or-empty-string",
   "reviewed_at": "{timestamp}",
   "override_reason": null,
-  "automated_passes": {
+  "pass_results": {
     "spec_compliance": "pass",
-    "scope_drift": "pass"
+    "scope_drift": "pass",
+    "regression_hunt": "pass",
+    "convention_check": "pass",
+    "dark_patterns": "pass"
   }
 }
 ```

-### Automated Passes
+### Pass Results
 - spec_compliance: PASS
 - scope_drift: PASS
+- regression_hunt: PASS
+- convention_check: PASS
+- dark_patterns: PASS

 ### Regression Hunt
 {For each modified file, trace callers/importers. What assumptions break?
@@ -114,9 +137,9 @@ List findings or "No issues found — checked [what you checked]".}
 {Read CONVENTIONS.md and AGENTS.md. Does new code violate any documented rule?
 List findings or "No issues found — checked [what you checked]".}

-### Defect Scan
-{Hunt for hardcoded values, off-by-one, missing null checks, race conditions,
-copy-paste errors, unhandled error paths, security issues.
+### Dark Patterns
+{Hunt for hardcoded values, off-by-one issues, missing null checks, race conditions,
+copy-paste errors, unhandled error paths, and security issues.
 List findings or "No issues found — checked [what you checked]".}

 ### Blocking
@@ -129,9 +152,17 @@ List findings or "No issues found — checked [what you checked]".}
 {pass | fail | pass_with_issues}
 ````

-Set `reviewer_mode` to `fresh_agent`, `auto`, or `executor` to match the real reviewer. Leave `override_reason` as `null` for normal reviews. Prior review rounds remain in the file as context. Don't modify them — only fill in the latest section.
+Update these metadata fields explicitly:
+
+- Set `round_status` to `completed` when the review is actually done
+- Set `reviewer_mode` to `fresh_agent`, `auto`, or `executor` to match the real reviewer
+- Set `reviewer_session` to the real session identifier or `""`
+- Keep the automated pass results for `spec_compliance` and `scope_drift`
+- Set adversarial `pass_results` for `regression_hunt`, `convention_check`, and `dark_patterns` to `pass`, `pass_with_issues`, or `fail`
+
+Prior review rounds remain in the file as context. Do not rewrite them.

-**All three adversarial sections (Regression Hunt, Convention Check, Defect Scan) must contain content.** Each must have at least one finding or an explicit "No issues found" with a brief note of what was checked. `trellis complete` will reject reviews with empty adversarial sections.
+**All configured adversarial sections must contain content.** Each must have at least one finding or an explicit "No issues found" with a brief note of what was checked. `trellis complete` will reject reviews with empty configured sections or with `round_status` left at `in_progress`.

 **Verdict rules:** Any blocking finding → `fail`. Non-blocking only → `pass_with_issues`. Clean → `pass`.

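The metadata contract and verdict rules in this prompt can be checked mechanically before running `trellis complete`. The Python sketch below is one way to do that under stated assumptions: it covers only the fields shown in the template, it assumes every pass accepts the same three result values (the prompt states that set only for the adversarial passes), and `metadata_problems` and `verdict` are hypothetical helpers, not the validator Trellis itself uses.

```python
import json
from typing import List, Sequence

DEFAULT_PASS_IDS = (
    "spec_compliance",
    "scope_drift",
    "regression_hunt",
    "convention_check",
    "dark_patterns",
)
# Assumption: one shared value set for all passes.
ALLOWED_RESULTS = ("pass", "pass_with_issues", "fail")


def metadata_problems(raw_json: str,
                      pass_ids: Sequence[str] = DEFAULT_PASS_IDS) -> List[str]:
    """Return readable problems; an empty list means the block looks complete."""
    try:
        meta = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"malformed metadata JSON: {exc.msg}"]
    if not isinstance(meta, dict):
        return ["metadata must be a JSON object"]

    problems: List[str] = []
    if meta.get("schema_version") != 3:
        problems.append("schema_version must be 3")
    if meta.get("round_status") != "completed":
        problems.append("round_status must be completed")
    results = meta.get("pass_results")
    if not isinstance(results, dict):
        problems.append("pass_results must be an object")
        results = {}
    for pass_id in pass_ids:
        if results.get(pass_id) not in ALLOWED_RESULTS:
            problems.append(f"pass_results.{pass_id} is missing or invalid")
    return problems


def verdict(blocking: int, non_blocking: int) -> str:
    """Apply the verdict rules: blocking wins, then non-blocking, else clean."""
    if blocking > 0:
        return "fail"
    if non_blocking > 0:
        return "pass_with_issues"
    return "pass"
```

Returning a list of problems instead of raising on the first one lets a reviewer fix the whole metadata block in a single edit.
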
.ai/schemas/spec.json

Lines changed: 37 additions & 4 deletions
@@ -223,7 +223,12 @@
       "description": {"type": "string"},
       "command": {"type": "string"},
       "expected": {"type": "string"},
-      "cwd": {"type": "string", "description": "Working directory relative to workspace root"}
+      "cwd": {"type": "string", "description": "Working directory relative to workspace root"},
+      "timeout_seconds": {
+        "type": "integer",
+        "minimum": 1,
+        "description": "Command timeout in seconds. Defaults to 600 when omitted."
+      }
     }
   }
 }
@@ -327,10 +332,38 @@
   "type": "string",
   "description": "Working directory for the command, relative to workspace root. Useful in monorepo/workspace setups where different criteria target different submodules."
 },
+"timeout_seconds": {
+  "type": "integer",
+  "minimum": 1,
+  "description": "Command timeout in seconds. Defaults to 600 when omitted."
+},
 "result": {
-  "type": "string",
-  "enum": ["pass", "fail"],
-  "description": "Result recorded by trellis exec"
+  "oneOf": [
+    {
+      "type": "string",
+      "enum": ["pass", "fail"],
+      "description": "Flat result recorded by trellis exec"
+    },
+    {
+      "type": "object",
+      "required": ["status"],
+      "properties": {
+        "status": {
+          "type": "string",
+          "enum": ["pass", "fail"]
+        },
+        "timestamp": {
+          "type": "string",
+          "format": "date-time"
+        },
+        "output": {
+          "type": "string"
+        }
+      },
+      "additionalProperties": false,
+      "description": "Nested result block supported for execution records"
+    }
+  ]
 },
 "executed_at": {
   "type": "string",

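With `result` now allowed to be either a flat string or a nested object, any consumer has to normalize the two shapes before comparing statuses. The Python sketch below illustrates that normalization together with the documented 600-second timeout default. The helper names are hypothetical, and `should_rerun` rests on the assumption that resuming skips criteria already recorded as passing, which the diff shown here does not confirm.

```python
from typing import Any, Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 600  # schema default when timeout_seconds is omitted


def result_status(criterion: Mapping[str, Any]) -> Optional[str]:
    """Return "pass" or "fail" for either result shape, or None if unrecorded."""
    result = criterion.get("result")
    if isinstance(result, str):
        return result  # flat form: "pass" | "fail"
    if isinstance(result, Mapping):
        status = result.get("status")  # nested form: {"status": ..., ...}
        return status if isinstance(status, str) else None
    return None


def effective_timeout(criterion: Mapping[str, Any]) -> int:
    """Return the per-criterion timeout, falling back to the default."""
    timeout = criterion.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout < 1:
        raise ValueError("timeout_seconds must be an integer >= 1")
    return timeout


def should_rerun(criterion: Mapping[str, Any]) -> bool:
    """Assumed resume rule: rerun anything not already recorded as passing."""
    return result_status(criterion) != "pass"
```

The value from `effective_timeout` could then be passed as the `timeout` argument of `subprocess.run` when executing the criterion's `command`.
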