bugatti-cli/progress.txt at main · codesoda/bugatti-cli · GitHub

## Codebase Patterns
- CLI is built with clap derive macros; subcommands go in `src/cli.rs`
- Module layout: `main.rs` (entry), `cli.rs` (clap definitions), `lib.rs` (module re-exports)
- Tests for CLI parsing use `Cli::parse_from` with synthetic args
- Config structs use `#[serde(deny_unknown_fields)]` for strict validation
- Use `tempfile::tempdir()` for filesystem-dependent tests
- BTreeMap for deterministic key ordering in config maps
- Test file types live in `src/test_file.rs`; steps use flat struct with Option fields validated post-parse
- Step validation: exactly one of instruction/include_path/include_glob must be set per step
- Use `effective_config()` from config module to merge test overrides before execution
- `ProviderConfig::merge_overrides()` handles field-level Option merging (Some replaces, None preserves)
- Short-lived command execution lives in `src/command.rs`; use `run_short_lived_commands()` during setup phase
- Commands are run via `sh -c` for shell expansion; stdout/stderr captured to `logs/<name>.stdout.log` / `logs/<name>.stderr.log`
- Run identity and artifacts live in `src/run.rs`; use `initialize_run()` to set up a run
- `ArtifactDir::from_run_id()` computes deterministic artifact paths from project root + run ID
- Run metadata is written as JSON to `.bugatti/runs/<run-id>/run_metadata.json`
- Long-lived commands: use `spawn_long_lived_commands()` to start background processes; returns `Vec<TrackedProcess>`
- Use `check_for_unexpected_exits()` to detect crashed long-lived processes during run
- Use `teardown_processes()` to SIGTERM all tracked processes at run end
- Readiness checks use `curl -sf` subprocess; polls every 500ms with 30s timeout
- `libc` crate is used for SIGTERM on unix; `child.kill()` fallback on non-unix
- Make sure you reload prd.json every time
- Never use curl for readiness checks; always run them through reqwest helpers to stay cross-platform
- Provider trait lives in `src/provider.rs`; `AgentSession` trait defines the provider-agnostic interface
- Provider messages carry `RunId`, `SessionId`, and `step_id` for traceability
- Streaming uses `Box<dyn Iterator<Item = Result<OutputChunk, ProviderError>>>` (synchronous iterator pattern)
- Claude Code adapter lives in `src/claude_code.rs`; `ClaudeCodeAdapter` implements `AgentSession`
- Claude CLI invoked via `claude -p <msg> --session-id <id> --output-format stream-json` for each message
- `which` crate resolves the `claude` binary path at initialization time
- Stream-json output is JSONL; events have `type` field: "assistant", "result", "error"
- Mock subprocess tests use `sh -c 'echo ...'` to simulate CLI JSON output
- Step execution lives in `src/executor.rs`; `execute_steps()` runs expanded steps sequentially in one session
- `parse_result_marker()` scans output lines in reverse for RESULT OK/WARN/ERROR contract markers
- Executor stops on first hard failure (ERROR/timeout/protocol error); WARN is a pass and continues
- Transcript artifacts: per-step `step_NNNN.txt` + combined `full_transcript.txt` under transcripts/
- BUGATTI_LOG events: parsed via `parse_log_events()` in executor; stored in `StepOutcome.log_events` and written to `logs/bugatti_log_events.txt`
- Console log rendering format: `LOG ........ <message>`
- Report compilation lives in `src/report.rs`; `compile_report()` generates markdown, `write_report()` writes to disk
- `ReportInput` struct collects all data needed for report generation (run IDs, config summary, outcome, artifact paths)
- BUGATTI_LOG events are included in report only for WARN/ERROR steps, not OK steps
- Report is always written regardless of pass/fail status
- Structured tracing lives in `src/diagnostics.rs`; use `init_tracing()` to set up file-based tracing, returns `TracingGuard`
- Tracing writes JSONL to `diagnostics/harness_trace.jsonl` under the run's artifact directory
- `ArtifactKind` enum in diagnostics module: HarnessDiagnostics, AgentLogs, Transcript, Evidence, Report
- `collect_artifact_refs()` gathers all artifact references from a completed run
- Harness tracing events (via `tracing::info!/warn!/error!`) are distinct from BUGATTI_LOG events in storage
- Use `tracing::info!` with structured fields (component, step_id, command, etc.) for harness lifecycle events
- Evidence types live in `src/diagnostics.rs`: `EvidenceKind` enum (Screenshot, CommandLog, BrowserConsole, NetworkFailure, SqlCliEvidence) and `EvidenceRef` struct
- `EvidenceRef.collection_error` carries error message when evidence collection fails; `is_available()` checks both path existence and no error
- Evidence references are stored per-step in `StepOutcome.evidence_refs`; report renders them only for WARN/ERROR steps
- `collect_artifact_refs()` scans screenshots/ dir for evidence files and includes them as `ArtifactKind::Evidence`
- Discovery lives in `src/discovery.rs`; `discover_root_tests()` recursively finds *.test.toml files excluding hidden dirs and include_only files
- `DiscoveryResult` separates successful tests from parse errors for per-file error reporting
- Multi-test aggregate summary uses `print_aggregate_summary()` in main.rs with PASS/FAIL/ERROR per test
- Exit codes live in `src/exit_code.rs`; constants: EXIT_OK(0), EXIT_STEP_ERROR(1), EXIT_CONFIG_ERROR(2), EXIT_PROVIDER_ERROR(3), EXIT_TIMEOUT(4), EXIT_INTERRUPTED(5)
- `exit_code_for_run()` computes exit code from a `RunOutcome`; `aggregate_exit_code()` combines multiple runs
- Ctrl+C handler uses `ctrlc` crate with global `AtomicBool` flag; checked between test runs in discovery mode
- Exit codes documented in CLI `after_help` text via clap attribute

---

## 2026-03-27 - US-001
- Scaffolded Rust CLI project with clap, serde, toml dependencies
- Created `bugatti test [path]` subcommand with optional path argument
- Files changed: Cargo.toml, Cargo.lock, src/main.rs, src/cli.rs, src/lib.rs
- **Learnings for future iterations:**
  - clap derive with `#[command(subcommand)]` is the pattern for subcommands
  - Optional positional args use `Option<String>` in clap derive structs
  - Tests live in `src/main.rs` `#[cfg(test)]` module for now; may need a `tests/` dir later
---

## 2026-03-27 - US-002
- Defined Config, ProviderConfig, CommandDef, CommandKind types with serde deserialization
- Implemented `load_config()` that reads bugatti.config.toml from a directory, returns defaults if missing
- Used `#[serde(deny_unknown_fields)]` to reject unknown fields with clear errors
- Used BTreeMap for commands to ensure deterministic ordering
- Added tempfile as dev-dependency for test isolation
- Files changed: src/config.rs (new), src/lib.rs, Cargo.toml, Cargo.lock
- **Learnings for future iterations:**
  - `#[serde(deny_unknown_fields)]` catches typos and invalid config fields at parse time
  - `#[serde(rename_all = "snake_case")]` works well for enum variants (e.g., `short_lived` in TOML)
  - `tempfile::tempdir()` is the pattern for filesystem-dependent tests
  - clippy prefers `#[derive(Default)]` over manual `impl Default` when all fields have defaults
---

## 2026-03-27 - US-003
- Defined TestFile, TestOverrides, ProviderOverrides, Step types with serde deserialization
- Implemented `parse_test_file()` that reads and validates *.test.toml files
- Steps use flat struct with Option fields; post-parse validation ensures exactly one variant is set
- TestFileError includes file path in all error messages for clear diagnostics
- Files changed: src/test_file.rs (new), src/lib.rs
- **Learnings for future iterations:**
  - Flat struct with Option fields + post-parse validation is simpler than serde untagged enums for step variants
  - `deny_unknown_fields` on all structs catches schema drift early
  - Error types should always carry the file path for user-facing messages
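
A minimal sketch of the flat-struct-plus-validation idea noted above, using only std. The `Step` shape and the `validate_step` name and signature are assumptions for illustration; the real types in `src/test_file.rs` also carry serde attributes and may differ.

```rust
/// Hypothetical flat step: every variant field is optional, mirroring a TOML table.
#[derive(Debug, Default)]
pub struct Step {
    pub instruction: Option<String>,
    pub include_path: Option<String>,
    pub include_glob: Option<String>,
}

/// Returns Ok only when exactly one of the three fields is set.
/// The error message carries the file path so it is actionable for the user.
pub fn validate_step(step: &Step, file: &str, index: usize) -> Result<(), String> {
    let set = [
        step.instruction.is_some(),
        step.include_path.is_some(),
        step.include_glob.is_some(),
    ]
    .iter()
    .filter(|is_set| **is_set)
    .count();

    match set {
        1 => Ok(()),
        0 => Err(format!(
            "{file}: step {index} sets none of instruction/include_path/include_glob"
        )),
        n => Err(format!(
            "{file}: step {index} sets {n} of instruction/include_path/include_glob; exactly one is allowed"
        )),
    }
}
```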
---

## 2026-03-27 - US-004
- Added `ProviderConfig::merge_overrides()` method for field-level override merging
- Added `effective_config()` function that computes merged config from global + test file overrides
- Override fields that are `Some` replace global values; `None` preserves global values
- Commands map is always preserved from global config (not overridable per-test)
- Files changed: src/config.rs
- **Learnings for future iterations:**
  - Cross-module imports work naturally: config.rs imports from test_file.rs via `crate::test_file`
  - The `effective_config` should be computed before any execution begins (per acceptance criteria)
  - Test structs can be constructed directly in unit tests without going through TOML parsing
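
A sketch of the field-level merge rule described above (`Some` replaces, `None` preserves). The fields `name` and `step_timeout_secs` and the exact `merge_overrides` signature are assumptions for illustration, not the actual definitions in `src/config.rs`.

```rust
/// Hypothetical global provider settings.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderConfig {
    pub name: String,
    pub step_timeout_secs: Option<u64>,
}

/// Hypothetical per-test overrides: every field is optional.
#[derive(Debug, Default)]
pub struct ProviderOverrides {
    pub name: Option<String>,
    pub step_timeout_secs: Option<u64>,
}

impl ProviderConfig {
    /// Builds the effective config: an override that is `Some` wins,
    /// an override that is `None` keeps the global value.
    pub fn merge_overrides(&self, overrides: &ProviderOverrides) -> ProviderConfig {
        ProviderConfig {
            name: overrides.name.clone().unwrap_or_else(|| self.name.clone()),
            step_timeout_secs: overrides.step_timeout_secs.or(self.step_timeout_secs),
        }
    }
}
```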
---

## 2026-03-27 - US-006
- Created `src/run.rs` with RunId, SessionId (UUID v4), ArtifactDir, RunMetadata types
- `initialize_run()` generates IDs, creates `.bugatti/runs/<run-id>/` with transcripts/, screenshots/, logs/, diagnostics/ subdirs
- Writes run_metadata.json with run ID, session ID, test file path, project root, provider name, start time, effective config summary
- Added chrono, serde_json, uuid dependencies
- Files changed: src/run.rs (new), src/lib.rs, Cargo.toml, Cargo.lock
- **Learnings for future iterations:**
  - `chrono::DateTime<Utc>` with `serde` feature serializes to ISO 8601 by default
  - `uuid::Uuid::new_v4()` for unique run/session identifiers
  - `std::fs::create_dir_all` is idempotent and handles nested directory creation
  - `serde_json::to_string_pretty` for human-readable metadata files
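
A sketch of how deterministic artifact paths can be derived from project root plus run ID, with idempotent directory creation. The struct layout and the `subdirs` helper are assumptions; only the directory names and the `.bugatti/runs/<run-id>/` layout come from the notes above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the artifact directory type.
pub struct ArtifactDir {
    pub root: PathBuf,
}

impl ArtifactDir {
    /// Same project root and run ID always give the same path.
    pub fn from_run_id(project_root: &Path, run_id: &str) -> Self {
        ArtifactDir {
            root: project_root.join(".bugatti").join("runs").join(run_id),
        }
    }

    /// The four subdirectories created for every run.
    pub fn subdirs(&self) -> [PathBuf; 4] {
        ["transcripts", "screenshots", "logs", "diagnostics"].map(|name| self.root.join(name))
    }

    /// `create_dir_all` is idempotent, so calling this twice is safe.
    pub fn create_all(&self) -> io::Result<()> {
        for dir in self.subdirs() {
            fs::create_dir_all(dir)?;
        }
        Ok(())
    }
}
```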
---

## 2026-03-27 - US-007
- Implemented short-lived command execution with output capture and fail-fast behavior
- Commands run via `sh -c` for shell expansion support
- stdout/stderr captured to `logs/<name>.stdout.log` and `logs/<name>.stderr.log`
- Non-zero exit stops execution immediately with clear error including stderr path
- `skip_cmds` parameter allows skipping specific commands (for future --skip-cmd CLI flag)
- Console output shows RUN/OK/SKIP status for each command
- Files changed: src/command.rs (new), src/lib.rs
- **Learnings for future iterations:**
  - `std::process::Command::new("sh").arg("-c").arg(&cmd)` is the pattern for running shell commands with expansion
  - `output.status.code()` returns `Option<i32>` (None if killed by signal)
  - BTreeMap iteration order guarantees deterministic command execution order
  - Tests use `tempfile::tempdir()` + `ArtifactDir::from_run_id()` + `create_all()` to set up artifact directories
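
A sketch of the `sh -c` capture pattern above, assuming a unix-like system with `sh` on PATH. The function name `run_and_capture` and its signature are hypothetical; the real `run_short_lived_commands()` also handles skipping, ordering, and console output.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Runs `cmd` through `sh -c`, writes captured output to
/// `<logs_dir>/<name>.stdout.log` and `<logs_dir>/<name>.stderr.log`,
/// and returns the exit code (None if the process was killed by a signal).
pub fn run_and_capture(name: &str, cmd: &str, logs_dir: &Path) -> io::Result<Option<i32>> {
    fs::create_dir_all(logs_dir)?;
    // `output()` waits for the command to finish and collects both streams.
    let output = Command::new("sh").arg("-c").arg(cmd).output()?;
    fs::write(logs_dir.join(format!("{name}.stdout.log")), &output.stdout)?;
    fs::write(logs_dir.join(format!("{name}.stderr.log")), &output.stderr)?;
    Ok(output.status.code())
}
```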
---

## 2026-03-27 - US-008
- Implemented long-lived subprocess management with readiness checks
- Spawns background processes via `sh -c` with stdout/stderr redirected to log files
- Readiness polling via `curl -sf` subprocess, 500ms interval, 30s timeout
- Unexpected exit detection via `try_wait()` on tracked process handles
- Teardown sends SIGTERM (unix) with 5s grace period, then force-kills
- Added `libc` dependency for SIGTERM signal on unix
- Files changed: src/command.rs, Cargo.toml, Cargo.lock
- **Learnings for future iterations:**
  - `Stdio::from(std::fs::File)` redirects child process output to a file
  - `child.try_wait()` is non-blocking process status check
  - `libc::kill(pid, libc::SIGTERM)` for graceful shutdown on unix
  - clippy requires `&mut [T]` instead of `&mut Vec<T>` for function params
  - Readiness checks should tear down already-started processes on failure
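
A sketch of the poll-with-deadline loop behind the readiness checks (500ms interval, 30s timeout in the notes above). The probe is passed in as a closure so the sketch stays independent of how the check is performed; `wait_until_ready` is a hypothetical name.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Calls `probe` every `interval` until it returns true or `timeout` elapses.
/// Returns true if the probe succeeded before the deadline.
pub fn wait_until_ready<F>(mut probe: F, interval: Duration, timeout: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```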
---

## 2026-03-27 - US-009
- Added `--skip-cmd <name>` flag to `bugatti test` subcommand (repeatable via clap `Vec<String>`)
- Added `validate_skip_cmds()` function that checks skip names against known config commands before execution
- Readiness checks for skipped long-lived commands still run by default (user may have the service running externally)
- Console output shows SKIP status for skipped commands, and "(skipped)" annotations on readiness checks
- Files changed: src/cli.rs, src/command.rs, src/main.rs
- **Learnings for future iterations:**
  - clap `Vec<String>` with `#[arg(long = "skip-cmd")]` supports repeated flags naturally
  - Skipped command readiness checks are important when user runs services externally
  - `validate_skip_cmds` should be called before any execution begins, similar to effective_config computation
---

## 2026-03-27 - US-010
- Defined `AgentSession` trait in `src/provider.rs` with methods: initialize, start, send_bootstrap, send_step, close
- Defined supporting types: `StepMessage`, `BootstrapMessage`, `OutputChunk`, `ProviderError`
- Trait receives resolved `Config` (not raw TOML), messages carry `RunId`, `SessionId`, `step_id`
- Uses synchronous iterator pattern for streamed output (`Box<dyn Iterator<Item = Result<OutputChunk, ProviderError>>>`)
- Files changed: src/provider.rs (new), src/lib.rs
- **Learnings for future iterations:**
  - Synchronous iterator pattern avoids async runtime dependency while supporting streaming
  - `ProviderError` enum covers full lifecycle: init, start, send, stream, shutdown, crash
  - `OutputChunk::Done` marker signals end of provider response for a message
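
A sketch of the synchronous iterator pattern for streamed output. The trait here is cut down to one method, and the `Text` variant, the `ChunkStream` alias, `EchoSession`, and `collect_text` are assumptions for illustration; the real `AgentSession` trait has more methods and richer message types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum OutputChunk {
    Text(String), // assumed variant carrying streamed text
    Done,         // end-of-response marker
}

#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    StreamError(String),
}

/// Boxed iterator of chunks: streaming without an async runtime.
pub type ChunkStream<'a> = Box<dyn Iterator<Item = Result<OutputChunk, ProviderError>> + 'a>;

/// Reduced stand-in for the provider trait.
pub trait AgentSession {
    fn send_step(&mut self, instruction: &str) -> Result<ChunkStream<'_>, ProviderError>;
}

/// Mock session that echoes the instruction back as a single chunk.
pub struct EchoSession;

impl AgentSession for EchoSession {
    fn send_step(&mut self, instruction: &str) -> Result<ChunkStream<'_>, ProviderError> {
        let chunks: Vec<Result<OutputChunk, ProviderError>> = vec![
            Ok(OutputChunk::Text(format!("echo: {instruction}"))),
            Ok(OutputChunk::Done),
        ];
        Ok(Box::new(chunks.into_iter()))
    }
}

/// Drains a stream into text, stopping at `Done` or at the first error.
pub fn collect_text(stream: ChunkStream<'_>) -> Result<String, ProviderError> {
    let mut text = String::new();
    for chunk in stream {
        match chunk? {
            OutputChunk::Text(t) => text.push_str(&t),
            OutputChunk::Done => break,
        }
    }
    Ok(text)
}
```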
---

## 2026-03-27 - US-011
- Implemented `ClaudeCodeAdapter` in `src/claude_code.rs` implementing the `AgentSession` trait
- Launches `claude` CLI subprocess per message with `--session-id` for conversation continuity
- Uses `--output-format stream-json` for JSONL streaming; parses assistant/result/error event types
- Extra system prompt passed via `--system-prompt` flag; agent_args appended to command
- `ClaudeCodeStreamIterator` reads stdout line-by-line, parses JSON events, yields `OutputChunk` values
- Handles process failure (non-zero exit) as `SessionCrashed`, error events as `StreamError`
- Binary resolution via `which` crate at initialization time
- Files changed: src/claude_code.rs (new), src/lib.rs, Cargo.toml, Cargo.lock
- **Learnings for future iterations:**
  - `which::which()` is cleaner than manual PATH search for binary resolution
  - `Box<dyn Iterator + '_>` lifetime ties iterator to the struct borrow, preventing use-after-free
  - Mock subprocess tests with `sh -c 'echo JSON'` effectively test stream parsing without the real CLI
  - `Command::get_args()` enables testing command construction without spawning processes
  - `read_to_string` on child stderr needs `std::io::Read` import (separate from `BufRead`)
---

## 2026-03-27 - US-012
- Implemented step execution loop in `src/executor.rs` with sequential step execution within one provider session
- Each step message includes run ID, session ID, step ID, source provenance, and instruction text
- Streamed provider output captured in transcript artifacts (per-step and combined full_transcript.txt)
- RESULT contract parsing: scans for last RESULT OK / RESULT WARN: ... / RESULT ERROR: ... line
- Missing result marker produces protocol error; step timeout enforced via Instant-based checks
- Stops execution on first failure (ERROR, protocol error, timeout, provider failure); WARN continues
- Added Clone derive to ProviderError for mock test support
- Files changed: src/executor.rs (new), src/lib.rs, src/provider.rs
- **Learnings for future iterations:**
  - `ProviderError` needs `Clone` derive for mock session tests (cloned into iterator responses)
  - Executor module pattern: `execute_steps()` takes `&mut dyn AgentSession` for provider-agnostic execution
  - Result parsing scans lines in reverse — last RESULT marker wins
  - Transcript artifacts: per-step `step_NNNN.txt` + combined `full_transcript.txt`
  - MockSession pattern with pre-loaded response vectors is effective for testing execution flow
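
A sketch of the reverse scan for RESULT markers, where the last marker wins. The `StepResult` enum and the return type are assumptions; the real `parse_result_marker()` may differ in how it reports a missing marker.

```rust
#[derive(Debug, PartialEq)]
pub enum StepResult {
    Ok,
    Warn(String),
    Error(String),
}

/// Scans output lines in reverse so the last RESULT marker wins.
/// Returns None when no marker is present (a protocol error for the caller).
pub fn parse_result_marker(output: &str) -> Option<StepResult> {
    for line in output.lines().rev() {
        let line = line.trim();
        if line == "RESULT OK" {
            return Some(StepResult::Ok);
        }
        if let Some(detail) = line.strip_prefix("RESULT WARN:") {
            return Some(StepResult::Warn(detail.trim().to_string()));
        }
        if let Some(detail) = line.strip_prefix("RESULT ERROR:") {
            return Some(StepResult::Error(detail.trim().to_string()));
        }
    }
    None
}
```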
---

## 2026-03-27 - US-013
- Implemented BUGATTI_LOG line recognition and LogEvent parsing from provider output
- Added `LogEvent` type with run_id, step_id, and message fields
- `parse_log_events()` extracts BUGATTI_LOG lines from transcript text
- Log events stored in `StepOutcome.log_events`, separate from raw transcript
- Log events written to `logs/bugatti_log_events.txt` (separate from transcript and diagnostics)
- Console output renders log events as `LOG ........ <message>` during execution
- Files changed: src/executor.rs
- **Learnings for future iterations:**
  - Use `trim_start()` not `trim()` when extracting prefixed messages, to preserve trailing content
  - Log events file is only created when events exist (avoid empty artifact files)
  - BUGATTI_LOG lines remain in raw transcript (unfiltered); separate storage is additive
- Console output format: `STEP N/M ... <instruction> (from <file>)` for step begin, `OK/WARN/ERROR ....... <detail> (<duration>)` for step result
- `print_run_summary()` renders final status with counts of ok/warn/failed/skipped steps
- `truncate_instruction()` limits instruction text to first line, max 60 chars for readability
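
A sketch of BUGATTI_LOG extraction that shows the `trim_start()` learning. It assumes the line format is `BUGATTI_LOG <message>` at the start of a line, which the notes do not spell out; `parse_log_messages` is a hypothetical, simplified stand-in for `parse_log_events()` and returns plain strings instead of `LogEvent` values.

```rust
/// Extracts messages from lines that start with the `BUGATTI_LOG` prefix.
/// Assumption: the line format is `BUGATTI_LOG <message>`.
pub fn parse_log_messages(transcript: &str) -> Vec<String> {
    transcript
        .lines()
        .filter_map(|line| line.strip_prefix("BUGATTI_LOG"))
        // Reject lookalike prefixes such as `BUGATTI_LOGGER`.
        .filter(|rest| rest.is_empty() || rest.starts_with(char::is_whitespace))
        // trim_start drops only the separator; trailing content is preserved.
        .map(|rest| rest.trim_start().to_string())
        .collect()
}
```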
---

## 2026-03-27 - US-014
- Added step begin console output: `STEP N/M ... <instruction> (from <source_file>)`
- Added step result console output: `OK/WARN/ERROR ....... <detail> (<duration>)`
- Added final run summary: `Run PASSED/FAILED: N ok, N warn, N failed, N skipped (Xs)`
- Helper functions: `truncate_instruction()`, `print_step_result()`, `print_run_summary()`
- Files changed: src/executor.rs
- **Learnings for future iterations:**
  - Console output format uses consistent dot-padding pattern matching existing RUN/OK/SKIP/etc.
  - Step begin shows 1-based step number (N/M), instruction summary (truncated), and source file
  - Run summary uses box-drawing chars (═) for visual separation
  - All console output from executor is synchronous println!() — no async or buffering needed
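
A sketch of instruction truncation for the step-begin line: first line only, capped at a character budget (60 in the notes above). Taking the limit as a parameter and appending `...` when text is cut are assumptions; the real `truncate_instruction()` may mark truncation differently.

```rust
/// Keeps the first line of `instruction` and caps it at `max_chars` characters.
/// Counts chars rather than bytes so multi-byte text is never split mid-character.
pub fn truncate_instruction(instruction: &str, max_chars: usize) -> String {
    let first_line = instruction.lines().next().unwrap_or("");
    if first_line.chars().count() <= max_chars {
        first_line.to_string()
    } else {
        // Reserve three characters for the ellipsis (an assumed convention).
        let mut short: String = first_line
            .chars()
            .take(max_chars.saturating_sub(3))
            .collect();
        short.push_str("...");
        short
    }
}
```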
---

## 2026-03-27 - US-015
- Implemented report.md compilation in a separate `src/report.rs` module
- `compile_report()` generates markdown from `ReportInput` struct (run model, not console output)
- Report includes: run ID, session ID, test file, provider, start/end time, duration, pass/fail status
- Report includes: skipped commands section (only when commands were skipped)
- Report includes: ordered step results with OK/WARN/ERROR, per-step transcript paths
- Report includes: BUGATTI_LOG events for WARN/ERROR steps only
- Report includes: effective config summary and artifact directory paths
- `write_report()` writes to `.bugatti/runs/<run-id>/report.md`
- Both successful and failed runs produce report.md
- Files changed: src/report.rs (new), src/lib.rs, prd.json
- **Learnings for future iterations:**
  - `std::fmt::Write` is needed for `writeln!()` on `String` — import as `FmtWrite` to avoid conflict
  - `let _ = writeln!(...)` pattern suppresses infallible write errors for best-effort content
  - Report uses markdown tables for structured data (metadata, config, artifacts)
  - `format_duration()` helper switches between seconds and minutes at 60s threshold
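
A sketch of two learnings above: `writeln!` into a `String` via `std::fmt::Write` (imported under an alias), and a duration formatter that switches to minutes at 60s. The exact output strings and the `render_metadata` helper are assumptions, not the report's real format.

```rust
use std::fmt::Write as FmtWrite;
use std::time::Duration;

/// Formats whole seconds below one minute, minutes plus seconds otherwise.
pub fn format_duration(d: Duration) -> String {
    let secs = d.as_secs();
    if secs < 60 {
        format!("{secs}s")
    } else {
        format!("{}m {}s", secs / 60, secs % 60)
    }
}

/// Renders key/value rows as a markdown table.
pub fn render_metadata(rows: &[(&str, String)]) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so the results are deliberately ignored.
    let _ = writeln!(out, "| Field | Value |");
    let _ = writeln!(out, "|-------|-------|");
    for (field, value) in rows {
        let _ = writeln!(out, "| {field} | {value} |");
    }
    out
}
```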
---

## 2026-03-27 - US-017
- Refactored full_transcript.txt to be written incrementally during execution (not reconstructed after)
- File opened at start of `execute_steps()`, each step appended immediately after completion
- Added `artifact_errors: Vec<String>` field to `RunOutcome` to track capture failures
- Per-step transcript write failures now use `tracing::error!` instead of `eprintln!`
- Report now includes full_transcript.txt path in Artifacts table
- Report includes "Artifact Capture Errors" section when capture failures occurred
- Added `artifact_errors` field to `ReportInput` for passing errors to report compiler
- Files changed: src/executor.rs, src/report.rs
- **Learnings for future iterations:**
  - Use a closure returning `std::io::Result<()>` for grouping multiple writes with `?` operator
  - `drop(file)` explicitly flushes/closes the file handle before subsequent operations
  - Adding fields to shared structs (RunOutcome, ReportInput) requires updating all test constructors
  - `sed -i ''` with substitution is effective for bulk-adding fields to many test struct constructions
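
A sketch of the closure-returning-`io::Result` pattern for incremental transcript appends. The section header format and the `append_step` name are hypothetical; returning an error string mirrors how capture failures are collected into `artifact_errors` rather than aborting the run.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one step's section to the combined transcript.
/// Returns an error message instead of failing, so the caller can record it.
pub fn append_step(path: &Path, step_id: u32, body: &str) -> Option<String> {
    // The closure groups several fallible writes so `?` can be used throughout.
    let write_all = || -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).append(true).open(path)?;
        writeln!(file, "=== step_{step_id:04} ===")?;
        writeln!(file, "{body}")?;
        file.flush()
    };
    write_all()
        .err()
        .map(|e| format!("failed to append step {step_id} to {}: {e}", path.display()))
}
```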
---

## 2026-03-27 - US-018
- Added `EvidenceKind` enum with typed variants: Screenshot, CommandLog, BrowserConsole, NetworkFailure, SqlCliEvidence
- Added `EvidenceRef` struct with kind, path, description, and optional `collection_error` field
- Added `evidence_refs: Vec<EvidenceRef>` field to `StepOutcome` for per-step evidence tracking
- Updated report to render evidence references for WARN/ERROR steps (hidden for OK steps)
- Failed evidence collection noted in report via `collection_error` field
- Updated `collect_artifact_refs()` to scan screenshots/ directory for evidence files
- Files changed: src/diagnostics.rs, src/executor.rs, src/report.rs
- **Learnings for future iterations:**
  - Evidence references use durable paths under run directory, never inline payloads
  - `EvidenceRef::is_available()` checks both file existence and no collection error
  - Adding a field to `StepOutcome` requires updating all test constructors across modules (report.rs, executor.rs)
  - Report evidence rendering follows same pattern as BUGATTI_LOG: only for non-OK steps
---

## 2026-03-27 - US-019
- Implemented root test discovery in new `src/discovery.rs` module
- `discover_root_tests()` recursively finds *.test.toml files under project root
- Hidden directories (starting with `.`) are skipped (prevents scanning .bugatti, .git, etc.)
- Files marked `include_only = true` are excluded from discovery results
- Discovery order is deterministic: files and directories sorted by path
- Parse errors collected per-file in `DiscoveryResult.errors` — discovery continues past individual failures
- `DiscoveredTest` struct carries path and parsed name for display
- Updated main.rs: no-arg `bugatti test` now discovers and runs all root tests
- Aggregate summary prints PASS/FAIL/ERROR status for each test with totals
- Exit code is non-zero if any test fails or has parse errors
- Files changed: src/discovery.rs (new), src/lib.rs, src/main.rs
- **Learnings for future iterations:**
  - Recursive directory traversal with `read_dir` + sorted subdirectories ensures deterministic order
  - Separating parse errors from successful tests allows reporting all errors before execution
  - `collect_test_files()` helper keeps recursion logic separate from parse/filter logic
  - Hidden directory skip prevents scanning .bugatti artifact directories (which could contain test-like files)
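
A sketch of the recursive collection step: sorted `read_dir` entries, hidden directories skipped, `*.test.toml` files kept. The signature is an assumption, and the `include_only` filter is left out because it requires parsing each file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects `*.test.toml` files under `dir`, skipping hidden
/// directories. Entries are sorted by path so the order is deterministic.
pub fn collect_test_files(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    let mut entries = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    entries.sort();

    for path in entries {
        let name = path
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or("")
            .to_string();
        if path.is_dir() {
            // Hidden directories such as .git or .bugatti are never scanned.
            if !name.starts_with('.') {
                collect_test_files(&path, found)?;
            }
        } else if name.ends_with(".test.toml") {
            found.push(path);
        }
    }
    Ok(())
}
```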
---

## 2026-03-27 - US-020
- Implemented stable exit codes in new `src/exit_code.rs` module
- Exit codes: 0 (all pass), 1 (step error), 2 (config/parse error), 3 (provider/readiness), 4 (timeout), 5 (interrupted)
- `exit_code_for_run()` computes code from RunOutcome; `aggregate_exit_code()` for multi-test mode
- Added Ctrl+C handler using `ctrlc` crate with global AtomicBool flag
- Interrupted runs skip remaining tests with EXIT_INTERRUPTED code
- Updated main.rs: `run_discovery()` now returns i32 exit code; `main()` calls `std::process::exit(code)`
- `TestRunResult` now carries `exit_code`, `run_id`, and `report_path` fields
- Aggregate summary distinguishes PASS/FAIL/SKIP status; prints run references after summary
- Exit codes documented in CLI help via `after_help` clap attribute
- Finalization order: record status -> flush artifacts -> stop subprocesses -> print summary -> exit
- Files changed: src/exit_code.rs (new), src/main.rs, src/cli.rs, src/lib.rs, Cargo.toml
- **Learnings for future iterations:**
  - `ctrlc` crate is simpler than raw signal handling; uses `AtomicBool` for cross-thread safety
  - `after_help` in clap command attribute is ideal for documenting exit codes
  - `aggregate_exit_code()` takes max of all run codes to propagate most severe failure
  - Return exit code as i32 from functions rather than calling `process::exit` inline for testability
- Full pipeline order in main.rs: config -> parse -> effective_config -> validate_skip_cmds -> expand -> initialize_run -> init_tracing -> short-lived cmds -> long-lived cmds -> provider init -> execute_steps -> close -> teardown -> report -> exit
- Integration tests live in `tests/pipeline_integration.rs`; use mock `AgentSession` impl for provider-free testing
- `write_best_effort_report()` in main.rs centralizes report writing with error logging (always attempt, never fatal)
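
A sketch of max-based exit code aggregation and an interrupt flag checked between test runs. The constants match the values listed in these notes; `INTERRUPTED`, `run_all`, and the slice-based `aggregate_exit_code` signature are assumptions. In the real CLI the flag would be set by the `ctrlc` handler, which is omitted here to keep the sketch std-only.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub const EXIT_OK: i32 = 0;
pub const EXIT_STEP_ERROR: i32 = 1;
pub const EXIT_CONFIG_ERROR: i32 = 2;
pub const EXIT_PROVIDER_ERROR: i32 = 3;
pub const EXIT_TIMEOUT: i32 = 4;
pub const EXIT_INTERRUPTED: i32 = 5;

/// Global flag a Ctrl+C handler would set.
pub static INTERRUPTED: AtomicBool = AtomicBool::new(false);

/// The highest code wins, so the most severe failure is propagated.
pub fn aggregate_exit_code(codes: &[i32]) -> i32 {
    codes.iter().copied().max().unwrap_or(EXIT_OK)
}

/// Runs tests in order, checking the interrupt flag before each one.
/// Returns a code instead of calling `process::exit`, which keeps it testable.
pub fn run_all<F: FnMut(&str) -> i32>(tests: &[&str], mut run_one: F) -> i32 {
    let mut codes = Vec::new();
    for test in tests.iter().copied() {
        if INTERRUPTED.load(Ordering::SeqCst) {
            codes.push(EXIT_INTERRUPTED);
            break;
        }
        codes.push(run_one(test));
    }
    aggregate_exit_code(&codes)
}
```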
---

## 2026-03-27 - US-021
- Wired the full end-to-end pipeline in main.rs: config load -> parse -> expand -> artifact setup -> tracing -> command setup -> provider init -> step execution -> report -> teardown -> exit
- Single-file mode (`bugatti test <path>`) now runs the complete pipeline instead of placeholder
- Discovery mode (`bugatti test`) now passes skip_cmds through to each test run
- Every error phase produces best-effort report writing and subprocess teardown
- `write_best_effort_report()` helper centralizes report writing with error logging
- `run_test_pipeline()` handles pre-artifact failures; `run_test_with_artifacts()` handles post-artifact phases
- Created integration test suite (`tests/pipeline_integration.rs`) with 6 tests:
  - `full_pipeline_with_mock_provider`: exercises the complete pipeline with a mock provider
  - `pipeline_fails_on_invalid_config`: config errors fail before execution
  - `pipeline_fails_on_invalid_test_file`: parse errors fail before execution
  - `pipeline_fails_on_cycle`: cycle detection fails before execution
  - `pipeline_with_error_step_exits_nonzero`: ERROR steps produce non-zero exit
  - `report_generated_for_failed_run`: reports are generated for failed runs
- Files changed: src/main.rs, tests/pipeline_integration.rs (new), prd.json
- **Learnings for future iterations:**
  - Integration tests can use a mock `AgentSession` impl to test the pipeline without a real provider
  - `chrono::Utc::now()` and `to_rfc3339()` are used for report timestamps
  - Best-effort report pattern: always attempt to write, log failures, continue with cleanup
  - `#[allow(clippy::too_many_arguments)]` for functions that genuinely need many parameters
---

## 2026-03-27 - US-005
- Verified existing implementation of include step expansion with cycle detection in `src/expand.rs`
- All acceptance criteria already met: single include, glob, nested includes, direct/indirect cycle detection, provenance tracking, sequential step IDs
- Module was already integrated in main.rs pipeline and used by integration tests
- All 156 tests pass, typecheck clean — marked US-005 as passes: true
- Files changed: prd.json (marked passes: true), progress.txt
- **Learnings for future iterations:**
  - The expand module uses `visited.remove()` after recursion to allow diamond-shaped includes (A->B, A->C, B->D, C->D) while still detecting cycles
  - `glob` crate is used for pattern matching; results are sorted for deterministic order
  - `canonicalize()` is used for cycle detection to handle symlinks and relative paths correctly
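
A sketch of why `visited.remove()` after recursion permits diamonds while still catching cycles: the set tracks only the active include chain. The in-memory `includes` map stands in for parsing files, and plain strings stand in for canonicalized paths; the `expand` signature is hypothetical and not the one in `src/expand.rs`.

```rust
use std::collections::{BTreeMap, HashSet};

/// Expands `file` depth-first into the leaf files it reaches.
/// `includes` maps a file to the files it includes (a stand-in for parsing).
pub fn expand(
    file: &str,
    includes: &BTreeMap<&str, Vec<&str>>,
    visited: &mut HashSet<String>,
    out: &mut Vec<String>,
) -> Result<(), String> {
    // A file already on the current include chain means a cycle.
    if !visited.insert(file.to_string()) {
        return Err(format!("include cycle detected at {file}"));
    }
    match includes.get(file) {
        Some(children) => {
            for child in children {
                expand(child, includes, visited, out)?;
            }
        }
        // Files without includes are treated as leaves.
        None => out.push(file.to_string()),
    }
    // Remove after recursion: a file reached again via a different path
    // (A->B->D and A->C->D) is not on the active chain, so it is allowed.
    visited.remove(file);
    Ok(())
}
```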
---