
Commit a6dcbc5

feat: Enhance validation and normalization of run artifacts, including event logs and background job records
1 parent 53307ad, commit a6dcbc5

7 files changed

Lines changed: 591 additions & 5 deletions


docs/architecture.md

Lines changed: 8 additions & 0 deletions
@@ -28,20 +28,28 @@ Until those conditions are met, treat the 9-node workflow as fixed.
 
 - TUI (`autolabos`) and local web ops UI (`autolabos web`) share the same interaction/runtime layer.
 - Node execution and transitions are controlled by `StateGraphRuntime`.
+- Runtime events are persisted per run in `.autolabos/runs/<run-id>/events.jsonl`; high-churn telemetry should go there rather than into `runs.json`.
+- Deferred `collect_papers` recovery state is persisted in `.autolabos/runs/<run-id>/collect_background_job.json` whenever background enrichment is active, so restart recovery stays inspectable.
 - Approval mode and transition recommendation behavior are part of runtime contracts.
+- `/approve` must respect stored non-advance pending transitions (for example `analyze_results -> backtrack_to_design`) instead of advancing by graph order. Explicit manual `/agent run <next-node>` handoffs may resume `pause_for_human` transitions without weakening default approval behavior.
 
 Harness and runtime work must preserve both TUI and web behaviors unless a change is explicitly requested.
 
 ## 3) Artifact model
 
 - Run-scoped source of truth: `.autolabos/runs/<run-id>/...`
+- Lightweight run index/projection: `.autolabos/runs/runs.json` (status, node pointer, aggregate `usage`)
 - Public mirrored outputs: `outputs/` (single latest-run public bundle)
 - Checkpoints and run context are persisted under each run directory.
+- Design/execution experiment contracts live in `experiment_portfolio.json` and `run_manifest.json`.
+- Transition/gate decisions remain inspectable through artifacts such as `transition_recommendation.json`, `analysis/evidence_scale_assessment.json`, `review/*`, and `paper/write_paper_eligibility.json`.
 
 Quality checks should be deterministic and file-based whenever possible.
 
 Public-facing outputs must remain traceable to underlying run artifacts.
 
+Because events, checkpoints, background-job recovery, and execution artifacts already live in per-run files, `runs.json` should stay a summary index rather than a sink for append-only logs. If index write contention becomes material, split the summary index or move it to sqlite instead of pushing more high-volume data into `runs.json`.
+
 ## 4) Node-internal loops are bounded
 
 Internal control loops inside nodes are allowed and expected, including loops in analysis, design, implementation, execution, result interpretation, and writing.
docs/reproducibility.md

Lines changed: 9 additions & 4 deletions
@@ -4,19 +4,24 @@ Reproducibility claims must be backed by concrete artifacts.
 
 ## 1) Minimum artifact set (when applicable)
 
+- Runtime event trace (`events.jsonl`)
+- Deferred background recovery record when used (`collect_background_job.json`)
 - Planned portfolio / trial-group structure (`experiment_portfolio.json`)
 - Run manifest (`run_manifest.json`)
 - Raw or summarized metrics (`metrics.json`, supplemental metrics)
 - Objective evaluation (`objective_evaluation.json`)
 - Result synthesis (`result_analysis.json`, optional synthesis artifact)
+- Transition decision (`transition_recommendation.json`)
 - Paper trace outputs (`paper/main.tex`, `paper/references.bib`, `paper/evidence_links.json`)
 
 ## 2) Run-state traceability
 
 For each run, preserve:
 
 - run id
-- workflow node progression (`runs.json`)
+- workflow node progression (`runs.json`) including current node/status and aggregate usage when available
+- append-only runtime events (`events.jsonl`)
+- key gate/recovery artifacts (`transition_recommendation.json`, `collect_background_job.json` when present)
 - key generated artifacts in `.autolabos/runs/<run_id>/...`
 
 ## 3) Reproducibility claim language
@@ -29,12 +34,12 @@ For each run, preserve:
 Before marking work complete:
 
 1. Re-run the relevant flow or tests.
-2. Confirm expected artifacts are present and parseable.
+2. Confirm expected artifacts are present, parseable, and consistent across `runs.json`, `events.jsonl`, and run-scoped artifacts.
 3. Record limitations and unresolved uncertainty.
 
 ## 5) Validation surfaces
 
 - Runtime diagnostics: `/doctor` in TUI and web Doctor tab (environment + workspace harness checks).
-- CI/internal gate: `npm run validate:harness` (issue log format + workspace/test run artifact structure).
+- CI/internal gate: `npm run validate:harness` (issue log format + workspace/test run artifact structure, including event logs and portfolio/manifest contracts).
 
-No separate end-user command is required beyond `/doctor`.
+No separate end-user command is required beyond `/doctor`, but maintainers should still run `npm run validate:harness` before declaring artifact-level reproducibility complete.
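Step 2 of the completion checklist now asks for consistency across `runs.json`, `events.jsonl`, and run-scoped artifacts. One way such a check could be expressed, as a sketch only: the input shapes, the specific rules, and the issue codes are hypothetical and are not the actual checks behind `npm run validate:harness`. The codes are simply chosen to contain substrings (`events_log`, `run_manifest`) that the harness remediation lookup in this commit matches on.

```typescript
// Hypothetical inputs: the runs.json row for one run, its parsed events,
// and the file names found under .autolabos/runs/<run-id>/.
interface IndexedRun {
  runId: string;
  currentNode: string;
}

interface TraceEvent {
  runId: string;
  node?: string;
}

interface ConsistencyIssue {
  code: string;
  message: string;
}

function checkRunConsistency(run: IndexedRun, events: TraceEvent[], files: string[]): ConsistencyIssue[] {
  const issues: ConsistencyIssue[] = [];
  const present = new Set(files);

  // The event trace should exist for every started run.
  if (!present.has("events.jsonl")) {
    issues.push({ code: "events_log_missing", message: "events.jsonl is absent" });
  }

  // Every event in this run directory should carry this run's id.
  const foreign = events.filter((event) => event.runId !== run.runId);
  if (foreign.length > 0) {
    issues.push({ code: "events_log_run_mismatch", message: `${foreign.length} event(s) reference another run id` });
  }

  // The node pointer in runs.json should appear somewhere in the event trace.
  const loggedNodes = new Set(events.map((event) => event.node));
  if (events.length > 0 && !loggedNodes.has(run.currentNode)) {
    issues.push({ code: "events_log_node_mismatch", message: `runs.json points at ${run.currentNode}, which no event mentions` });
  }

  // Hypothetical rule: an execution manifest without a planned portfolio is suspicious.
  if (present.has("run_manifest.json") && !present.has("experiment_portfolio.json")) {
    issues.push({ code: "run_manifest_without_portfolio", message: "run_manifest.json exists but experiment_portfolio.json does not" });
  }
  return issues;
}

// Usage: a run whose manifest was written but whose portfolio was not.
const found = checkRunConsistency(
  { runId: "r1", currentNode: "run_experiments" },
  [{ runId: "r1", node: "run_experiments" }],
  ["events.jsonl", "run_manifest.json"],
);
console.log(found.map((issue) => issue.code));
```

Returning a list of coded issues, rather than throwing on the first problem, lets one pass report every inconsistency in a run at once.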

src/core/runs/migrateRuns.ts

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ import {
 RunsFile
 } from "../../types.js";
 import { createDefaultGraphState } from "../stateGraph/defaults.js";
+import { normalizeRunUsageSummary } from "./runUsage.js";
 
 type StageIdV1 =
 | "collect"
@@ -207,6 +208,7 @@ function normalizeRunsV3(file: RunsFile): RunsFile {
 version: 3,
 workflowVersion: 3,
 nodeThreads: run.nodeThreads ?? {},
+usage: normalizeRunUsageSummary(run.usage),
 graph: {
 ...createDefaultGraphState(),
 ...run.graph,
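The migration now routes each run's `usage` through `normalizeRunUsageSummary` from `./runUsage.js`, whose body is not part of this diff. A sketch of what a normalizer of this kind might do, assuming a hypothetical summary with `inputTokens`, `outputTokens`, and `costUsd` fields: an older `runs.json` entry with no `usage`, or with malformed values, comes out as a well-formed zeroed summary instead of `undefined`. The names `UsageSummarySketch` and `normalizeUsageSketch` are illustrative only.

```typescript
// Hypothetical summary shape; the real type in runUsage.ts is not shown in this diff.
interface UsageSummarySketch {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Accept only finite, non-negative numbers; anything else counts as zero.
function toNonNegative(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
}

// Normalize whatever a legacy runs.json row stored under `usage`.
function normalizeUsageSketch(raw: unknown): UsageSummarySketch {
  const record: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  return {
    inputTokens: toNonNegative(record["inputTokens"]),
    outputTokens: toNonNegative(record["outputTokens"]),
    costUsd: toNonNegative(record["costUsd"]),
  };
}

// Usage: a row written before usage tracking existed, and a row with corrupted fields.
const legacyRow: { id: string; usage?: unknown } = { id: "r1" };
console.log(normalizeUsageSketch(legacyRow.usage));
console.log(normalizeUsageSketch({ inputTokens: 1200, outputTokens: -5, costUsd: "0.1" }));
```

Normalizing during migration means downstream readers can treat `usage` as always present; whether the real helper zeroes, drops, or preserves malformed fields cannot be determined from this hunk.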

src/core/validation/harnessValidationService.ts

Lines changed: 9 additions & 0 deletions
@@ -257,6 +257,15 @@ export function defaultRemediationForIssueCode(code: string): string {
 if (code.includes("paper_result")) {
 return "Regenerate paper artifacts after run_experiments/analyze_results, or remove unsupported result claims from the manuscript.";
 }
+if (code.includes("events_log")) {
+return "Ensure every started run writes parseable JSONL events to .autolabos/runs/<run-id>/events.jsonl.";
+}
+if (code.includes("run_manifest") || code.includes("experiment_portfolio")) {
+return "Regenerate design/run artifacts so experiment_portfolio.json and run_manifest.json are present and structurally consistent.";
+}
+if (code.includes("collect_background_job")) {
+return "Rewrite or clear collect_background_job.json so deferred enrichment recovery metadata matches the current run.";
+}
 if (code.includes("review_")) {
 return "Align review decision artifacts, run status, and paper output state before marking the run as completed.";
 }
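`defaultRemediationForIssueCode` is a first-match chain of `code.includes(...)` checks, so the order of the branches decides which hint an issue code receives. A table-driven sketch of the same idea, covering only the branches visible in this hunk, with the hint strings copied from the diff. The issue codes in the usage lines are hypothetical, and `FALLBACK_HINT` stands in for whatever the real function returns when nothing matches, which this hunk does not show.

```typescript
// Ordered rules mirroring the visible if-chain: the first rule whose
// substring appears in the issue code wins.
const REMEDIATION_RULES: ReadonlyArray<{ needles: readonly string[]; hint: string }> = [
  {
    needles: ["paper_result"],
    hint: "Regenerate paper artifacts after run_experiments/analyze_results, or remove unsupported result claims from the manuscript.",
  },
  {
    needles: ["events_log"],
    hint: "Ensure every started run writes parseable JSONL events to .autolabos/runs/<run-id>/events.jsonl.",
  },
  {
    needles: ["run_manifest", "experiment_portfolio"],
    hint: "Regenerate design/run artifacts so experiment_portfolio.json and run_manifest.json are present and structurally consistent.",
  },
  {
    needles: ["collect_background_job"],
    hint: "Rewrite or clear collect_background_job.json so deferred enrichment recovery metadata matches the current run.",
  },
  {
    needles: ["review_"],
    hint: "Align review decision artifacts, run status, and paper output state before marking the run as completed.",
  },
];

// Hypothetical fallback; the real function's default branch is outside this hunk.
const FALLBACK_HINT = "Inspect the artifacts named in the issue and re-run npm run validate:harness.";

function remediationFor(code: string): string {
  const rule = REMEDIATION_RULES.find((r) => r.needles.some((needle) => code.includes(needle)));
  return rule ? rule.hint : FALLBACK_HINT;
}

// Usage: a code containing both "review_" and "run_manifest" gets the
// manifest hint, because that rule is listed first.
console.log(remediationFor("review_run_manifest_mismatch"));
```

Keeping the rules in an ordered array makes the precedence explicit: moving the `review_` rule above the manifest rule would change which hint a code like the one above receives.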
