Feature/autoresearch by justindobbs · Pull Request #85 · justindobbs/Tracecore

justindobbs · 2026-03-10T19:25:09Z

Summary

What problem does this PR solve?
How does it solve it (major bullets)?

Testing

python -m pytest
python -m ruff check agent_bench
Additional targeted tests (list):

Checklist

Spec/docs updated (README, SPEC_FREEZE, changelog, etc., as needed)
New tasks/tests added to registry + SPEC if applicable
Roadmap items tracked in appropriate boards (if applicable)
Security/privacy review (if touching telemetry, signing, or bundles)
Verified CI status once pushed

…ges, and IO drift tracking
- Add _summarize_compare_diff() to extract compare_delta, step_summary, taxonomy_summary, budget_badges, io_step_count, and changed_step_count from compare_diff payload
- Add action_changed, result_changed, and has_io_drift flags to compare_step_summary entries
- Add compare_taxonomy_summary with failure_type and termination_reason comparison rows
- Add compare_budget_badges with steps/tool_calls/wall_clock_

…y, budget, and step data structures
- Replace direct compare_diff.taxonomy access with compare_taxonomy_summary loop for failure_type and termination_reason rows
- Replace compare_diff.budget_delta with compare_budget_badges loop using badge.kind, badge.label, badge.value, and badge.suffix
- Add compare_changed_step_count and compare_io_step_count metric cards above IO audit pills
- Add action_changed and has_io_drift columns to compare

…ixed options
- Add _filter_compare_step_summary() to filter compare steps by action_changed, result_changed, has_io_drift, or mixed (2+ flags)
- Add compare_drift form parameter with all/action/result/io/mixed dropdown options
- Add compare_filters dict to template context with drift selection
- Add compare_step_summary_total to show filtered vs total step counts
- Add filter status text above delta view table showing active filter
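The drift filter described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_filter_compare_step_summary()`: the function name `filter_compare_step_summary`, the dict-shaped step entries, and the choice to treat an unknown drift value as `all` are assumptions made for the example. Only the three flag names and the five drift options come from the commit message.

```python
# Flag names taken from the commit message; everything else is illustrative.
DRIFT_FLAGS = ("action_changed", "result_changed", "has_io_drift")

# Maps a dropdown value to the single flag it selects on.
FLAG_FOR_DRIFT = {
    "action": "action_changed",
    "result": "result_changed",
    "io": "has_io_drift",
}


def filter_compare_step_summary(steps: list[dict], drift: str = "all") -> list[dict]:
    """Return the steps matching the selected drift option."""
    if drift == "mixed":
        # "mixed" keeps steps where two or more drift flags are set.
        return [s for s in steps if sum(bool(s.get(f)) for f in DRIFT_FLAGS) >= 2]
    flag = FLAG_FOR_DRIFT.get(drift)
    if flag is None:
        # "all" (and, by assumption, any unrecognized value) applies no filter.
        return list(steps)
    return [s for s in steps if s.get(flag)]
```

Keeping the unfiltered list alongside the filtered one is what makes a "filtered vs total" count possible: `len(filter_compare_step_summary(steps, drift))` against `len(steps)`.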

…are form with workflow guidance
- Add compare-run-datalist with run_id, agent, task_ref, and seed options from recent_runs
- Add list="compare-run-datalist" attribute to run_a and run_b inputs for autocomplete
- Add "Use recent for A/B" button rows with first 3 recent runs for one-click population
- Add workflow tip text encouraging users to start from recent runs and refine with drift filter
- Add test_compare_route_renders_recent_

…late matching agent/task pairs from recent runs
- Add _suggest_compare_inputs() to find most recent run pair with matching agent and task_ref from recent_runs list
- Fall back to first two runs if no matching pair found, or empty strings if fewer than 2 runs available
- Add compare_suggestions to template context and pre-populate compare_inputs when user hasn't provided values
- Add suggestion hint text in compare form showing

…d compare tab to main dashboard
- Add seed matching to _suggest_compare_inputs() to prefer run pairs with identical agent/task_ref/seed before falling back to agent/task_ref-only matches
- Extract _load_compare_diff() helper to deduplicate diff loading logic between index and compare_runs routes
- Add compare_a, compare_b, compare_drift, and active_tab query params to index route
- Add compare_diff, compare_error, compare_inputs, compare
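The suggestion order described in the two commits above (seed match, then agent/task_ref match, then the first two runs, then empty strings) could look something like the sketch below. It is one interpretation only: it assumes `recent_runs` is a newest-first list of dicts with `run_id`, `agent`, `task_ref`, and `seed` keys, that the newer run of a pair becomes `run_a`, and that the result is a dict with `run_a`/`run_b` keys. None of those details are confirmed by the commit messages.

```python
def suggest_compare_inputs(recent_runs: list[dict]) -> dict:
    """Pick a plausible A/B pair from recent runs (assumed newest first)."""
    if len(recent_runs) < 2:
        # Fewer than two runs: nothing to suggest.
        return {"run_a": "", "run_b": ""}

    def first_pair(fields: tuple):
        # Scan newest to oldest and return the first two runs sharing a key.
        seen = {}
        for run in recent_runs:
            key = tuple(run.get(f) for f in fields)
            if key in seen:
                return seen[key], run
            seen[key] = run
        return None

    pair = (
        first_pair(("agent", "task_ref", "seed"))  # strictest match first
        or first_pair(("agent", "task_ref"))       # then agent/task_ref only
        or (recent_runs[0], recent_runs[1])        # finally the first two runs
    )
    run_a, run_b = pair
    return {"run_a": run_a.get("run_id", ""), "run_b": run_b.get("run_id", "")}
```

Trying the strictest key first means a same-seed pair wins even when a newer agent/task_ref-only pair exists, which matches the stated preference for identical seeds.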

…re list
- Add `.pytest_tm` to ignored pytest temporary directories
- Add `assets/*` directory to gitignore
- Fix typo in comment: "notest" → "notes"

…th expanded system_info fields
- Add lineage object with parent_run_id and baseline_run_id to artifact schema for experiment ancestry tracking
- Expand system_info with machine, processor, and cpu_count fields beyond existing platform/python
- Add compare_runs.py CLI tool to diff two wrapper run artifacts with metric delta calculation
- Support explicit run_id args or auto-select two most recent runs from --runs-dir
- Display outcome

…, outcome filtering, and cleanup safety guard
- Add validate_artifacts.py CLI tool to check artifact.json schema compliance with required fields and type validation
- Add --baseline-metric flag to compare_runs.py to override artifact baseline values and show delta vs baseline for both runs
- Add --include-outcomes filter to summarize_runs.py to show only runs matching specified outcome values
- Add --allow-delete-newest safety
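As a rough illustration of "required fields and type validation", a check of this kind might be structured as below. The commit message does not list the required fields, so `REQUIRED_FIELDS` here is hypothetical (it borrows `system_info` and `lineage` from the schema changes mentioned elsewhere in this PR and guesses at `run_id`), and `validate_artifact` is an illustrative name rather than the function in validate_artifacts.py.

```python
# Hypothetical required fields and their expected Python types.
# The real schema enforced by validate_artifacts.py is not shown in this PR page.
REQUIRED_FIELDS = {
    "run_id": str,
    "system_info": dict,
    "lineage": dict,
}


def validate_artifact(artifact: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in artifact:
            errors.append(f"missing required field: {name}")
        elif not isinstance(artifact[name], expected_type):
            actual = type(artifact[name]).__name__
            errors.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    return errors
```

Collecting every problem before reporting, rather than stopping at the first, lets a CLI print all schema issues for an artifact in one pass.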

…ration and seed field to artifact schema
- Add report_runs.py to generate markdown tables of recent runs sorted by metric (asc) or time (desc) with --limit, --sort, and --output flags
- Add --seed argument to run_wrapper.py to record optional seed value in artifact.json for reproducibility tracking
- Add seed field to artifact schema alongside lineage object
- Fix cpu_count collection to use os.cpu_count() instead of platform.cpu_count()
- Add
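The cpu_count fix is easy to see in isolation: the standard library's `platform` module has no `cpu_count()` function, while `os.cpu_count()` exists and returns an `int` or `None`. The sketch below gathers the five system_info fields named in this PR. The function name `collect_system_info` and the specific calls used for the `platform` and `python` fields are assumptions, since the wrapper's own code is not shown.

```python
import os
import platform


def collect_system_info() -> dict:
    """Gather host details for an artifact's system_info block."""
    return {
        # Assumed sources for the two pre-existing fields.
        "platform": platform.platform(),
        "python": platform.python_version(),
        # Fields added in this PR.
        "machine": platform.machine(),
        "processor": platform.processor(),
        # os.cpu_count() may return None if the count cannot be determined.
        "cpu_count": os.cpu_count(),
    }
```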

…with autonomous research loop benefits
- Add progress.md documenting wrapper completion with shim runs, artifact validation, comparison/report tooling, and pytest coverage
- Document 5 shim runs captured with val_bpb 1.05–1.30 range and best run showing success_improved vs 1.5 baseline
- List completed CLI tools: run_wrapper, summarize_runs, compare_runs, report_runs, cleanup_runs, validate_artifacts
- Add usage examples for

… commands to progress and usage docs
- Add CPU laptop example experiment section to progress.md with 5 PowerShell one-liners for running shim variants and generating reports
- Add matching example to USAGE.md with same commands and expected output shape
- Document expected deterministic metrics ~1.11 and 1.30 vs baseline 1.50 with deltas ≈ -0.39 and -0.20
- Add expected behavior notes for summarize_runs outcomes/deltas, compare_runs metric
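The documented deltas follow directly from subtracting the baseline from each run's metric. The snippet below reproduces that arithmetic; `metric_delta` is an illustrative helper, not a function from compare_runs.py, and rounding to two decimals is an assumption chosen to match the precision quoted above.

```python
def metric_delta(metric: float, baseline: float) -> float:
    """Delta vs baseline; negative means the metric is below the baseline."""
    return round(metric - baseline, 2)


BASELINE = 1.50  # baseline value quoted in the commit message

delta_a = metric_delta(1.11, BASELINE)  # 1.11 - 1.50 = -0.39
delta_b = metric_delta(1.30, BASELINE)  # 1.30 - 1.50 = -0.20
```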

…im run guidance
- Add bullet point referencing CPU-only example workflow: run shim twice with different metrics vs baseline, then use summarize/compare/report tools
- Point to wrapper/USAGE.md for specific commands and expected delta values (1.11 and 1.30 vs 1.50 baseline)

justindobbs added 13 commits March 8, 2026 16:41

chore(gitignore): Add pytest temp directory and assets folder to igno…

e3a0710


justindobbs merged commit 08e24a2 into main Mar 10, 2026
9 checks passed

justindobbs merged 13 commits into main from feature/autoresearch
