[codex] Harden repair reruns with ACP recovery and Stage 10 safety gates#1

Draft
CKwin26 wants to merge 1 commit into main from codex/repair-hardening

CKwin26 commented Mar 31, 2026

Summary

  • harden repair reruns with ACP retry/debug instrumentation and prompt sanitation
  • let experiment sandboxes pass optional entrypoint args and environment overrides so repaired runs can inject runtime asset paths cleanly
  • strengthen Stage 10 code-generation validation so repaired experiments must be self-contained, avoid placeholder/demo code, and include ablation distinctness checks
  • add repair-oriented config knobs, helper plumbing, and regression coverage for the new rerun paths

Why

The repair workflow now recovers and iterates much further than the original baseline, but doing so exposed a few recurring failure modes:

  • ACP prompt sessions would die without enough diagnostics and with no retry behavior
  • generated experiments could require runtime asset paths that the execution layer had no clean way to pass through
  • Stage 10 could produce code that looked plausible but was not actually safe to rerun: missing local helper modules, placeholder/demo logic, or ablation variants that were not wired distinctly

This PR turns those lessons into concrete repair infrastructure so reruns fail earlier, with better diagnostics, and with a stronger contract for generated experiments.

What Changed

ACP repair hardening

  • add reconnect retry/backoff controls and richer ACP debug logging config
  • capture status on prompt failure, archive failed prompt payloads, and preserve more actionable failure context
  • sanitize problematic prompt text before dispatch and improve timeout / retry reporting in repair runs
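
For reviewers who want a mental model of the retry path, here is a minimal sketch of retry with exponential backoff, prompt sanitation, and failed-payload archiving. It is not the code in researchclaw/llm/acp_client.py: the function name `send_with_retry`, its parameters, the archive file naming, and the choice of `ConnectionError` as the retryable failure are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def send_with_retry(
    send: Callable[[str], str],
    prompt: str,
    archive_dir: Path,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(prompt), retrying with exponential backoff on ConnectionError.

    Hypothetical helper: every name and default here is illustrative.
    """
    # Sanitation (assumed rule): drop non-printable characters such as NUL,
    # but keep newlines and tabs so the prompt layout survives.
    clean = "".join(ch for ch in prompt if ch in "\n\t" or ch.isprintable())

    last_error: Optional[BaseException] = None
    for attempt in range(1, max_retries + 1):
        try:
            return send(clean)
        except ConnectionError as exc:
            last_error = exc
            # Archive the failed payload so the failure can be inspected later.
            archive_dir.mkdir(parents=True, exist_ok=True)
            record = {"attempt": attempt, "error": repr(exc), "prompt": clean}
            path = archive_dir / f"failed_prompt_{attempt}.json"
            path.write_text(json.dumps(record, indent=2), encoding="utf-8")
            # Back off before the next attempt: base_delay, then 2x, 4x, ...
            if attempt < max_retries:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"prompt failed after {max_retries} attempts") from last_error
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; a caller that wants the default behavior can simply omit it.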

Execution and runtime asset plumbing

  • allow sandbox backends to forward optional entrypoint args and environment overrides
  • preserve forwarded args through the Docker entrypoint wrapper
  • extend execution helpers so repaired experiment entrypoints can receive runtime asset parameters instead of hard-coding paths
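
The forwarding idea can be sketched with a plain subprocess call. This is an illustration only, assuming a local Python entrypoint; `run_entrypoint` and its parameter names are hypothetical and do not reflect the actual signature of `ExperimentSandbox.run_project(...)` or the Docker, SSH, and Colab backends.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Mapping, Optional, Sequence


def run_entrypoint(
    project_dir: Path,
    entrypoint: str = "main.py",
    args: Optional[Sequence[str]] = None,
    env_overrides: Optional[Mapping[str, str]] = None,
    timeout: float = 60.0,
) -> subprocess.CompletedProcess:
    """Run an experiment entrypoint with optional args and env overrides.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Start from the parent environment and layer the overrides on top,
    # so an override wins over an inherited value of the same name.
    env = dict(os.environ)
    env.update(env_overrides or {})

    # Forward extra args verbatim after the entrypoint path.
    cmd = [sys.executable, entrypoint, *(args or [])]
    return subprocess.run(
        cmd,
        cwd=project_dir,       # entrypoint is resolved relative to the project
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With this shape, a repaired experiment can read an asset location from `sys.argv` or `os.environ` instead of hard-coding a path, for example `run_entrypoint(project, args=["--data", path], env_overrides={"ASSET_DIR": assets})`.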

Stage 10 safety gates for repaired experiments

  • add deeper validation for missing local helper modules and repair them before execution
  • reject obvious placeholder/demo implementations and other low-fidelity experiment stubs
  • require ablation/condition distinctness checks so repaired multi-condition experiments have an explicit self-check path instead of silently shipping degenerate variants

Config and tests

  • expose the new repair/ACP knobs in config and example config
  • add regression coverage around repair-oriented CLI/runner flows, code-generation checks, and sandbox arg/env forwarding

Validation

  • python3 -m compileall researchclaw/config.py researchclaw/llm/acp_client.py researchclaw/experiment/sandbox.py researchclaw/experiment/docker_sandbox.py researchclaw/experiment/ssh_sandbox.py researchclaw/experiment/colab_sandbox.py researchclaw/pipeline/_helpers.py researchclaw/pipeline/runner.py researchclaw/pipeline/stage_impls/_code_generation.py researchclaw/pipeline/stage_impls/_execution.py tests/test_rc_cli.py tests/test_rc_executor.py tests/test_rc_runner.py tests/test_ssh_and_colab_sandbox.py
  • minimal smoke: import and exercise the Stage 10 condition-distinctness checker with a passing self-check example
  • minimal smoke: run ExperimentSandbox.run_project(...) with forwarded args/env overrides and confirm the child entrypoint receives them
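
As a rough picture of what a condition-distinctness self-check might do, the sketch below flags ablation conditions whose effective configs are identical. The function name `find_duplicate_conditions` and the config-dict representation are assumptions for illustration; the actual Stage 10 checker may compare conditions differently.

```python
import json
from typing import Any, Dict, List, Tuple


def find_duplicate_conditions(
    conditions: Dict[str, Dict[str, Any]],
) -> List[Tuple[str, str]]:
    """Return (first_seen, duplicate) pairs for conditions with equal configs.

    Hypothetical helper: configs are compared by a canonical JSON dump, so
    key order does not matter.
    """
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for name, config in conditions.items():
        # sort_keys gives a canonical form; default=repr covers values
        # that JSON cannot serialize directly.
        key = json.dumps(config, sort_keys=True, default=repr)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates


if __name__ == "__main__":
    conditions = {
        "full": {"use_attention": True, "dropout": 0.1},
        "no_attention": {"use_attention": False, "dropout": 0.1},
        # Degenerate variant: named as an ablation but wired like "full".
        "no_dropout": {"dropout": 0.1, "use_attention": True},
    }
    print(find_duplicate_conditions(conditions))
```

An empty result means every condition is wired distinctly; any returned pair points at a variant that would silently reproduce another condition's run.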

Notes

  • This PR intentionally focuses on the repair / rerun pathway and its safety rails.
  • It does not include the separate detector-unification work or the benchmark-agent exploration changes in the local worktree.
