[codex] Harden repair reruns with ACP recovery and Stage 10 safety gates by CKwin26 · Pull Request #1 · CKwin26/AutoResearchClaw

CKwin26 · 2026-03-31T05:34:41Z

Summary

harden repair reruns with ACP retry/debug instrumentation and prompt sanitation
let experiment sandboxes pass optional entrypoint args and environment overrides so repaired runs can inject runtime asset paths cleanly
strengthen Stage 10 code-generation validation so repaired experiments must be self-contained, avoid placeholder/demo code, and include ablation distinctness checks
add repair-oriented config knobs, helper plumbing, and regression coverage for the new rerun paths

Why

The repair workflow can now recover and iterate much deeper than the original baseline, but it exposed a few recurring failure modes:

ACP prompt sessions would die without enough diagnostics or retry behavior
generated experiments could require runtime asset paths that the execution layer had no clean way to pass through
Stage 10 could produce code that looked plausible but was not actually safe to rerun: missing local helper modules, placeholder/demo logic, or ablation variants that were not wired distinctly

This PR turns those lessons into concrete repair infrastructure so reruns fail earlier, with better diagnostics, and with a stronger contract for generated experiments.

What Changed

ACP repair hardening

add reconnect retry/backoff controls and richer ACP debug logging config
capture status on prompt failure, archive failed prompt payloads, and preserve more actionable failure context
sanitize problematic prompt text before dispatch and improve timeout / retry reporting in repair runs
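
For reviewers who want a concrete picture of the retry/backoff and sanitation behavior described above, here is a minimal sketch. It is not the code in researchclaw/llm/acp_client.py: the names `sanitize_prompt`, `backoff_delays`, and `call_with_retry` are hypothetical, treating `ConnectionError` as the retryable failure is an assumption, and so is the choice to treat control characters as the "problematic" prompt text.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def sanitize_prompt(text: str) -> str:
    """Drop NUL and other control characters, keeping newlines and tabs.

    Assumption: control characters are what makes a prompt "problematic".
    """
    return "".join(
        ch for ch in text
        if ch in "\n\t" or (ord(ch) >= 32 and ord(ch) != 127)
    )


def backoff_delays(max_retries: int, base: float = 0.5,
                   factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(max_retries)]


def call_with_retry(fn: Callable[[], T], max_retries: int = 3,
                    base: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep,
                    on_failure: Optional[Callable[[int, Exception], None]] = None) -> T:
    """Call `fn`, retrying on ConnectionError up to `max_retries` times.

    `on_failure` is a hook where a caller could capture status or archive
    the failed prompt payload before the next attempt.
    """
    delays = backoff_delays(max_retries, base=base)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError as exc:
            if on_failure is not None:
                on_failure(attempt, exc)
            if attempt == max_retries:
                raise  # retries exhausted: surface the original error
            sleep(delays[attempt])
    raise AssertionError("unreachable")  # loop always returns or raises
```

Passing `sleep` and `on_failure` as parameters keeps the helper testable without real delays and leaves the diagnostics policy to the caller.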

Execution and runtime asset plumbing

allow sandbox backends to forward optional entrypoint args and environment overrides
preserve forwarded args through the Docker entrypoint wrapper
extend execution helpers so repaired experiment entrypoints can receive runtime asset parameters instead of hard-coding paths
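
The forwarding idea above can be sketched with a plain subprocess. This is an illustration only, not the actual sandbox backend: `run_entrypoint`, `smoke_run`, the `--data` flag, and the `RC_ASSET_DIR` variable are all hypothetical names chosen for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Mapping, Optional, Sequence


def run_entrypoint(script: Path, args: Sequence[str] = (),
                   env_overrides: Optional[Mapping[str, str]] = None,
                   timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `script` with extra CLI args and environment overrides.

    Overrides are layered on top of the parent environment, so the child
    still sees PATH and friends.
    """
    env = os.environ.copy()
    if env_overrides:
        env.update(env_overrides)
    return subprocess.run(
        [sys.executable, str(script), *args],
        env=env, capture_output=True, text=True,
        timeout=timeout, check=False,
    )


def smoke_run() -> str:
    """Write a tiny child entrypoint and confirm it receives args and env."""
    child = (
        "import os, sys\n"
        "print(' '.join(sys.argv[1:]))\n"
        "print(os.environ.get('RC_ASSET_DIR', ''))\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "main.py"
        script.write_text(child, encoding="utf-8")
        result = run_entrypoint(
            script,
            ["--data", "assets/train.csv"],      # hypothetical runtime asset arg
            {"RC_ASSET_DIR": "/tmp/assets"},     # hypothetical env override
        )
    return result.stdout
```

With this shape, a repaired experiment can read its asset location from `sys.argv` or the environment instead of hard-coding a path.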

Stage 10 safety gates for repaired experiments

add deeper validation for missing local helper modules and repair them before execution
reject obvious placeholder/demo implementations and other low-fidelity experiment stubs
require ablation/condition distinctness checks so repaired multi-condition experiments have an explicit self-check path instead of silently shipping degenerate variants
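
As a rough illustration of the first two gates, the sketch below uses the standard `ast` module to flag stub functions and imports that resolve to nothing. It is a simplified stand-in, not the validation in _code_generation.py: the function names are hypothetical, and the definition of "placeholder" used here (a body consisting only of `pass`, `...`, a docstring, or `raise NotImplementedError`) is an assumption.

```python
import ast
import importlib.util
from pathlib import Path
from typing import List


def find_placeholder_functions(source: str) -> List[str]:
    """Names of functions whose body is only stub statements."""
    def is_stub(stmt: ast.stmt) -> bool:
        if isinstance(stmt, ast.Pass):
            return True
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            return True  # docstring or bare `...`
        if isinstance(stmt, ast.Raise) and stmt.exc is not None:
            target = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
            return isinstance(target, ast.Name) and target.id == "NotImplementedError"
        return False

    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and all(is_stub(s) for s in node.body)
    )


def find_missing_local_modules(source: str, project_dir: Path) -> List[str]:
    """Top-level imports found neither in `project_dir` nor on the import path.

    Relative imports (`from . import x`) are skipped in this sketch.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])

    missing = []
    for name in sorted(names):
        is_local = ((project_dir / f"{name}.py").exists()
                    or (project_dir / name / "__init__.py").exists())
        if not is_local and importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

A static check like this runs before execution, so a generated experiment that imports a helper it never wrote can be repaired or rejected without spending a sandbox run on it.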

Config and tests

expose the new repair/ACP knobs in config and example config
add regression coverage around repair-oriented CLI/runner flows, code-generation checks, and sandbox arg/env forwarding
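
To make the shape of such knobs concrete, here is a hypothetical config object. None of these field names or defaults are taken from researchclaw/config.py or the example config; they are placeholders chosen only to show validation of repair/ACP settings at load time.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class RepairAcpConfig:
    # Every field name and default below is hypothetical.
    reconnect_retries: int = 2
    reconnect_backoff_sec: float = 2.0
    debug_log: bool = False
    archive_failed_prompts: bool = True

    def __post_init__(self) -> None:
        if self.reconnect_retries < 0:
            raise ValueError("reconnect_retries must be >= 0")
        if self.reconnect_backoff_sec < 0:
            raise ValueError("reconnect_backoff_sec must be >= 0")

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "RepairAcpConfig":
        """Build a config from a parsed mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise ValueError(f"unknown repair config keys: {unknown}")
        return cls(**raw)
```

Rejecting unknown keys means a typo in the example config fails loudly instead of silently falling back to a default.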

Validation

python3 -m compileall researchclaw/config.py researchclaw/llm/acp_client.py researchclaw/experiment/sandbox.py researchclaw/experiment/docker_sandbox.py researchclaw/experiment/ssh_sandbox.py researchclaw/experiment/colab_sandbox.py researchclaw/pipeline/_helpers.py researchclaw/pipeline/runner.py researchclaw/pipeline/stage_impls/_code_generation.py researchclaw/pipeline/stage_impls/_execution.py tests/test_rc_cli.py tests/test_rc_executor.py tests/test_rc_runner.py tests/test_ssh_and_colab_sandbox.py
minimal smoke: import and exercise the Stage 10 condition-distinctness checker with a passing self-check example
minimal smoke: run ExperimentSandbox.run_project(...) with forwarded args/env overrides and confirm the child entrypoint receives them
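
For context on the condition-distinctness smoke check, a self-check of that kind might look like the sketch below. The function name `check_condition_distinctness` and the dict-of-settings input are assumptions made for illustration; the real Stage 10 checker may inspect generated code rather than resolved settings.

```python
import json
from typing import Any, Dict, List, Mapping, Tuple


def check_condition_distinctness(
    conditions: Mapping[str, Mapping[str, Any]],
) -> List[Tuple[str, str]]:
    """Return (first_name, duplicate_name) pairs with identical settings.

    Settings are compared via a canonical JSON dump so key order does not
    matter. An empty result means every condition is wired distinctly.
    """
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for name, settings in conditions.items():
        key = json.dumps(settings, sort_keys=True)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates
```

A passing example has every ablation differ from the baseline in at least one setting; a degenerate variant that resolves to the baseline's settings is reported by name rather than silently shipped.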

Notes

This PR intentionally focuses on the repair / rerun pathway and its safety rails.
It does not include the separate detector-unification work or the benchmark-agent exploration changes in the local worktree.

harden repair reruns

58911a8

CKwin26 wants to merge 1 commit into main from codex/repair-hardening
