-
Notifications
You must be signed in to change notification settings - Fork 167
Open
Labels
bugReproducible defect or broken behaviorReproducible defect or broken behavior
Description
Summary
If a resume attempt fails due to a tooling/runtime integration error, Ouroboros records the orchestrator session as failed and then refuses to resume it ever again. A transient resume bug therefore permanently burns the session.
Impact
This turns a recoverable runtime failure into irreversible session loss. Users hit one bad resume attempt and then cannot continue the run even after the underlying bug is fixed.
Environment
- Ouroboros version:
0.26.0b7+sam.rawqa1 - OS: macOS 15.7.3
- Python version: 3.14.1
- Installation method: local wheel/dev install
- Model/provider: Codex CLI
- Codex CLI version:
0.116.0 - Relevant flags/config:
ouroboros run workflow <seed> --debug --resume <session_id>
Steps to reproduce
- Start a workflow run and obtain a session id.
- Trigger a resume failure caused by an external/runtime integration issue rather than a true completed run. One example is the bad Codex resume command ordering bug.
- Retry
ouroboros run workflow <seed_file> --debug --resume <session_id>after the underlying runtime problem has been fixed.
Expected behavior
A session that failed while trying to resume because of a recoverable runtime/tooling problem should either:
- remain resumable, or
- provide an explicit recovery path for retrying the resume.
At minimum, a transient resume integration bug should not permanently force a fresh run.
Actual behavior
The failed resume attempt writes a terminal failed state, and later resume attempts are rejected with:
Orchestrator error: Session is in terminal state failed, cannot resume
Logs or error output
2026-03-26T16:21:18.279754Z [error ] orchestrator.session.failed error="error: unexpected argument '-C' found ..."
2026-03-26T16:24:04.712903Z [info ] orchestrator.session.reconstructed messages_processed=1138 session_id=orch_fda915674180 status=failed
Orchestrator error: Session is in terminal state failed, cannot resume (details: {'session_id': 'orch_fda915674180', 'status': 'failed'})
Minimal reproduction
- Run any workflow.
- Cause
resume_session()to fail before useful forward progress because of a transient runtime integration error. - Retry resume.
Acceptance criteria for fix
- A failed resume attempt caused by recoverable runtime/tooling issues does not permanently make the session unresumable.
- Users have a supported retry path after fixing the underlying resume/runtime problem.
- Terminal-state enforcement distinguishes true terminal completion/failure from retryable resume/transport/runtime failures.
- Tests cover the retry behavior for a failed resume path.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugReproducible defect or broken behaviorReproducible defect or broken behavior