fix: skip per-tick stale recovery for local workers #233

Open
tyxben wants to merge 1 commit into Conway-Research:main from tyxben:fix/orchestrator-resilience

tyxben commented Feb 26, 2026

Summary

Fixes a regression from #227 where the per-tick stale recovery causes an infinite loop for local workers:

  • Local workers (in-process async tasks) remove themselves from activeWorkers on completion
  • The per-tick check no longer finds the worker in activeWorkers, treats it as "dead", and resets its task to pending
  • Task gets re-assigned immediately → new worker completes → removed → detected as dead → loop
  • Each turn burns ~$0.03 (~15k tokens) doing nothing

Changes

  • Skip local:// workers in per-tick stale recovery (orchestrator.ts) — only check remote sandbox workers
  • One-time startup recovery for local workers (loop.ts) — on process restart, reset stale local:// assigned tasks once (not every tick)
  • Validate goalId in loadState (orchestrator.ts) — if persisted goal is completed/cancelled/missing, reset to idle
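
The first change can be sketched roughly as follows. This is a minimal illustration, not the code in orchestrator.ts: the `Task` shape, the `recoverStaleRemoteTasks` name, and the idea that activeWorkers can be read as a set of worker addresses are all assumptions made for the example. Only the `local://` prefix and the "reset to pending" behaviour come from the description above.

```typescript
// Hypothetical types: the real orchestrator.ts definitions are not shown in this PR.
type TaskStatus = "pending" | "assigned" | "completed";

interface Task {
  id: string;
  status: TaskStatus;
  assignedTo: string | null; // worker address, e.g. "local://worker-1"
}

const LOCAL_PREFIX = "local://";

function isLocalWorker(address: string): boolean {
  return address.startsWith(LOCAL_PREFIX);
}

/**
 * Per-tick stale recovery, restricted to remote workers.
 * Resets an assigned task to pending only when its worker is remote
 * and missing from the active set. Returns the ids that were reset.
 */
function recoverStaleRemoteTasks(
  tasks: Task[],
  activeWorkers: ReadonlySet<string>, // assumed: keyed by worker address
): string[] {
  const recovered: string[] = [];
  for (const task of tasks) {
    if (task.status !== "assigned" || task.assignedTo === null) continue;
    // Local workers remove themselves on completion, so their absence
    // from activeWorkers says nothing about whether they died.
    if (isLocalWorker(task.assignedTo)) continue;
    if (activeWorkers.has(task.assignedTo)) continue;
    task.status = "pending";
    task.assignedTo = null;
    recovered.push(task.id);
  }
  return recovered;
}
```

With this filter, a task assigned to a `local://` worker that has already left activeWorkers is left alone on each tick, which is what breaks the assign → complete → reset cycle.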

Test plan

  • pnpm typecheck passes
  • All 26 orchestrator tests pass (including the 2 fixed in 823ad70)
  • Manual test: start automaton, observe no "Recovering stale task from dead worker" loop
  • Manual test: startup recovery runs once, not repeated on subsequent wake cycles
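
The "runs once" property checked in the last manual test could be enforced with a simple guard like the one below. This is a sketch under assumptions: the `StartupRecovery` class, its `run` method, and the `StartupTask` shape are hypothetical names, and the actual loop.ts may structure this differently. The reasoning it encodes is that in-process workers cannot survive a process restart, so any task still assigned to a `local://` worker at startup is stale.

```typescript
// Hypothetical task shape for illustration only.
interface StartupTask {
  id: string;
  status: "pending" | "assigned" | "completed";
  assignedTo: string | null;
}

/**
 * One-time recovery for tasks left assigned to in-process workers.
 * The first call resets them to pending; later calls are no-ops,
 * so wake cycles after startup never re-run the recovery.
 */
class StartupRecovery {
  private done = false;

  run(tasks: StartupTask[]): string[] {
    if (this.done) return [];
    this.done = true;

    const recovered: string[] = [];
    for (const task of tasks) {
      const isLocal = task.assignedTo?.startsWith("local://") ?? false;
      if (task.status === "assigned" && isLocal) {
        task.status = "pending";
        task.assignedTo = null;
        recovered.push(task.id);
      }
    }
    return recovered;
  }
}
```

Keeping the flag on an instance that lives for the whole process means a restart creates a fresh instance and gets exactly one recovery pass.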

Local workers are in-process async tasks that remove themselves from
the activeWorkers map on completion. The per-tick stale recovery
(introduced in e099808) treats this as a dead worker and resets the
task to pending, causing an infinite assign→complete→reset loop that
burns ~$0.03/turn.

Fix:
1. Skip local:// workers in per-tick stale recovery (remote only)
2. Add one-time startup recovery for local workers (process restart)
3. Validate goalId in loadState — reset if goal is completed/cancelled
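
The goalId validation in item 3 might look roughly like this. The `Goal` and `OrchestratorState` shapes, the `validateLoadedState` name, and the `lookupGoal` callback are assumptions for illustration; only the rule itself (reset to idle when the persisted goal is completed, cancelled, or missing) is taken from the PR description.

```typescript
// Hypothetical shapes: the persisted state format is not shown in this PR.
type GoalStatus = "active" | "completed" | "cancelled";

interface Goal {
  id: string;
  status: GoalStatus;
}

interface OrchestratorState {
  phase: string; // "idle" is the only phase name taken from the PR text
  goalId: string | null;
}

/**
 * Validates a state object loaded from persistence.
 * If the referenced goal is missing, completed, or cancelled,
 * a fresh idle state is returned instead of the stale one.
 */
function validateLoadedState(
  state: OrchestratorState,
  lookupGoal: (id: string) => Goal | undefined,
): OrchestratorState {
  if (state.goalId === null) return state;

  const goal = lookupGoal(state.goalId);
  const isStale =
    goal === undefined ||
    goal.status === "completed" ||
    goal.status === "cancelled";

  return isStale ? { phase: "idle", goalId: null } : state;
}
```

Returning a new idle object rather than mutating the loaded one keeps the check side-effect free, which makes it easy to call from loadState and to test in isolation.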