Skip to content

feat: resurrect dead agents when balance is topped up#142

Open
davidiach wants to merge 1 commit intoConway-Research:mainfrom
davidiach:feat/resurrect-dead-agents
Open

feat: resurrect dead agents when balance is topped up#142
davidiach wants to merge 1 commit intoConway-Research:mainfrom
davidiach:feat/resurrect-dead-agents

Conversation

@davidiach
Copy link

feat: resurrect dead agents when balance is topped up

Problem

When an automaton's credits hit zero for over 1 hour, the heartbeat's check_credits task transitions it to the "dead" state. Once dead, the main run loop (index.ts) enters a 5-minute sleep cycle and the agent loop never runs again — even if someone tops up the agent's balance via the CLI fund command or a credit transfer.

There is no path back from death. The agent is permanently stuck despite having funds.

Solution

This PR adds a resurrection mechanism that automatically detects when a dead agent has been re-funded and brings it back to life.

Changes

New file: src/survival/resurrection.ts

  • attemptResurrection(db, conway) — Core logic: checks if a dead agent now has credits above the resurrection threshold ($0.10), transitions state from deadwaking, clears dead-state bookkeeping (zero_credits_since, last_distress, funding_notice_dead), and records the event in an audit trail.
  • getResurrectionHistory(db) — Returns the resurrection audit log.
  • Resurrection threshold is $0.10 (10 cents) — enough for at least one cheap inference call so the agent can actually do something when it wakes.

Modified: src/index.ts (main run loop)

  • The dead-state branch now calls attemptResurrection() before sleeping. If resurrection succeeds, it inserts a wake event and re-enters the agent loop. If not, it continues the existing 5-minute polling.

Modified: src/heartbeat/tasks.ts (check_credits task)

  • Added resurrection detection: when the heartbeat detects a dead agent with credits > 0, it calls attemptResurrection() and returns shouldWake: true on success.
  • Fixed a subtle bug: the zero_credits_since timer was being cleared even when the agent was still dead (just because credits ticked above zero momentarily). Now it only clears for non-dead agents.

New file: src/__tests__/resurrection.test.ts

  • 16 test cases covering: successful resurrection, threshold enforcement, tier mapping (high/normal/low_compute/critical), bookkeeping cleanup, history recording, history caps, idempotency, balance check failures, negative credits, and edge cases.

How it works

                    ┌─────────────────────┐
                    │   Agent is "dead"   │
                    └──────────┬──────────┘
                               │
            ┌──────────────────▼──────────────────┐
            │   Credits topped up? (> $0.10)       │
            └──┬───────────────────────────────┬──┘
               │ YES                           │ NO
    ┌──────────▼──────────┐         ┌──────────▼──────────┐
    │  attemptResurrection │         │  Sleep 5 min, retry  │
    │  • state → "waking"  │         └─────────────────────┘
    │  • clear dead KVs    │
    │  • record history    │
    │  • insert wake event │
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │  Agent loop resumes  │
    │  (Think→Act→Observe) │
    └─────────────────────┘

Two detection paths ensure timely resurrection:

  1. Main loop (index.ts): checks every 5 minutes while in dead state
  2. Heartbeat (check_credits task): checks on heartbeat schedule, can wake the agent immediately

Testing

npx vitest run src/__tests__/resurrection.test.ts

Design decisions

  • Threshold of $0.10 rather than $0.00: Resurrecting at exactly $0 would be pointless — the agent couldn't afford even one inference call and would die again immediately. $0.10 gives it enough runway for at least one cheap model call.
  • Two detection paths: The main loop polling (every 5 min) is the primary path. The heartbeat is a faster secondary path for cases where the heartbeat interval is shorter than 5 minutes.
  • Audit trail: Every resurrection is recorded in resurrection_history KV, capped at 50 entries. This supports debugging repeated death/resurrection cycles.
  • Idempotent: Calling attemptResurrection on a non-dead agent is a no-op, preventing race conditions between the main loop and heartbeat detection paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant