feat(subagent): comprehensive sub-agent system with persistence #34

Open
lalyeah wants to merge 1 commit into XSpoonAi:master from lalyeah:feat/subagent-enhancement

@lalyeah lalyeah commented Mar 13, 2026

Summary

Implement a full sub-agent system that allows agents to spawn concurrent child agents with automatic result delivery, lifecycle management, and crash-safe persistence.

Core System (spoon_bot/subagent/)

  • SubagentManager: Orchestration engine — spawn, cancel (cascade), steer, info, status
  • SubagentRegistry: Thread-safe lifecycle tracking with state machine and JSON persistence
  • SubagentTool: LLM-facing tool with spawn/status/cancel/kill/steer/info actions
  • Models: SubagentConfig, SubagentRecord, SubagentResult, TokenUsage, SubagentState

Key Features

  • Push-based wake continuation: Sub-agent results auto-injected into parent session via MessageBus
  • Cascade kill: Cancelling a sub-agent recursively cancels all descendants (deepest-first)
  • Steer: Redirect a running sub-agent to a new task mid-execution with rate limiting
  • Pending descendants tracking: Prevents false "done" signals when children still running
  • Token usage tracking per sub-agent run
  • Extended thinking support (thinking_level config)
  • Per-task run timeout via asyncio.wait_for
  • Announce retry with exponential backoff (3 retries)
  • Lifecycle events (SubagentEvent) emitted on the message bus
  • Max spawn depth raised to 8
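
The cascade kill ordering can be sketched as a post-order walk over the parent/child tree, so every descendant is cancelled before its ancestor. This is only an illustration: `cascade_cancel_order` and the `parent_of` mapping are hypothetical names, not the API of `SubagentManager`.

```python
from collections import defaultdict


def cascade_cancel_order(root_id, parent_of):
    """Return run ids in cancellation order: descendants before ancestors.

    `parent_of` maps child run id -> parent run id (hypothetical shape).
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    order = []

    def visit(run_id):
        # Post-order: cancel the whole subtree first, then the node itself.
        for child in children[run_id]:
            visit(child)
        order.append(run_id)

    visit(root_id)
    return order


# "a" spawned "b" and "c"; "b" spawned "d".
print(cascade_cancel_order("a", {"b": "a", "c": "a", "d": "b"}))
```

Recursion is acceptable here because the spawn depth is capped at 8.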

Persistence (persistence.py)

  • Single JSON file: {workspace}/subagents/runs.json (schema version 1)
  • Atomic writes (write-to-.tmp + replace()) for crash safety
  • Startup restoration: Orphaned PENDING/RUNNING records → FAILED with diagnostic message
  • Background sweeper: Archives terminal records after configurable interval (default 60min)
  • Corrupt file quarantine: Renamed to .corrupt.json, graceful fallback to empty registry
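
A minimal sketch of the atomic-write and quarantine behaviour described above, using only the standard library. The function names `save_runs` and `load_runs`, the `version`/`runs` payload keys, and the exact temp and quarantine file names are assumptions for illustration, not the contents of persistence.py.

```python
import json
import os
from pathlib import Path

SCHEMA_VERSION = 1  # the PR states schema version 1; the key name below is assumed


def save_runs(path: Path, runs: dict) -> None:
    """Write to a sibling temp file, then atomically replace the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"version": SCHEMA_VERSION, "runs": runs}, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure bytes hit disk before the rename
    tmp.replace(path)  # atomic on the same filesystem


def load_runs(path: Path) -> dict:
    """Load runs; quarantine an unreadable file and fall back to empty."""
    if not path.exists():
        return {}
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
        return dict(payload["runs"])
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        path.replace(path.with_name(path.stem + ".corrupt.json"))
        return {}
```

A reader never sees a half-written `runs.json` with this pattern: a crash mid-write leaves only the temp file behind, and the previous file stays intact.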

Channel Integrations

  • Telegram: /subagents command — list/spawn/cancel/kill/steer/info/help
    • spawn supports --model and --thinking flags
    • Enhanced list with active/recent separation, model names, pending descendants
  • Discord: Lifecycle event listener logging spawn/complete/fail/cancel events

Configuration

New fields in SubagentLimitsConfig:

  • persist_runs (default: true), persist_file, archive_after_minutes, sweeper_interval_seconds
  • thinking_level, timeout_seconds in SubagentConfig

Test Plan

  • All new models validate correctly (SubagentConfig, TokenUsage, SubagentRecord new fields)
  • Registry count_pending_descendants() BFS works with parent-child hierarchies
  • SubagentRunsFile save/load roundtrip preserves all fields
  • Corrupt file recovery (quarantine to .corrupt.json, return empty dict)
  • restore_subagent_runs() marks orphaned PENDING/RUNNING as FAILED
  • Config max_depth=8 accepted
  • Full test suite: 515 passed, 20 failed (all pre-existing), no regressions
  • Manual: Telegram /subagents spawn/list/cancel/steer/info commands
  • Manual: Restart gateway, verify runs.json restored and orphans reconciled
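
For reviewers checking the `count_pending_descendants()` item, the BFS it refers to might look roughly like the sketch below. The record shape (`parent_id`, `state`) and the lowercase state strings are assumptions; the real `SubagentRegistry` may store these differently.

```python
from collections import deque

ACTIVE_STATES = {"pending", "running"}  # assumed spelling of the non-terminal states


def count_pending_descendants(run_id, records):
    """Count descendants of `run_id` that are still pending or running."""
    children = {}
    for rid, rec in records.items():
        children.setdefault(rec["parent_id"], []).append(rid)

    count = 0
    queue = deque(children.get(run_id, []))
    while queue:
        current = queue.popleft()
        if records[current]["state"] in ACTIVE_STATES:
            count += 1
        # Keep walking even through finished nodes: their children may still run.
        queue.extend(children.get(current, []))
    return count


records = {
    "root": {"parent_id": None, "state": "running"},
    "a": {"parent_id": "root", "state": "completed"},
    "b": {"parent_id": "root", "state": "running"},
    "c": {"parent_id": "a", "state": "pending"},
}
print(count_pending_descendants("root", records))
```

A parent would report "done" only when this count reaches zero, which is what prevents the false completion signal.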

…ence

Implement a full sub-agent system allowing agents to spawn child agents
that run concurrently, with push-based result delivery, lifecycle events,
and file-based persistence across process restarts.

Core sub-agent system (spoon_bot/subagent/):
- SubagentManager: orchestration engine with spawn/cancel/steer/info
- SubagentRegistry: thread-safe lifecycle tracking with state machine
- SubagentTool: LLM-facing tool with spawn/status/cancel/kill/steer/info actions
- Models: SubagentConfig, SubagentRecord, SubagentResult, TokenUsage, SubagentState

Key features:
- Push-based wake continuation: results auto-injected into parent session via MessageBus
- Cascade kill: cancelling a sub-agent recursively cancels all descendants
- Steer operation: redirect a running sub-agent to a new task mid-execution
- Pending descendants tracking: prevents false completion signals
- Per-sub-agent token usage tracking
- Extended thinking support (thinking_level config)
- Per-task run timeout (asyncio.wait_for)
- Announce retry with exponential backoff (3 retries, 1s/2s/4s)
- Lifecycle events (SubagentEvent) on the message bus
- Max spawn depth raised to 8
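
The timeout and announce-retry bullets can be illustrated with a short asyncio sketch. It assumes one initial attempt followed by up to 3 retries with 1s/2s/4s sleeps, and that a timed-out run is reported as failed; both are readings of the bullets above rather than confirmed behaviour. `run_with_timeout`, `announce_with_retry`, and the injectable `sleep` parameter are hypothetical names.

```python
import asyncio


async def run_with_timeout(task_coro, timeout_seconds):
    """Bound a sub-agent run with asyncio.wait_for."""
    try:
        result = await asyncio.wait_for(task_coro, timeout=timeout_seconds)
        return "completed", result
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        return "failed", f"timed out after {timeout_seconds}s"


async def announce_with_retry(announce, retries=3, base_delay=1.0,
                              sleep=asyncio.sleep):
    """Call `announce()`; on failure retry with delays 1s, 2s, 4s."""
    for attempt in range(retries + 1):
        try:
            await announce()
            return True
        except Exception:
            if attempt == retries:
                return False  # retries exhausted, give up
            await sleep(base_delay * 2 ** attempt)
    return False
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.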

Persistence (spoon_bot/subagent/persistence.py):
- JSON file persistence ({workspace}/subagents/runs.json)
- Atomic writes (write-to-tmp + replace) for crash safety
- Startup restoration with orphan reconciliation (PENDING/RUNNING -> FAILED)
- Background SubagentSweeper archives terminal records after configurable interval
- Schema versioning (v1) with forward compatibility and migration hooks
- Corrupt file quarantine (.corrupt.json) with graceful fallback
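
Orphan reconciliation at startup could be sketched as follows. The dict-based record shape, the `error` field, and the wording of the diagnostic message are assumptions; only the PENDING/RUNNING -> FAILED transition comes from the description above.

```python
ORPHAN_STATES = {"PENDING", "RUNNING"}


def reconcile_orphans(records):
    """Return a copy of `records` with interrupted runs marked FAILED.

    A run still PENDING or RUNNING on disk cannot have survived a process
    restart, so it is closed out with a diagnostic message.
    """
    restored = {}
    for run_id, rec in records.items():
        rec = dict(rec)  # do not mutate the caller's data
        if rec.get("state") in ORPHAN_STATES:
            rec["error"] = f"orphaned in state {rec['state']} by process restart"
            rec["state"] = "FAILED"
        restored[run_id] = rec
    return restored


loaded = {
    "r1": {"state": "RUNNING"},
    "r2": {"state": "COMPLETED"},
}
print(reconcile_orphans(loaded))
```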

Channel integrations:
- Telegram: /subagents command with list/spawn/cancel/kill/steer/info/help
  - spawn supports --model and --thinking flags
  - Enhanced list with active/recent separation, model names, pending descendants
- Discord: lifecycle event listener logging spawn/complete/fail/cancel events

Configuration:
- SubagentLimitsConfig: persist_runs, persist_file, archive_after_minutes,
  sweeper_interval_seconds, max_depth (le=8), thinking_level, timeout_seconds