fix: handle delegation failures gracefully instead of crashing #567

Merged
FL4TLiN3 merged 3 commits into main from fix/delegation-error-handling
Feb 18, 2026

@FL4TLiN3
Contributor

Summary

  • Fix delegation crash: DelegationExecutor.executeSingleDelegation() now catches child run exceptions and non-completed statuses, returning error text to the parent coordinator instead of crashing the process
  • Fix root cause: createExpert callback auto-includes @perstack/base when the LLM omits it from skills, preventing SkillManager.fromExpert() from throwing "Base skill is not defined"
  • Fix timeout: Increase create-expert e2e timeout from 180s to 300s to handle slow LLM iteration loops (observed up to 226s)
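The first fix can be sketched roughly as follows. This is a minimal illustration rather than the actual Perstack code: the `ChildCheckpoint` and `DelegationResult` shapes, the `text` field, and the helper names `resultFromCheckpoint` and `executeSingleDelegationSketch` are all assumptions made for the example. Only the status names and the idea of returning error text to the parent come from this PR.

```typescript
// Hypothetical types: the real Perstack checkpoint shape is not shown in this PR.
type ChildStatus = "completed" | "stoppedByError" | "stoppedByExceededMaxSteps";

interface ChildCheckpoint {
  status: ChildStatus;
  text: string; // final text produced by the child run (assumed field)
}

interface DelegationResult {
  expertKey: string;
  text: string;
  isError: boolean;
}

// Map a finished child checkpoint to what the parent coordinator receives.
function resultFromCheckpoint(
  expertKey: string,
  checkpoint: ChildCheckpoint,
): DelegationResult {
  if (checkpoint.status === "completed") {
    return { expertKey, text: checkpoint.text, isError: false };
  }
  // Non-completed statuses become error text instead of an exception.
  return {
    expertKey,
    text: `Delegation to "${expertKey}" ended with status ${checkpoint.status}`,
    isError: true,
  };
}

// Run one child delegation without letting a failure escape to the caller.
async function executeSingleDelegationSketch(
  expertKey: string,
  runFn: () => Promise<ChildCheckpoint>,
): Promise<DelegationResult> {
  try {
    return resultFromCheckpoint(expertKey, await runFn());
  } catch (error) {
    // A thrown exception (e.g. a missing base skill) is reported as text too.
    const message = error instanceof Error ? error.message : String(error);
    return {
      expertKey,
      text: `Delegation to "${expertKey}" failed: ${message}`,
      isError: true,
    };
  }
}
```

Both failure paths produce an ordinary result value, so the parent can decide whether to retry, delegate elsewhere, or report the failure.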

Context

The create-expert e2e test was flaky in CI (#563). Root cause: when createExpert creates an expert without @perstack/base in skills, delegating to it crashes the entire process because the exception propagates uncaught through the delegation executor.

Reproduced locally (3 failures in 25 runs before fix, 0 in 10 after fix).
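The root-cause fix described above amounts to normalizing the skills before the expert is created. A minimal sketch, assuming skills are stored as a map keyed by skill name: the `SkillConfig` shape, the `withBaseSkill` helper, and the `defaultBase` parameter are hypothetical and not taken from the Perstack codebase.

```typescript
const BASE_SKILL = "@perstack/base";

// Hypothetical skill config; the real schema is not shown in this PR.
interface SkillConfig {
  type: string;
  [key: string]: unknown;
}

// Return a copy of the skills map that always contains the base skill.
// An entry the LLM supplied for the base skill is kept as-is.
function withBaseSkill(
  skills: Record<string, SkillConfig> | undefined,
  defaultBase: SkillConfig,
): Record<string, SkillConfig> {
  const result: Record<string, SkillConfig> = { ...(skills ?? {}) };
  if (!(BASE_SKILL in result)) {
    result[BASE_SKILL] = defaultBase;
  }
  return result;
}
```

Returning a copy rather than mutating the input leaves the LLM-provided arguments untouched, which makes it easier to log what was originally requested.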

Test plan

  • 10/10 e2e runs pass locally after the fix (0/10 failures vs 3/25 before)
  • All 1091 unit tests pass
  • 2 new unit tests for delegation error handling (stoppedByError, thrown exception)
  • Typecheck and lint clean

🤖 Generated with Claude Code

FL4TLiN3 and others added 3 commits February 18, 2026 03:43
When a child delegation run fails (e.g., missing @perstack/base skill,
MCP connection failure), the process crashed with exit code 1 because
DelegationExecutor.executeSingleDelegation() didn't handle exceptions
or non-completed checkpoint statuses from child runs.

- Wrap child runFn() in try/catch to handle thrown exceptions
- Check child checkpoint status and return error text to parent for
  non-completed statuses (stoppedByError, stoppedByExceededMaxSteps)
- Auto-include @perstack/base in createExpert when LLM omits it
- Increase create-expert e2e timeout from 180s to 300s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FL4TLiN3 FL4TLiN3 enabled auto-merge (squash) February 18, 2026 03:47
@FL4TLiN3 FL4TLiN3 merged commit a7135b4 into main Feb 18, 2026
11 checks passed
@FL4TLiN3 FL4TLiN3 deleted the fix/delegation-error-handling branch February 25, 2026 13:35
