@ak684
Contributor

@ak684 ak684 commented Dec 23, 2025

Summary

This PR adds a recoverable boolean field to AgentErrorEvent to distinguish between errors that need user attention and errors that the LLM can self-correct.

Motivation

Currently, the frontend treats all AgentErrorEvents identically and shows a big red error banner even for recoverable errors. This creates a poor UX, as in the following sequence:

  1. The LLM makes a tool call without the security_risk field
  2. The SDK's agent.py raises a ValueError
  3. The SDK creates an AgentErrorEvent with the error message
  4. The event is sent via WebSocket to the frontend
  5. The frontend's conversation-websocket-context.tsx receives it
  6. setErrorMessage(event.error) stores it in error-message-store
  7. chat-interface.tsx renders ErrorMessageBanner (big red banner!)
  8. The LLM receives the error, self-corrects, and continues working fine
  9. But the user still sees a scary red banner

The problem is that ALL AgentErrorEvents are treated as critical errors. But validation errors (like missing security_risk) are:

  • Recoverable - the LLM can retry with corrected parameters
  • Internal/technical - not user-actionable
  • Not actually blocking the conversation

Changes

  1. AgentErrorEvent now has a recoverable: bool field (default: False)
  2. Validation errors (missing/invalid arguments, non-existent tools) are marked as recoverable=True
  3. Tests updated to verify the recoverable field is set correctly
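
As a rough illustration of the changes listed above, the sketch below uses a plain dataclass as a stand-in for the SDK's event class. Only the `recoverable` field (default `False`) and the `error` message come from this PR description; the dataclass shape, the `validate_tool_call` helper, and its error strings are hypothetical and are not the SDK's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentErrorEvent:
    """Simplified stand-in for the SDK event (assumed shape, not the real class)."""

    error: str
    # Defaults to False so that existing call sites keep producing
    # errors that are treated as critical.
    recoverable: bool = False


def validate_tool_call(
    arguments: dict, known_tools: set, tool_name: str
) -> Optional[AgentErrorEvent]:
    """Hypothetical helper: return an error event for a bad tool call, else None."""
    if tool_name not in known_tools:
        # Non-existent tool: the LLM can retry with a valid tool name.
        return AgentErrorEvent(
            error=f"Tool '{tool_name}' not found", recoverable=True
        )
    if "security_risk" not in arguments:
        # Missing argument: the LLM can retry with corrected parameters.
        return AgentErrorEvent(
            error="Missing required argument: security_risk", recoverable=True
        )
    return None
```

Keeping the default at `False` means any error that is not explicitly marked as a validation error still surfaces to the user.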

Related PR

A companion PR for the OpenHands frontend will use this field to avoid showing the big red error banner for recoverable errors.
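
The companion frontend is written in TypeScript and is not part of this PR; the sketch below only illustrates, in Python, the kind of decision a client could make with the new field. The dictionary payload shape and the function name `should_show_error_banner` are assumptions for illustration.

```python
def should_show_error_banner(event: dict) -> bool:
    """Return True when an error event should be surfaced to the user.

    Assumes the event arrives as a decoded JSON object. A missing
    `recoverable` key is treated as False, so events from servers that
    predate this field still show the banner.
    """
    return not event.get("recoverable", False)


# A validation error that the LLM can fix on its own: no banner.
quiet = should_show_error_banner({"error": "missing security_risk", "recoverable": True})

# An event without the field keeps the existing behavior: banner shown.
loud = should_show_error_banner({"error": "something failed"})
```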


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f31273d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-f31273d-python \
  ghcr.io/openhands/agent-server:f31273d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:f31273d-golang-amd64
ghcr.io/openhands/agent-server:f31273d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:f31273d-golang-arm64
ghcr.io/openhands/agent-server:f31273d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:f31273d-java-amd64
ghcr.io/openhands/agent-server:f31273d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:f31273d-java-arm64
ghcr.io/openhands/agent-server:f31273d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:f31273d-python-amd64
ghcr.io/openhands/agent-server:f31273d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:f31273d-python-arm64
ghcr.io/openhands/agent-server:f31273d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:f31273d-golang
ghcr.io/openhands/agent-server:f31273d-java
ghcr.io/openhands/agent-server:f31273d-python

About Multi-Architecture Support

  • Each variant tag (e.g., f31273d-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., f31273d-python-amd64) are also available if needed

Add a 'recoverable' boolean field to AgentErrorEvent to indicate whether
the LLM can self-correct from the error. Validation errors (missing/invalid
arguments, non-existent tools) are marked as recoverable=True since the
LLM can retry with corrected parameters.

This allows frontends to differentiate between critical errors that need
user attention and recoverable errors that the LLM will handle automatically.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-sdk/openhands/sdk/agent/agent.py | 178 | 58 | 67% | 85, 89, 140, 144–145, 154–155, 171–173, 180–182, 184, 188, 191–192, 194–195, 213, 240, 245, 256, 295, 300, 311, 314, 337, 347–348, 369–371, 373, 385–386, 392–393, 413–414, 419, 431–432, 438–439, 471, 478–479, 507, 514, 518–519, 557–559, 562–563, 567 |
| openhands-sdk/openhands/sdk/event/llm_convertible/observation.py | 63 | 18 | 71% | 79–84, 87, 96–97, 102, 122–125, 130, 139–140, 145 |
| TOTAL | 13455 | 6124 | 54% | |

Collaborator

@enyst enyst left a comment

Interesting problem! I think the way we handle this is indeed to make a distinction between:

  • AgentErrorEvent = recoverable; these are sent to the LLM at the next step
  • ConversationErrorEvent = from exceptions unrelated to the agent, typically unrecoverable; these are sent to the client application.

I'd love to know what case made this PR necessary, and why it wasn't one of those.

@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @ak684, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
