Set initial execution status to error if it was running #1554

tofarr · 2025-12-30T21:31:04Z

Summary

When a conversation is deserialized with an execution_status of running, we set the execution_status to error. A persisted running status means the conversation stopped while executing some action, which most commonly indicates a crash in a process started by the agent.

The Web Frontend does not pick this up yet, but the execution_status does appear as error, and the runtime_status appears as STATUS$ERROR
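
The rule described in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: `ExecutionStatus`, `StoredConversation`, and `restore` are hypothetical stand-ins, and the real SDK types and status values may differ.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionStatus(str, Enum):
    """Hypothetical stand-in for the SDK's execution status enum."""

    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


@dataclass
class StoredConversation:
    """Hypothetical persisted conversation state."""

    id: str
    execution_status: ExecutionStatus


def restore(raw: dict) -> StoredConversation:
    """Rebuild a conversation from persisted data.

    A persisted RUNNING status means the process stopped mid-action,
    so it is mapped to ERROR. Every other status is kept unchanged.
    """
    status = ExecutionStatus(raw["execution_status"])
    if status is ExecutionStatus.RUNNING:
        status = ExecutionStatus.ERROR
    return StoredConversation(id=raw["id"], execution_status=status)
```

Only the running status is rewritten in this sketch; a conversation saved as idle or finished is restored as-is.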

Testing

Start an agent server instance with

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bbc3277-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bbc3277-python \
  ghcr.io/openhands/agent-server:bbc3277-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bbc3277-golang-amd64
ghcr.io/openhands/agent-server:bbc3277-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bbc3277-golang-arm64
ghcr.io/openhands/agent-server:bbc3277-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bbc3277-java-amd64
ghcr.io/openhands/agent-server:bbc3277-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bbc3277-java-arm64
ghcr.io/openhands/agent-server:bbc3277-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bbc3277-python-amd64
ghcr.io/openhands/agent-server:bbc3277-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:bbc3277-python-arm64
ghcr.io/openhands/agent-server:bbc3277-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:bbc3277-golang
ghcr.io/openhands/agent-server:bbc3277-java
ghcr.io/openhands/agent-server:bbc3277-python

About Multi-Architecture Support

Each variant tag (e.g., bbc3277-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., bbc3277-python-amd64) are also available if needed

Test that a conversation with RUNNING execution_status becomes ERROR when resumed/restarted. This verifies the fix that prevents conversations from incorrectly remaining in RUNNING state after a crash or unexpected termination.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2025-12-30T23:11:24Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-agent-server/openhands/agent_server
conversation_service.py	336	209	37%	64, 67, 78–79, 82–85, 87, 91, 93, 96–103, 106–107, 110–114, 117–119, 121–124, 126, 133–134, 136–138, 141, 145, 147, 149, 156, 162, 170–171, 180–183, 192, 201, 206–207, 210, 223–224, 242, 245, 256–260, 262–265, 268–273, 276–279, 281–283, 286, 289–291, 296–299, 307, 312–314, 328–332, 335, 337, 340–342, 344, 348, 352, 359–363, 366–367, 371–375, 378–379, 383–387, 390–391, 397–402, 409–410, 414, 416–417, 422–423, 429–430, 436–438, 456, 480, 508, 510–511, 537, 539, 541–544, 549, 551–552, 556–557, 559–560, 563–565, 568, 574, 579–582, 589–590, 594–598, 600, 605, 609–611, 615–616, 618–620, 622, 624, 637–639, 642, 645, 648–651, 658–659, 663–665, 668–669, 671
event_service.py	314	158	49%	55–56, 75–77, 81–86, 89–92, 107, 123, 127, 131–132, 139, 141, 148–149, 157–160, 167–169, 186, 210–211, 214–215, 217–219, 221, 226, 229–230, 233–235, 238, 242–244, 246, 248, 259–262, 275–276, 279–280, 283, 286–288, 291–292, 295–296, 300, 303, 307, 311–312, 314, 331–332, 349, 351, 355–357, 361, 370–371, 373, 377, 383, 385, 393–398, 447, 449–452, 461, 477, 484, 488, 499–500, 510–513, 515–516, 520, 522, 526–529, 534–536, 538, 542–545, 549–552, 560–563, 582–583, 585–592, 594–595, 604–605, 607–608, 615–616, 618–619, 623, 629, 639–640, 647
openhands-sdk/openhands/sdk/llm
llm.py	420	157	62%	359, 364, 368, 372–373, 376, 380–381, 392–393, 395–396, 400, 417, 435–438, 485, 515–517, 538, 542, 557, 563–564, 588–589, 599, 624–629, 650–651, 654, 658, 670, 675–678, 687, 695–702, 706–709, 711, 724, 728–729, 731–732, 737–738, 740, 747, 750–755, 812–817, 874–875, 878–881, 923, 940, 994, 997, 1000–1008, 1012–1014, 1017, 1020–1022, 1029–1030, 1039, 1046–1048, 1052, 1054–1059, 1061–1078, 1081–1085, 1087–1088, 1094–1103, 1116, 1130, 1135
TOTAL	14512	6871	52%

hieptl

Thank you! 🙏

enyst

Are we sure "it cannot possibly be RUNNING"? I thought we had an autosave, though I could be wrong.

So I understand we may encounter a problem deserializing from RUNNING, but I'm not sure what the best solution is. Could we save it as something else, or avoid saving while running? Or perhaps set it to IDLE? The latter makes more sense to me, unless something prevents that.

openhands-ai · 2026-01-03T23:23:34Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1554 at branch `fix-restarted-conversation`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

enyst · 2026-01-04T00:19:03Z

openhands-agent-server/openhands/agent_server/event_service.py

+            unmatched_actions = ConversationState.get_unmatched_actions(state.events)
+            if unmatched_actions:
+                first_action = unmatched_actions[0]
+                error_event = AgentErrorEvent(


Out of curiosity, is there a reason why this isn't suitable?

software-agent-sdk/openhands-sdk/openhands/sdk/agent/agent.py

Line 139 in 7b623da

pending_actions = ConversationState.get_unmatched_actions(state.events)

The workflow is thus:

1. The agent executes some tool call which leaks memory.
2. The agent server pod is evicted from Kubernetes as a result.
3. The pod restarts; execution_status was running but is now error.
4. The user can prompt the agent to run again, but it will run the last Action which does not have an observation, resulting in a repeat of step 1.

After the change:

4. The action which crashed the pod now has an AgentErrorEvent as its observation, letting the agent know not to run the same action again (unless prompted with something like "Please try that again!").
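
The effect of the change on the event history can be sketched like this. It is an illustration under stated assumptions only: `Action`, `Observation`, `unmatched_actions`, and `close_interrupted_actions` are hypothetical simplified types and helpers, not the SDK's `ConversationState.get_unmatched_actions` or `AgentErrorEvent`. As in the diff above, only the first unmatched action receives an error.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical: a tool call emitted by the agent."""

    tool_call_id: str
    tool_name: str


@dataclass
class Observation:
    """Hypothetical: the result recorded for a tool call."""

    tool_call_id: str
    content: str
    is_error: bool = False


def unmatched_actions(events: list) -> list:
    """Return actions that have no observation with the same tool_call_id."""
    answered = {e.tool_call_id for e in events if isinstance(e, Observation)}
    return [
        e for e in events
        if isinstance(e, Action) and e.tool_call_id not in answered
    ]


def close_interrupted_actions(events: list) -> list:
    """Give the first unmatched action an error observation, if there is one.

    Returns the input list unchanged when every action already has an
    observation; otherwise returns a new list with the error appended.
    """
    pending = unmatched_actions(events)
    if not pending:
        return events
    first = pending[0]
    error = Observation(
        tool_call_id=first.tool_call_id,
        content=f"Tool call '{first.tool_name}' was interrupted by a restart.",
        is_error=True,
    )
    return events + [error]
```

With the error observation in place, the crashed action no longer counts as pending, so resuming the conversation does not replay it automatically.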

Got it, thank you. We had this kind of reset in agent controller back in V0. Indeed I think AgentErrorEvent is correct... 🤔

The other question is, is this really server-specific? If the state is RUNNING and it's auto-saved, which I think it is, is there anything preventing it from happening on some who-knows-what stuck process on LocalConversation?

openhands-sdk/openhands/sdk/llm/llm.py

tofarr and others added 2 commits December 30, 2025 14:26

Set initial execution status to error if it was running

ceaca09

tofarr marked this pull request as ready for review December 30, 2025 23:17

hieptl approved these changes Dec 31, 2025

enyst requested changes Dec 31, 2025

tofarr added 9 commits January 2, 2026 11:08

Merge branch 'main' into fix-restarted-conversation

df3b530

Moved execution status check to EventService

6dc80e3

Reverts

521ea72

Fixed broken links

353ee9b

Merge branch 'main' into fix-restarted-conversation

fffeb24

Merge branch 'main' into fix-restarted-conversation

b6657af

Merge branch 'main' into fix-restarted-conversation

b34c383

Simulating observation

4ce0ef9

Less strict LLM parsing

6613f6b

enyst reviewed Jan 4, 2026

Fix test

f106392

enyst reviewed Jan 4, 2026

openhands-sdk/openhands/sdk/llm/llm.py (resolved)

enyst approved these changes Jan 4, 2026

tofarr merged commit 8fb2354 into main Jan 4, 2026
21 checks passed

tofarr deleted the fix-restarted-conversation branch January 4, 2026 02:59
