Add retry logic for ConversationInfo validation race condition #1559
+29
−7
Summary
This PR adds a simple workaround for the transient ConversationInfo validation error described in #1557.
Problem
During SWT-bench eval, GET /api/conversations/{id} returned HTTP 500 due to a pydantic_core.ValidationError when constructing ConversationInfo. The error indicated that the required fields id, agent, and workspace were missing from the input dict.
This is a race condition: ConversationState.model_dump() can return incomplete data during concurrent state mutations.
Solution
Add a _get_state_dump_with_retry() function that:
• Calls state.model_dump() and checks if all required fields (id, agent, workspace) are present
• Retries after a short delay if any are missing
• Raises a RuntimeError with a helpful error message linking to the issue once all retries are exhausted
This is a workaround; the proper fix would be to implement locking in ConversationState.model_dump(), as proposed in PR #1558. A TODO comment with a link to the issue has been added for future reference.
Testing
Unit tests for the _get_state_dump_with_retry() function:
• test_returns_immediately_when_all_fields_present - verifies no retry when fields are present
• test_retries_when_fields_missing_then_succeeds - verifies retry after 0.1s
• test_retries_twice_then_succeeds - verifies retries after 0.1s and 0.3s
• test_raises_error_after_all_retries_exhausted - verifies error after all retries
All existing tests continue to pass.
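For readers who want a concrete picture of the retry workaround described under Solution, here is a minimal sketch. It is not the code from this PR: the function name without the leading underscore, the callable-based signature, the injectable sleep parameter, and the exact error wording are all assumptions made for illustration. The delays of 0.1s and 0.3s are taken from the test descriptions above.

```python
import time
from typing import Any, Callable

# Fields that ConversationInfo requires, per the PR description.
REQUIRED_FIELDS = ("id", "agent", "workspace")
# Assumed retry schedule in seconds, inferred from the test descriptions.
RETRY_DELAYS = (0.1, 0.3)


def get_state_dump_with_retry(
    dump: Callable[[], dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call dump() until every required field is present.

    `dump` stands in for state.model_dump (hypothetical wiring).
    `sleep` is injectable so tests can avoid real waiting.
    """
    missing: list[str] = []
    # One initial attempt plus one attempt after each delay.
    for delay in (*RETRY_DELAYS, None):
        data = dump()
        missing = [name for name in REQUIRED_FIELDS if name not in data]
        if not missing:
            return data
        if delay is not None:
            sleep(delay)
    raise RuntimeError(
        f"State dump is still missing required fields {missing} after "
        f"{len(RETRY_DELAYS)} retries; see issue #1557"
    )
```

Passing the dump callable and the sleep function as parameters keeps the sketch testable without a real ConversationState or real delays, for example by passing a list's append method as sleep and inspecting the recorded waits afterwards.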
Fixes #1557
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.12-nodejs22
• golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:47d135d-python
Run
All tags pushed for this build
About Multi-Architecture Support
• Each variant tag (e.g., 47d135d-python) is a multi-arch manifest supporting both amd64 and arm64
• Architecture-specific tags (e.g., 47d135d-python-amd64) are also available if needed