
Conversation

Collaborator

@simonrosenberg simonrosenberg commented Jan 2, 2026

Fixes #1570

Summary

  • Stop _wait_for_run_completion from swallowing terminal errors; surface ERROR/STUCK/4xx as ConversationRunError while retrying transient failures.
  • Refactor polling logic into focused helpers for clarity.
  • Simplify polling exception flow by separating poll vs status handling.
  • Rename _handle_terminal_status to _handle_conversation_status and move completion logging to the caller.
  • Add tests for ERROR, STUCK, and 404 polling behavior.
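
The retry-vs-terminal split described above can be sketched as follows. This is a hedged illustration, not the PR's actual code: `classify_poll_failure` and its signature are hypothetical stand-ins; only the classification rules (keep retrying on server errors, rate limits, and network failures; raise `ConversationRunError` on 404 and other 4xx) come from the summary.

```python
# Hedged sketch of the classification described in the summary.
# classify_poll_failure is illustrative, not the real helper in
# remote_conversation.py.

class ConversationRunError(Exception):
    """Terminal polling failure surfaced to the caller."""


def classify_poll_failure(status_code):
    """Classify a failed poll attempt.

    status_code is the HTTP status of the failed request, or None for a
    pure network error with no response. Returns "retry" for transient
    failures; raises ConversationRunError for terminal ones.
    """
    if status_code is None:
        return "retry"  # network error: log and keep polling
    if status_code >= 500 or status_code == 429:
        return "retry"  # server error / rate limit: transient
    # 404 and any other 4xx response is terminal: surface it instead of
    # swallowing it and polling forever.
    raise ConversationRunError(f"polling failed with HTTP {status_code}")
```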

Testing

Ran the CI workflow with this branch: no more polling issues observed.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works? (N/A)
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? (some failures)
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (N/A)
  • Is the github CI passing? (not run yet)

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d73dfb9-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d73dfb9-python \
  ghcr.io/openhands/agent-server:d73dfb9-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d73dfb9-golang-amd64
ghcr.io/openhands/agent-server:d73dfb9-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d73dfb9-golang-arm64
ghcr.io/openhands/agent-server:d73dfb9-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d73dfb9-java-amd64
ghcr.io/openhands/agent-server:d73dfb9-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d73dfb9-java-arm64
ghcr.io/openhands/agent-server:d73dfb9-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d73dfb9-python-amd64
ghcr.io/openhands/agent-server:d73dfb9-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d73dfb9-python-arm64
ghcr.io/openhands/agent-server:d73dfb9-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d73dfb9-golang
ghcr.io/openhands/agent-server:d73dfb9-java
ghcr.io/openhands/agent-server:d73dfb9-python

About Multi-Architecture Support

  • Each variant tag (e.g., d73dfb9-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d73dfb9-python-amd64) are also available if needed

Contributor

github-actions bot commented Jan 2, 2026

Coverage

Coverage Report

File  Stmts  Miss  Cover  Missing
openhands-sdk/openhands/sdk/conversation/impl/
   remote_conversation.py  487  171  64%  69–75, 82–85, 114, 121, 129, 131–134, 144, 153, 157–158, 163–166, 201, 215, 232, 243, 252–253, 305, 325, 333, 345, 353–356, 359, 364–367, 369, 374–375, 380–384, 389–393, 398–401, 404, 415–416, 420, 424, 427, 514–515, 519–520, 528, 534, 536, 552–553, 558, 560–561, 572, 589–590, 594, 600–601, 605, 610–611, 616–618, 621–625, 627–628, 632, 634–642, 644, 648, 663, 681, 716, 718, 721, 749, 759–760, 788–789, 794, 802–806, 813–814, 818, 823–827, 831–839, 842–843, 852–853, 862, 870, 875–877, 879, 882, 884–885, 905, 907, 913–914, 929, 936, 942–943, 958, 971, 977–978, 985–986
TOTAL  14467  6836  52%

@simonrosenberg simonrosenberg requested a review from enyst January 2, 2026 13:49
@simonrosenberg simonrosenberg self-assigned this Jan 2, 2026
@simonrosenberg simonrosenberg added the integration-test Runs the integration tests and comments the results label Jan 2, 2026
Contributor

github-actions bot commented Jan 2, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Contributor

github-actions bot commented Jan 2, 2026

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 6
Timestamp: 2026-01-02 15:54:46 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model Overall Integration (Required) Behavior (Optional) Tests Passed Skipped Total Cost Tokens
litellm_proxy_vertex_ai_gemini_3_pro_preview 0.0% 0.0% N/A 0/9 0 9 $0.00 0
litellm_proxy_claude_sonnet_4_5_20250929 0.0% 0.0% N/A 0/9 0 9 $0.00 0
litellm_proxy_mistral_devstral_2512 0.0% 0.0% N/A 0/8 1 9 $0.00 0
litellm_proxy_gpt_5.1_codex_max 0.0% 0.0% N/A 0/8 1 9 $0.00 0
litellm_proxy_deepseek_deepseek_chat 0.0% 0.0% N/A 0/8 1 9 $0.00 0
litellm_proxy_moonshot_kimi_k2_thinking 0.0% 0.0% N/A 0/8 1 9 $0.00 0

📋 Detailed Results

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 0.0% (0/9)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_e83fe45_gemini_3_pro_run_N9_20260102_155333

Failed Tests:

  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=86fc7f95-efc3-46cc-a913-5c2022f6a506: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=1b981ea2-de83-45ad-9a23-93a82f4a9cec: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=dc5a6d1b-796a-4f92-a620-7d0c8fc68d88: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=b1871988-fad4-40a9-90b8-cd22d859761c: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=774ae584-3248-4e27-93cd-368050b4c63a: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t08_image_file_viewing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=b4a8d7d5-ebd8-4eb4-8279-adf4132d560c: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=918dee6b-bc2b-4f94-ad25-e1ac96a42b87: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t09_token_condenser ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=fe58c189-0cc3-40c3-bcf4-eae67f2da5e9: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=f175e3c1-8363-4f37-a91e-cfcd7a2addee: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 0.0% (0/9)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_e83fe45_sonnet_run_N9_20260102_155334

Failed Tests:

  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=d8e01652-eccc-437b-8564-c404e7bb6f45: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=ba1e596b-6419-47f0-9302-c6553bbf2442: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=c3eab436-dbd4-4091-9057-92697664901e: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=36bd83fe-af21-4873-ba44-7aca77fc7c3c: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=af1fbc51-cb1a-480d-bf0d-90aee7563f97: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=184f8ce2-c856-4a1f-afb3-b43baf50b2bc: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t09_token_condenser ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=2de3cb00-c627-42fe-9a22-c818db9d5f2b: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t08_image_file_viewing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=cbbee42c-f7e6-4925-802c-eb91401ccb4e: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=0d6caa08-ad9f-4168-ba3d-ef2e60b43165: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 0.0% (0/8)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_mistral_devstral_2512_e83fe45_devstral_2512_run_N9_20260102_155333
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=aa1045c7-7db7-478d-b315-41ca77ea86e5: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=5d545716-878c-4931-9d4f-e4a8ad738474: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=ef7fa067-ee26-4503-980e-9ac196c07018: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=cf6c1400-1481-4217-a698-bd91a3772c81: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=4588d027-59fa-45bf-9629-d72afd853f01: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=5fc6ad82-8ef5-451e-89c3-4db1e458fd10: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t09_token_condenser ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=7c0479fe-c013-47ea-8fc5-6b1a0eef2bba: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=33217eb2-54b5-4a8e-bbff-6ddee6fc2489: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 0.0% (0/8)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_e83fe45_gpt51_codex_run_N9_20260102_155332
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=a027c7f8-ae14-4113-ad70-5f43eb585c4d: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=9f951360-ec4e-4dc7-a37e-199688bd757f: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t08_image_file_viewing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=3d11da44-5bd7-4fb0-adc0-5fffec212fe8: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=b1f79408-cf8c-4a8a-b59b-1ed39ae7302f: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=00bfd020-e925-404a-808d-20979a4e508b: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=b7f7e1ff-ca89-458e-85cd-c5d9ba0d1553: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=4971a33a-2f9b-47e2-aa9d-45810f02657c: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=ee649279-10f6-4c73-a6ca-ff03ed0f305f: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 0.0% (0/8)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_e83fe45_deepseek_run_N9_20260102_155333
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=b92decbc-4232-454d-9842-3dc5e4de9919: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=cf3efbfc-386c-4583-8af9-33519ad5dafd: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=d37967e2-2558-4bef-8616-bfd84ec82096: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=0805e889-f3b4-4ae1-91b4-91cb62af755b: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=dc31420a-ba48-4890-9105-d04d36d9ff43: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=5a402c06-741e-485d-8f90-758bb2e214c5: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=41294414-eee9-43ac-996e-a0d30aef23cc: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=f4819c26-0ca6-49c1-99e7-c6299026a6a1: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 0.0% (0/8)
  • Integration Tests (Required): 0.0% (0/9)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_e83fe45_kimi_k2_run_N9_20260102_155357
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=16c2f2c0-8d56-42a2-af42-ce83165ba090: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=a2acfcc0-4424-42a4-9072-b6e0e0006460: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=e5170a82-3ac4-470f-8b6a-c7fa2bbf5aa8: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=39fbe1c5-ad95-4dcf-8623-ddb8d0568403: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t01_fix_simple_typo ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=86cff199-266e-43f2-9b21-0a94e5345334: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=16dfdd49-8a21-4c21-8278-b72decb1cf2b: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=eea61d6c-a4a2-4041-ad61-e6d94dfd6433: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing ⚠️ REQUIRED: Test execution failed: Conversation run failed for id=250cfb46-ae06-4e16-91e1-303113402bae: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)


openhands-ai bot commented Jan 2, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1572 at branch `fix/remote-conversation-terminal-errors`

Feel free to include any additional details that might help me get this PR into a better state.


@simonrosenberg simonrosenberg removed the request for review from enyst January 2, 2026 17:20
@simonrosenberg
Collaborator Author

@OpenHands read issue #1570
and read the PR diff.
Does this PR solve the problem?
Also take a close look at local_conversation.py and remote_conversation.py. Find any logical flaws or problems in the code.


openhands-ai bot commented Jan 2, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 2, 2026

Analysis Summary

Does this PR solve the problem described in Issue #1570?

Yes, the PR correctly solves the issue.

The issue described that RemoteConversation._wait_for_run_completion() was swallowing terminal errors (ConversationRunError + 404s) and polling forever because of a broad except Exception block that caught and logged all exceptions, then continued polling.

The PR fixes this by:

  1. Refactoring the polling logic into three focused helper methods:

    • _poll_status_once() - Fetches the current execution status
    • _handle_conversation_status(status) - Handles non-running statuses, returns True if complete
    • _handle_poll_exception(exc) - Classifies exceptions into retryable vs terminal
  2. Properly classifying exceptions in _handle_poll_exception:

    • Terminal (raises ConversationRunError): 404, 4xx (except 429), and unexpected exceptions
    • Retryable (logs and continues): 5xx, 429 (rate limiting), and network errors (httpx.RequestError)
  3. Properly handling ERROR/STUCK statuses in _handle_conversation_status: these now raise ConversationRunError, and the handler is invoked in the else block (not inside the try), so those exceptions can no longer be swallowed by the poll exception handler.

Logical Flaws Analysis

No significant logical flaws found in either local_conversation.py or remote_conversation.py.

The code structure is sound:

  • If _handle_poll_exception returns (for retryable errors), the loop continues with time.sleep(poll_interval)
  • If _handle_poll_exception raises (for terminal errors), the exception propagates up correctly
  • If _handle_conversation_status returns True (completed), the method returns
  • If _handle_conversation_status returns False (RUNNING), polling continues
  • If _handle_conversation_status raises (ERROR/STUCK), the exception propagates correctly
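Put together, the control flow in the bullets above looks roughly like this. A minimal sketch with injected callables standing in for the three helper methods; all names are illustrative, not the real SDK API:

```python
import time


class ConversationRunError(RuntimeError):
    """Raised when the run ends in a terminal failure or times out."""


def wait_for_run_completion(poll_status_once, handle_conversation_status,
                            handle_poll_exception, poll_interval=1.0,
                            timeout=60.0):
    """Poll until the run completes, a terminal error is raised, or we time out.

    - poll_status_once() -> status string; may raise on HTTP/network errors
    - handle_conversation_status(status) -> True when the run is complete;
      raises ConversationRunError for ERROR/STUCK
    - handle_poll_exception(exc) -> returns to retry, raises to abort
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            status = poll_status_once()
        except Exception as exc:  # classified by the handler, not swallowed
            handle_poll_exception(exc)
        else:
            # Status handling lives outside the try, so a ConversationRunError
            # raised here propagates instead of being caught as a poll error.
            if handle_conversation_status(status):
                return status
        time.sleep(poll_interval)
    raise ConversationRunError("Timed out waiting for run completion")
```

The key structural point is the try/except/else split: only the network call sits inside the try, so terminal-status exceptions raised in the else branch can never be misclassified as transient polling failures.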

The local_conversation.py handles errors differently since it runs locally - exceptions are caught, status is set to ERROR, and ConversationRunError is raised. This is appropriate for local execution.

Tests

The PR adds three new tests that all pass:

  • test_remote_conversation_run_error_status_raises - Verifies ERROR status raises ConversationRunError
  • test_remote_conversation_run_stuck_status_raises - Verifies STUCK status raises ConversationRunError
  • test_remote_conversation_run_404_raises - Verifies 404 during polling raises ConversationRunError

All three tests pass successfully.
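The ERROR/STUCK behavior those tests pin down can be reproduced with a tiny model of the status handler. Hypothetical names again; the real tests exercise RemoteConversation against a mocked server rather than this standalone function:

```python
class ConversationRunError(RuntimeError):
    """Raised when a conversation run ends in ERROR or STUCK."""


TERMINAL_ERROR_STATUSES = frozenset({"ERROR", "STUCK"})


def handle_conversation_status(conversation_id: str, status: str) -> bool:
    """Return True when the run finished; raise for terminal failure statuses."""
    if status in TERMINAL_ERROR_STATUSES:
        raise ConversationRunError(
            f"Conversation run failed for id={conversation_id}: status={status}"
        )
    return status == "FINISHED"
```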


@xingyaoww xingyaoww removed the integration-test Runs the integration tests and comments the results label Jan 2, 2026
Collaborator

@xingyaoww xingyaoww left a comment


LGTM!

@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Jan 2, 2026
@github-actions
Contributor

github-actions bot commented Jan 2, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Jan 2, 2026

🧪 Integration Tests Results

Overall Success Rate: 98.0%
Total Cost: $1.95
Models Tested: 6
Timestamp: 2026-01-02 22:28:21 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens |
|---|---|---|---|---|---|---|---|---|
| litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.19 | 464,037 |
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.05 | 495,277 |
| litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.49 | 276,123 |
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.64 | 527,673 |
| litellm_proxy_gpt_5.1_codex_max | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.25 | 327,590 |
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.33 | 509,310 |

📋 Detailed Results

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 459,435, completion: 4,602
  • Run Suffix: litellm_proxy_mistral_devstral_2512_28737d4_devstral_2512_run_N9_20260102_221729
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0085)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.05
  • Token Usage: prompt: 484,307, completion: 10,970, cache_read: 452,928
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_28737d4_deepseek_run_N9_20260102_221728
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.49
  • Token Usage: prompt: 257,243, completion: 18,880, cache_read: 140,070, reasoning: 13,572
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_28737d4_gemini_3_pro_run_N9_20260102_221729

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.64
  • Token Usage: prompt: 516,076, completion: 11,597, cache_read: 426,851, cache_write: 88,330, reasoning: 3,057
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_28737d4_sonnet_run_N9_20260102_221728

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.25
  • Token Usage: prompt: 322,455, completion: 5,135, cache_read: 182,528, reasoning: 2,560
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_28737d4_gpt51_codex_run_N9_20260102_221732
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.33
  • Token Usage: prompt: 496,226, completion: 13,084, cache_read: 428,640
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_28737d4_kimi_k2_run_N9_20260102_221729
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

```python
            RuntimeError(f"Polling failed with HTTP {status_code} {reason}"),
        ) from exc
    logger.warning(
        "Error polling status (will retry): HTTP %d %s",
```
Collaborator


If I see this right, we retry on all statuses except 4xx? Even on 500 and above…
Ah, until timeout, right… makes sense to me 🤔

Collaborator

@enyst enyst left a comment


LGTM! ☕️

@simonrosenberg simonrosenberg merged commit 0e96c43 into main Jan 3, 2026
37 checks passed
@simonrosenberg simonrosenberg deleted the fix/remote-conversation-terminal-errors branch January 3, 2026 11:07

Labels

integration-test Runs the integration tests and comments the results


Development

Successfully merging this pull request may close these issues.

RemoteConversation polling never exits on errors (ERROR/404); run-eval jobs hang

4 participants