
Conversation


@csmith49 csmith49 commented Jan 2, 2026

Summary

Adds a graceful failure mechanism for when a condenser doesn't forget any events. Instead of raising an uncaught exception, we now raise a NoCondensationAvailableException, which the base condenser class catches before returning the provided view unchanged.

Notably, this means that if we cannot find a condensation (because, e.g., the entire message history is a tool loop), condensation is effectively delayed until a valid chunk of events can be forgotten.
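The fallback described above can be sketched roughly as follows. This is an illustrative reconstruction, not the SDK's actual API: the class and method names (`View`, `Condenser._condense`, `ToolLoopAwareCondenser`) are assumptions made for the example.

```python
# Illustrative sketch of the graceful-failure flow; names are assumptions,
# not the actual OpenHands SDK API.
class NoCondensationAvailableException(Exception):
    """Raised when a condenser cannot forget any events."""


class View:
    def __init__(self, events):
        self.events = events


class Condenser:
    def condense(self, view: View) -> View:
        try:
            return self._condense(view)
        except NoCondensationAvailableException:
            # Graceful fallback: return the view unchanged, so
            # condensation is effectively retried on a later step.
            return view

    def _condense(self, view: View) -> View:
        raise NotImplementedError


class ToolLoopAwareCondenser(Condenser):
    def _condense(self, view: View) -> View:
        # If the entire history is one atomic chunk (e.g. a tool loop),
        # there is no valid range of events to forget.
        raise NoCondensationAvailableException()


view = View(events=["msg", "tool_call", "tool_result"])
assert ToolLoopAwareCondenser().condense(view) is view
```

The key property is that callers of `condense` never see the exception; they just get back an uncondensed view and can try again once the history contains a forgettable chunk.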

Also updates the condenser integration test to report the number of events condensed in the first condensation, adds unit tests for the exception handling, and updates a handful of other related tests.

Intended to address #1518.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java    | amd64, arm64  | eclipse-temurin:17-jdk | Link
python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang  | amd64, arm64  | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:22eb6f2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-22eb6f2-python \
  ghcr.io/openhands/agent-server:22eb6f2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:22eb6f2-golang-amd64
ghcr.io/openhands/agent-server:22eb6f2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:22eb6f2-golang-arm64
ghcr.io/openhands/agent-server:22eb6f2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:22eb6f2-java-amd64
ghcr.io/openhands/agent-server:22eb6f2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:22eb6f2-java-arm64
ghcr.io/openhands/agent-server:22eb6f2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:22eb6f2-python-amd64
ghcr.io/openhands/agent-server:22eb6f2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:22eb6f2-python-arm64
ghcr.io/openhands/agent-server:22eb6f2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:22eb6f2-golang
ghcr.io/openhands/agent-server:22eb6f2-java
ghcr.io/openhands/agent-server:22eb6f2-python

About Multi-Architecture Support

  • Each variant tag (e.g., 22eb6f2-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 22eb6f2-python-amd64) are also available if needed


openhands-ai bot commented Jan 2, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1577 at branch `fix/empty-condensation`

Feel free to include any additional details that might help me get this PR into a better state.



github-actions bot commented Jan 2, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/agent
   agent.py | 199 | 71 | 64% | 89, 93, 144, 148–149, 157–159, 169–170, 186–188, 195–197, 199, 203, 206–207, 209–210, 228, 255, 260, 271, 310, 315, 326, 329, 352, 362–363, 384–386, 388, 400–401, 406–407, 427–428, 433, 445–446, 451–452, 490–491, 497–498, 502, 510–511, 519, 522–524, 529–530, 551, 558, 562–563, 601–603, 606–607, 611
openhands-sdk/openhands/sdk/context
   view.py | 228 | 111 | 51% | 87, 92, 97–98, 103–104, 109–113, 143–144, 147–153, 156–158, 162, 166–169, 172–173, 179–181, 185–187, 189, 192, 197–201, 204–206, 210–212, 216–219, 222–223, 225, 227, 230–231, 233–234, 236, 240, 242, 244, 247–249, 251, 253–254, 257, 260, 263, 265–266, 268, 284–288, 290, 322–323, 354, 365–366, 374, 377, 433–436, 438–440, 451–452, 454, 456, 478–481, 484, 486–487, 495, 498–499
openhands-sdk/openhands/sdk/context/condenser
   base.py | 36 | 11 | 69% | 62, 131–134, 136–137, 139, 143, 147, 151
   llm_summarizing_condenser.py | 99 | 64 | 35% | 51, 58, 72, 76–77, 80–83, 86–87, 89, 94, 97–98, 105–107, 113–114, 122, 124–128, 130, 152, 155, 157, 164, 168, 172–176, 178, 201–202, 204, 206–208, 210–212, 214, 218–219, 221–222, 224, 234, 237, 240, 245, 248, 251, 259, 263–264, 270, 272
TOTAL | 14574 | 6930 | 52% |

@csmith49 csmith49 added the integration-test Runs the integration tests and comments the results label Jan 2, 2026

github-actions bot commented Jan 2, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.


github-actions bot commented Jan 2, 2026

🧪 Integration Tests Results

Overall Success Rate: 92.0%
Total Cost: $2.63
Models Tested: 6
Timestamp: 2026-01-02 18:35:33 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_moonshot_kimi_k2_thinking | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.78 | 1,245,526
litellm_proxy_gpt_5.1_codex_max | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.16 | 196,618
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.14 | 1,568,917
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.66 | 597,772
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.72 | 581,227
litellm_proxy_mistral_devstral_2512 | 75.0% | 75.0% | N/A | 6/8 | 1 | 9 | $0.18 | 427,110

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.78
  • Token Usage: prompt: 1,229,776, completion: 15,750, cache_read: 1,094,400
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_c160c5c_kimi_k2_run_N9_20260102_182208
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t06_github_pr_browsing ⚠️ REQUIRED: No final answer found from agent. Events: 84, LLM messages: 1 (Cost: $0.57)

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.16
  • Token Usage: prompt: 192,391, completion: 4,227, cache_read: 112,512, reasoning: 2,304
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_c160c5c_gpt51_codex_run_N9_20260102_182214
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t06_github_pr_browsing ⚠️ REQUIRED: Agent's final answer does not contain the expected information about the PR content. Final answer preview: I don’t have direct access to external sites (including GitHub) from this environment, so I can’t open that PR to see what’s going on or what @asadm suggested. If you can paste the PR description/comm... (Cost: $0.0063)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.14
  • Token Usage: prompt: 1,554,917, completion: 14,000, cache_read: 1,458,496
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_c160c5c_deepseek_run_N9_20260102_182214
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.66
  • Token Usage: prompt: 584,179, completion: 13,593, cache_read: 502,615, cache_write: 80,546, reasoning: 4,138
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_c160c5c_sonnet_run_N9_20260102_182216

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.72
  • Token Usage: prompt: 558,862, completion: 22,365, cache_read: 371,594, reasoning: 15,607
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_c160c5c_gemini_3_pro_run_N9_20260102_182210

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 75.0% (6/8)
  • Integration Tests (Required): 75.0% (6/9)
  • Total Cost: $0.18
  • Token Usage: prompt: 424,390, completion: 2,720
  • Run Suffix: litellm_proxy_mistral_devstral_2512_c160c5c_devstral_2512_run_N9_20260102_182210
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Condensation not triggered. Token counting may not work. (Cost: $0.002)
  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.01)

@csmith49 csmith49 added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Jan 2, 2026

github-actions bot commented Jan 2, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.


github-actions bot commented Jan 2, 2026

🧪 Integration Tests Results

Overall Success Rate: 98.0%
Total Cost: $2.11
Models Tested: 6
Timestamp: 2026-01-02 18:55:22 UTC

📁 Detailed Logs & Artifacts

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.05 | 450,871
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.63 | 514,643
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.56 | 337,211
litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.32 | 772,317
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.36 | 563,530
litellm_proxy_gpt_5.1_codex_max | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.19 | 218,548

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.05
  • Token Usage: prompt: 439,097, completion: 11,774, cache_read: 410,816
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_d74c039_deepseek_run_N9_20260102_184719
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.63
  • Token Usage: prompt: 503,187, completion: 11,456, cache_read: 414,192, cache_write: 88,126, reasoning: 3,115
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_d74c039_sonnet_run_N9_20260102_184724

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.56
  • Token Usage: prompt: 316,425, completion: 20,786, cache_read: 176,898, reasoning: 14,821
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_d74c039_gemini_3_pro_run_N9_20260102_184724

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.32
  • Token Usage: prompt: 767,170, completion: 5,147
  • Run Suffix: litellm_proxy_mistral_devstral_2512_d74c039_devstral_2512_run_N9_20260102_184724
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.01)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.36
  • Token Usage: prompt: 551,483, completion: 12,047, cache_read: 488,109
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_d74c039_kimi_k2_run_N9_20260102_184724
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 214,141, completion: 4,407, cache_read: 108,288, reasoning: 2,240
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_d74c039_gpt51_codex_run_N9_20260102_184720
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

@csmith49 csmith49 marked this pull request as ready for review January 2, 2026 19:05

csmith49 commented Jan 2, 2026

There are two integration test runs here -- the first one has the token condenser test failing for Devstral, but it's the same failure mode as with GPT 5.1. I'm hesitant to mark it skipped because the failure seems to be intermittent.

@csmith49 csmith49 removed the integration-test Runs the integration tests and comments the results label Jan 2, 2026

csmith49 commented Jan 2, 2026

IMPORTANT NOTE: This PR is not a complete solution. There are still some strange cases where we can get caught in loops. For example, if a context window exception is thrown we'll end up retrying into infinity.

@xingyaoww (Collaborator) commented:

For example, if a context window exception is thrown we'll end up retrying into infinity.

Could we fail hard in this case? IMO failing hard is better here than going into an infinite loop.


csmith49 commented Jan 2, 2026

Could we fail hard in this case? IMO fail hard is better here than going into infinite loop though

That's doable. The challenge right now is just figuring out when we're in that situation. I've got some draft changes that make that easy though.

@all-hands-bot commented:

[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!

@xingyaoww xingyaoww left a comment

Just one comment, otherwise LGTM


csmith49 commented Jan 5, 2026

Hey @xingyaoww can you take another look now? Here are the changes from what you saw:

  • The CondensationRequest event no longer carries an enum -- all requests are the same, regardless of whether they come from the agent or from the user (via LocalConversation.condense, for example).
  • The RollingCondenser base class API has changed slightly. The function should_condense is replaced by condensation_requirement, which has return type CondensationRequirement | None. Requirements can be either hard or soft, and the condenser gets to pick when overriding this function.
    • If the requirement is hard and there's no possible condensation, throw an exception. This is the behavior before this change.
    • If the requirement is soft and there's no possible condensation, just skip it for now.

This flow lets us wait until the manipulation indices are good when we cross the event count threshold but throw an error in pretty much all other cases.

Next step (in a separate PR) is to condense everything instead when we get a hard requirement but no possible condensation.
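The hard/soft requirement flow described above can be sketched as follows. This is a hypothetical reconstruction from the description, not the SDK's real code: the `find_condensation` and `step` names and signatures are assumptions, and the `X | None` annotation assumes Python 3.10+.

```python
# Hypothetical sketch of the hard/soft condensation-requirement flow;
# only condensation_requirement and CondensationRequirement come from the
# PR description -- the rest is illustrative scaffolding.
from enum import Enum


class CondensationRequirement(Enum):
    SOFT = "soft"  # skip condensation if none is possible
    HARD = "hard"  # raise if none is possible


class NoCondensationAvailableException(Exception):
    pass


class RollingCondenser:
    def condensation_requirement(self, view) -> "CondensationRequirement | None":
        # Subclasses decide per step: None (no condensation needed),
        # SOFT (nice to have), or HARD (must happen now).
        return None

    def find_condensation(self, view):
        # Returns a condensed view, or None when no events can be forgotten
        # (e.g. the whole history is a tool loop).
        return None

    def step(self, view):
        requirement = self.condensation_requirement(view)
        if requirement is None:
            return view
        condensation = self.find_condensation(view)
        if condensation is None:
            if requirement is CondensationRequirement.HARD:
                raise NoCondensationAvailableException()
            return view  # soft requirement: skip for now, retry later
        return condensation
```

Under this sketch, crossing the event-count threshold would yield a SOFT requirement (wait until the indices line up), while the remaining cases yield HARD and fail loudly instead of looping.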

@csmith49 csmith49 requested a review from xingyaoww January 5, 2026 18:34
@xingyaoww xingyaoww left a comment
LGTM! Let's run integration test - good to merge when it passes

@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Jan 5, 2026

github-actions bot commented Jan 5, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.


github-actions bot commented Jan 5, 2026

🧪 Integration Tests Results

Overall Success Rate: 98.0%
Total Cost: $2.06
Models Tested: 6
Timestamp: 2026-01-05 18:49:55 UTC

📁 Detailed Logs & Artifacts

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.1_codex_max | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.30 | 540,357
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.05 | 371,158
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.63 | 546,416
litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.25 | 616,679
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.20 | 298,232
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.62 | 388,775

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.30
  • Token Usage: prompt: 531,629, completion: 8,728, cache_read: 398,720, reasoning: 5,568
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_05f62aa_gpt51_codex_run_N9_20260105_184043
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.05
  • Token Usage: prompt: 357,869, completion: 13,289, cache_read: 330,048
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_05f62aa_deepseek_run_N9_20260105_184040
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.63
  • Token Usage: prompt: 534,503, completion: 11,913, cache_read: 448,959, cache_write: 84,622, reasoning: 3,243
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_05f62aa_sonnet_run_N9_20260105_184048

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.25
  • Token Usage: prompt: 611,781, completion: 4,898
  • Run Suffix: litellm_proxy_mistral_devstral_2512_05f62aa_devstral_2512_run_N9_20260105_184042
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0085)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.20
  • Token Usage: prompt: 286,290, completion: 11,942, cache_read: 223,488
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_05f62aa_kimi_k2_run_N9_20260105_184039
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.62
  • Token Usage: prompt: 364,108, completion: 24,667, cache_read: 222,021, reasoning: 18,899
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_05f62aa_gemini_3_pro_run_N9_20260105_184038

@xingyaoww xingyaoww left a comment

LGTM!

@xingyaoww xingyaoww merged commit 1fbf867 into main Jan 5, 2026
35 checks passed
@xingyaoww xingyaoww deleted the fix/empty-condensation branch January 5, 2026 18:52
