
Conversation

@xingyaoww (Collaborator) commented Jan 5, 2026

Summary

This fixes issue #1228 where the context_window (and other stats) were being reset to zero when sending ConversationStatusUpdateEvent (full_state) after resuming a conversation.

Root cause: In ConversationState.create(), when resuming a conversation, the stats were correctly deserialized from base_state.json but then immediately overwritten with an empty ConversationStats() object at line 216.
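
For illustration, a minimal sketch of the resume path (names here are simplified assumptions, not the exact SDK code; the real logic lives in ConversationState.create()):

# Resume: the persisted state, including stats, is loaded from base_state.json
state = load_base_state(persistence_dir / "base_state.json")  # hypothetical helper

# The bug: the freshly restored stats were then immediately discarded.
# This is the line (state.py:216) that the PR removes:
# state.stats = ConversationStats()

return state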

Changes:

  • Remove the stats reset line in the state.py resume path (the stats are already correctly deserialized from base_state.json)
  • Add context_window field to TokenUsageData in integration test schemas
  • Add warning log when context_window is 0 despite LLM usage (to catch similar issues in the future)
  • Add regression test test_conversation_state_stats_preserved_on_resume for stats preservation on resume (a minimal sketch of its shape follows this list)
  • Update existing test test_conversation_state_flags_persistence to reflect correct behavior
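
A minimal sketch of the shape of the new regression test (helper names are hypothetical; the real test uses the SDK's actual fixtures):

def test_conversation_state_stats_preserved_on_resume(tmp_path):
    # Create a conversation state, record some usage so stats are non-empty, and persist it
    state = create_conversation_state(persistence_dir=tmp_path)   # hypothetical helper
    record_llm_usage(state.stats)                                  # hypothetical helper
    save_base_state(state, tmp_path)                               # writes base_state.json

    # Resume from the same directory: stats must survive, not be reset to ConversationStats()
    resumed = create_conversation_state(persistence_dir=tmp_path)
    assert resumed.stats == state.stats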

Fixes #1228

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:70c1405-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-70c1405-python \
  ghcr.io/openhands/agent-server:70c1405-python

All tags pushed for this build

ghcr.io/openhands/agent-server:70c1405-golang-amd64
ghcr.io/openhands/agent-server:70c1405-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:70c1405-golang-arm64
ghcr.io/openhands/agent-server:70c1405-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:70c1405-java-amd64
ghcr.io/openhands/agent-server:70c1405-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:70c1405-java-arm64
ghcr.io/openhands/agent-server:70c1405-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:70c1405-python-amd64
ghcr.io/openhands/agent-server:70c1405-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:70c1405-python-arm64
ghcr.io/openhands/agent-server:70c1405-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:70c1405-golang
ghcr.io/openhands/agent-server:70c1405-java
ghcr.io/openhands/agent-server:70c1405-python

About Multi-Architecture Support

  • Each variant tag (e.g., 70c1405-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 70c1405-python-amd64) are also available if needed (see the example below)
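
For example, to pull one architecture explicitly instead of the multi-arch manifest (tag taken from the list above):

# Pull the arm64-only image for the python variant
docker pull ghcr.io/openhands/agent-server:70c1405-python-arm64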

github-actions bot (Contributor) commented Jan 5, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-sdk/openhands/sdk/llm/llm.py | 420 | 157 | 62% | 359, 364, 368, 372–373, 376, 380–381, 392–393, 395–396, 400, 417, 435–438, 485, 515–517, 538, 542, 558, 565–566, 590–591, 601, 626–631, 652–653, 656, 660, 672, 677, 679–681, 691, 699–706, 710–713, 715, 728, 732–733, 735–736, 741–742, 744, 751, 754–759, 816–821, 878–879, 882–885, 927, 944, 998, 1001, 1004–1012, 1016–1018, 1021, 1024–1026, 1033–1034, 1043, 1050–1052, 1056, 1058–1063, 1065–1082, 1085–1089, 1091–1092, 1098–1107, 1120, 1134, 1139 |
| TOTAL | 14547 | 6913 | 52% | |

xingyaoww requested a review from hieptl on January 5, 2026 16:36
xingyaoww added the integration-test label (Runs the integration tests and comments the results) on Jan 5, 2026
github-actions bot (Contributor) commented Jan 5, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions bot (Contributor) commented Jan 5, 2026

🧪 Integration Tests Results

Overall Success Rate: 96.0%
Total Cost: $1.95
Models Tested: 6
Timestamp: 2026-01-05 16:52:38 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.06 | 536,384 |
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.67 | 579,762 |
| litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.54 | 363,395 |
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.28 | 415,593 |
| litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.25 | 604,509 |
| litellm_proxy_gpt_5.1_codex_max | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.15 | 211,982 |

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.06
  • Token Usage: prompt: 524,021, completion: 12,363, cache_read: 500,032
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_cc46f95_deepseek_run_N9_20260105_163707
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.67
  • Token Usage: prompt: 567,099, completion: 12,663, cache_read: 476,176, cache_write: 89,951, reasoning: 3,523
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_cc46f95_sonnet_run_N9_20260105_163704

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.54
  • Token Usage: prompt: 344,824, completion: 18,571, cache_read: 208,345, reasoning: 13,349
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_cc46f95_gemini_3_pro_run_N9_20260105_163704

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.28
  • Token Usage: prompt: 397,283, completion: 18,310, cache_read: 329,216
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_cc46f95_kimi_k2_run_N9_20260105_163710
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.25
  • Token Usage: prompt: 598,340, completion: 6,169
  • Run Suffix: litellm_proxy_mistral_devstral_2512_cc46f95_devstral_2512_run_N9_20260105_163706
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0085)

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.15
  • Token Usage: prompt: 208,837, completion: 3,145, cache_read: 129,408, reasoning: 1,280
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_cc46f95_gpt51_codex_run_N9_20260105_163702
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t06_github_pr_browsing ⚠️ REQUIRED: Agent's final answer does not contain the expected information about the PR content. Final answer preview: I don’t have access to browse external sites like GitHub from this environment, so I can’t open that pull request directly. If you can share the key details (e.g., the PR description, discussion, or s... (Cost: $0.007)

@xingyaoww (Collaborator, Author) commented:

@OpenHands #1590 (comment)

Help me download the artifact, and can you make sure the context window displaying 0 is fixed?

openhands-ai bot commented Jan 5, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

openhands-ai bot commented Jan 5, 2026

Summary

I downloaded the integration test artifacts from PR #1590 and identified the root cause of the context_window displaying as 0 issue.

Root Cause Analysis

The integration test results showed context_window: 0 for all tests. After analyzing the code, I found that the actual bug was different from what the original PR commit addressed: context_window was only passed to telemetry when log_enabled was True, so it was always reported as 0 when logging was disabled (as in production and the integration tests).

Fix Applied

Modified openhands-sdk/openhands/sdk/llm/llm.py to always pass context_window to telemetry regardless of whether logging is enabled:

# Before: log_ctx was None when logging disabled
log_ctx = None
if self._telemetry.log_enabled:
    log_ctx = {..., "context_window": self.max_input_tokens or 0}

# After: Always include context_window for metrics tracking
log_ctx: dict[str, Any] = {"context_window": self.max_input_tokens or 0}
if self._telemetry.log_enabled:
    log_ctx.update({...})  # Add logging-specific fields

This change was applied to both the completion() and responses() methods.

Testing

  • Added regression test test_context_window_tracked_without_logging_enabled to verify context_window is tracked even when logging is disabled
  • All 482 LLM tests pass
  • Pre-commit hooks pass (ruff, pyright, etc.)

Changes Pushed

The fix has been pushed to branch openhands/fix-context-window-zero-issue-1228 and will be included in PR #1590.


The context_window was only being passed to telemetry when log_enabled
was True. This caused context_window to always be 0 in production when
logging was disabled.

Changes:
- Always pass context_window to telemetry regardless of log_enabled
- Add context_window field to integration test TokenUsageData schema

Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww force-pushed the openhands/fix-context-window-zero-issue-1228 branch from 208dd7a to 6ecbc5f on January 5, 2026 17:19
@enyst (Collaborator) commented Jan 5, 2026

@OpenHands Do a /codereview-roasted on this PR diff. Understand the goal is actually:

Additional bug found: context_window was only being passed to telemetry when log_enabled was True. Since logging is typically disabled in production/integration tests, context_window was always 0.

Is this correct?

Then read the review threads (inline threads) and answer the question there.

openhands-ai bot commented Jan 5, 2026

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai bot commented Jan 5, 2026

Final summary

What I did: I reviewed the PR diff for openhands/fix-context-window-zero-issue-1228, verified the stated “additional bug” against the actual code changes, pulled the inline review comment(s) via the GitHub API, and answered the inline question with a roasted-style code review + technical explanation.


Restated answer to the main question (“Is this correct?”)

Yes. The statement is correct:

context_window was only being passed to telemetry when log_enabled was True. Since logging is typically disabled in production/integration tests, context_window was always 0.

In the old code, log_ctx was None unless self._telemetry.log_enabled was true, and context_window lived inside log_ctx. So telemetry never saw it when logging was off, which led to 0 downstream. The PR fixes that by always including {"context_window": self.max_input_tokens or 0} in the telemetry context regardless of log_enabled, while still gating heavy debug payload (messages/tools/kwargs) behind log_enabled.


Answer to the inline thread question

The reviewer asked why we only pass context_window in log_ctx but not “completion_tokens etc”.

My answer: token counts aren’t sourced from log_ctx—they come from the LLM response usage / telemetry instrumentation around the response. log_ctx is request-side context (inputs/tools/kwargs). context_window is a model/config property that cannot be reliably inferred from the response usage, so it must be passed explicitly for metrics even when logging is disabled.


Checklist against your request

  • Did a /codereview-roasted of the PR diff (no code changes)
  • Confirmed whether the “additional bug” explanation is correct
  • Retrieved and read inline review threads
  • Answered the question in the inline thread (in this chat, per instructions: no PR comments)

Notes on pushing / changes

No code changes were made in this session, so:

  • no diff to double-check
  • nothing to commit
  • nothing to push

@enyst (Collaborator) commented Jan 5, 2026

🟡 Acceptable taste — fixes the real issue with minimal churn, but the naming and data flow are still muddled.

[IMPROVEMENT OPPORTUNITIES]

1) Misleading variable name: log_ctx is no longer “log” context

  • You now send it even when logging is disabled.
  • So it’s not a “log context”, it’s a “telemetry context” or “request context”.
  • This is exactly how code rots: names lie, future devs “fix” the wrong thing.

Concrete suggestion: rename log_ctx → telemetry_ctx (or request_ctx) and keep the "logging extras" gated.
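
A sketch of what that rename would look like on the current diff (telemetry_ctx is the reviewer's suggested name, not what the PR currently uses; the ... stands for the same elided logging payload as in the snippet above):

# Baseline context always sent to telemetry, even with logging disabled
telemetry_ctx: dict[str, Any] = {"context_window": self.max_input_tokens or 0}
if self._telemetry.log_enabled:
    # Logging-only extras (messages/tools/kwargs) stay gated behind log_enabled
    telemetry_ctx.update({...})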

2) The dict update() pattern is fine, but it’s slightly backwards
Right now:

  • build minimal dict
  • mutate it conditionally

That’s okay, but it reads like “logging is special, everything else is baseline” — which is correct — but the naming (log_ctx) contradicts it.

3) The real architectural smell: telemetry depends on logging structure
The root bug exists because the system coupled:

  • “stuff we want for verbose logging”
    with
  • “stuff we want for metrics”

This PR patches it by splitting the payload, which is good, but the fact that it's still the same variable passed into telemetry suggests the telemetry API itself might be too logging-shaped.

openhands-ai bot commented Jan 6, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1590 at branch `openhands/fix-context-window-zero-issue-1228`

Feel free to include any additional details that might help me get this PR into a better state.
