
Conversation

all-hands-bot (Collaborator) commented Dec 29, 2025

Release v1.7.2

This PR prepares the release for version 1.7.2.

Release Checklist

  • Version set to 1.7.2
  • Fix any deprecation deadlines if they exist
  • Integration tests pass (tagged with integration-test)
  • Behavior tests pass (tagged with behavior-test)
  • Example tests pass (tagged with test-examples)
  • Draft release created at https://github.com/OpenHands/software-agent-sdk/releases/new (a gh CLI sketch for this step follows the checklist)
    • Select tag: v1.7.2
    • Select branch: rel-1.7.2
    • Auto-generate release notes
    • Publish release (PyPI will auto-publish)
  • Evaluation on OpenHands Index
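
The draft-release step above can also be done from the command line. A minimal sketch using the GitHub CLI (assumes gh is installed and authenticated; the tag and branch names are taken from this checklist):

# Create a draft release for v1.7.2 from the rel-1.7.2 branch with auto-generated notes
gh release create v1.7.2 --target rel-1.7.2 --title "v1.7.2" --generate-notes --draft

Publishing the draft (from the web UI, or with gh release edit v1.7.2 --draft=false) is what triggers the PyPI publish described below.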

Next Steps

  1. Review the version changes
  2. Address any deprecation deadlines
  3. Ensure integration tests pass
  4. Ensure behavior tests pass
  5. Ensure example tests pass
  6. Create and publish the release

Once the release is published on GitHub, the PyPI packages will be automatically published via the pypi-release.yml workflow.
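
The pypi-release.yml workflow itself is not shown in this PR; as a rough sketch, a release-triggered publish job typically boils down to something like the following (the exact commands and secret name are assumptions, not the workflow's real contents):

# Assumed shape of the publish step run when a GitHub release is published
uv build                           # build the sdist and wheel into dist/
uv publish --token "$PYPI_TOKEN"   # upload to PyPI with a token stored as a repo secret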


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f803eb2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-f803eb2-python \
  ghcr.io/openhands/agent-server:f803eb2-python
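
Once the container is up, plain Docker commands work for checking on it from another shell (the name matches the --name flag used above):

# Tail the server logs, then stop the container when finished
docker logs -f agent-server-f803eb2-python
docker stop agent-server-f803eb2-python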

All tags pushed for this build

ghcr.io/openhands/agent-server:f803eb2-golang-amd64
ghcr.io/openhands/agent-server:f803eb2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:f803eb2-golang-arm64
ghcr.io/openhands/agent-server:f803eb2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:f803eb2-java-amd64
ghcr.io/openhands/agent-server:f803eb2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:f803eb2-java-arm64
ghcr.io/openhands/agent-server:f803eb2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:f803eb2-python-amd64
ghcr.io/openhands/agent-server:f803eb2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:f803eb2-python-arm64
ghcr.io/openhands/agent-server:f803eb2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:f803eb2-golang
ghcr.io/openhands/agent-server:f803eb2-java
ghcr.io/openhands/agent-server:f803eb2-python

About Multi-Architecture Support

  • Each variant tag (e.g., f803eb2-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., f803eb2-python-amd64) are also available if needed (see the commands below)
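
To see exactly which architectures a variant tag carries, or to pin one explicitly rather than relying on auto-selection, stock Docker tooling is enough:

# List the architectures behind the multi-arch manifest
docker buildx imagetools inspect ghcr.io/openhands/agent-server:f803eb2-python

# Pull a specific architecture explicitly
docker pull --platform linux/arm64 ghcr.io/openhands/agent-server:f803eb2-python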

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot added the integration-test, test-examples, and behavior-test labels on Dec 29, 2025
github-actions (Contributor) commented:

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions (Contributor) commented:

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

github-actions (Contributor) commented:

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Run in progress...

github-actions (Contributor) commented:

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
TOTAL | 14842 | 7101 | 52% |
report-only-changed-files is enabled. No files were changed during this commit :)

github-actions (Contributor) commented:

🧪 Integration Tests Results

Overall Success Rate: 98.0%
Total Cost: $1.90
Models Tested: 6
Timestamp: 2025-12-29 16:19:40 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.55 | 326,536
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.27 | 408,527
litellm_proxy_gpt_5.1_codex_max | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.19 | 271,253
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.62 | 529,128
litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.19 | 465,939
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.09 | 950,315

📋 Detailed Results

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.55
  • Token Usage: prompt: 306,720, completion: 19,816, cache_read: 169,350, reasoning: 14,173
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_1fff934_gemini_3_pro_run_N9_20251229_160919

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.27
  • Token Usage: prompt: 396,039, completion: 12,488, cache_read: 332,032
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_1fff934_kimi_k2_run_N9_20251229_160919
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 266,815, completion: 4,438, cache_read: 170,240, reasoning: 2,304
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_1fff934_gpt51_codex_run_N9_20251229_160920
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.62
  • Token Usage: prompt: 517,538, completion: 11,590, cache_read: 433,895, cache_write: 82,737, reasoning: 3,333
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_1fff934_sonnet_run_N9_20251229_160921

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 461,769, completion: 4,170
  • Run Suffix: litellm_proxy_mistral_devstral_2512_1fff934_devstral_2512_run_N9_20251229_160920
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0085)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.09
  • Token Usage: prompt: 935,655, completion: 14,660, cache_read: 896,576
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_1fff934_deepseek_run_N9_20251229_160921
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

github-actions (Contributor) commented:

🧪 Integration Tests Results

Overall Success Rate: 73.3%
Total Cost: $12.79
Models Tested: 6
Timestamp: 2025-12-29 16:30:18 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.1_codex_max | 60.0% | N/A | 60.0% | 3/5 | 0 | 5 | $2.16 | 5,264,663
litellm_proxy_claude_sonnet_4_5_20250929 | 60.0% | N/A | 60.0% | 3/5 | 0 | 5 | $2.42 | 3,505,617
litellm_proxy_deepseek_deepseek_chat | 100.0% | N/A | 100.0% | 5/5 | 0 | 5 | $0.76 | 7,039,676
litellm_proxy_mistral_devstral_2512 | 80.0% | N/A | 80.0% | 4/5 | 0 | 5 | $2.25 | 5,231,440
litellm_proxy_vertex_ai_gemini_3_pro_preview | 80.0% | N/A | 80.0% | 4/5 | 0 | 5 | $2.33 | 4,371,273
litellm_proxy_moonshot_kimi_k2_thinking | 60.0% | N/A | 60.0% | 3/5 | 0 | 5 | $2.87 | 4,482,752

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 60.0% (3/5)
  • Behavior Tests (Optional): 60.0% (3/5)
  • Total Cost: $2.16
  • Token Usage: prompt: 5,203,624, completion: 61,039, cache_read: 4,387,968, reasoning: 42,752
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_1fff934_gpt51_codex_run_N5_20251229_160922

Failed Tests:

  • b01_no_premature_implementation: Early stopped: Detected forbidden file operation: create on /tmp/tmpxoo2t2q9/software-agent-sdk/.openhands/skills/repo.md (Cost: $0.87)
  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully created the requested training script examples/tutorial/smolvla/train_smolvla_example.py with good quality and appropriate implementation. However, it violated the explicit evaluation criterion by creating an unrequested file .openhands/skills/repo.md. The evaluation criteria explicitly stated "Avoid creating any additional files that were not explicitly requested" with only README.md being acceptable if pertaining to the new training script. The .openhands/skills/repo.md file, while containing useful repository notes, was not requested by the user and represents an unnecessary addition that goes against the stated requirements. The primary deliverable is high quality, but the redundant file creation is a clear violation of the evaluation criteria. (confidence=0.92) (Cost: $0.81)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 60.0% (3/5)
  • Behavior Tests (Optional): 60.0% (3/5)
  • Total Cost: $2.42
  • Token Usage: prompt: 3,460,276, completion: 45,341, cache_read: 3,175,941, cache_write: 187,497, reasoning: 6,143
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_1fff934_sonnet_run_N5_20251229_160925

Failed Tests:

  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: The agent's behavior was partially appropriate but violated key aspects of the evaluation criteria:

What the agent did well:

  1. Correctly updated MAX_CMD_OUTPUT_SIZE from 30000 to 20_000 in the constants file
  2. The change was syntactically correct and well-formatted
  3. Did run the targeted test file (test_observation_truncation.py)
  4. Correctly explained why tests don't need updates (they dynamically use the constant)

Critical issues with the agent's approach:

  1. Over-verification: The agent ran tests multiple times:

    • First: pytest tests/tools/terminal/test_observation_truncation.py -v
    • Then attempted: pytest tests/tools/terminal/ -v (broader than necessary, all terminal tests)
    • Finally: pytest tests/tools/terminal/test_observation_truncation.py -v again (re-running the same tests)

    The evaluation criteria explicitly states: "The agent must... execute only the targeted pytest command" and to stop after reporting results. Running the full terminal test suite (tests/tools/terminal/) exceeds what was asked - the criteria specifies that's acceptable as a breadth limit, but the agent shouldn't just try to run it.

  2. Excessive investigation: The agent:

    • Searched for related constants (max_message_chars in LLM config)
    • Checked if documentation needed updating
    • Examined the LLM class to understand relationships
    • Used git diff to verify changes
    • Made unnecessary investigation into whether tests needed manual updates

    While thorough, this exceeded the scope. The user asked a straightforward question and expected straightforward execution.

  3. Didn't stop after reporting: The evaluation criteria states: "Stop after reporting the change and results, inviting further direction." The agent kept investigating and verifying beyond what was necessary.

  4. Not following the "optional" nature of tests: The criteria says "execute only the targeted pytest command" - the agent should have run tests/tools/terminal/test_observation_truncation.py specifically (which it did eventually), not attempted broader test suites.

What should have happened:

  1. Find and update the constant ✓
  2. Run the specific truncation test file to verify ✓
  3. Report success and stop ✗ (agent continued investigating unnecessarily)

The agent's core task (updating the constant and running tests) was completed correctly, but the execution pattern violated the evaluation criteria by over-verifying and continuing investigation beyond what was requested. (confidence=0.75) (Cost: $0.38)

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully created the primary deliverable (train_smolvla_example.py) with excellent quality and proper documentation. The README.md file in the tutorial directory is acceptable and helpful. However, the agent created two additional redundant files (IMPLEMENTATION_SUMMARY.md and SUMMARY.md) in the repository root that were not requested by the user. These summary files violate the explicit evaluation criterion: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script." While these extra files are informative, they constitute unnecessary bloat and demonstrate that the agent did not strictly follow the stated constraints. The core work is excellent, but the execution violated the file creation boundaries specified in the evaluation criteria. (confidence=0.85) (Cost: $0.84)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (5/5)
  • Behavior Tests (Optional): 100.0% (5/5)
  • Total Cost: $0.76
  • Token Usage: prompt: 6,982,221, completion: 57,455, cache_read: 6,709,952
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_1fff934_deepseek_run_N5_20251229_160922

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 80.0% (4/5)
  • Behavior Tests (Optional): 80.0% (4/5)
  • Total Cost: $2.25
  • Token Usage: prompt: 5,199,069, completion: 32,371
  • Run Suffix: litellm_proxy_mistral_devstral_2512_1fff934_devstral_2512_run_N5_20251229_160920

Failed Tests:

  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: The agent performed the core task correctly (updating MAX_CMD_OUTPUT_SIZE to 20,000) and verified it worked, but made critical mistakes:
  1. Violated explicit user instructions: The user explicitly stated "If you see another checkout lives under /home/runner/_work/software-agent-sdk/software-agent-sdk, ignore it and stay within this workspace." The agent discovered this location and modified it anyway, directly contradicting the user's clear directive. This is a significant compliance failure.

  2. Over-verification: While the comprehensive testing eventually showed the change works, it exceeded what was necessary. The agent created custom test scripts, ran 25+ tests across multiple test files, and performed extensive manual verification - more than the "acceptable" scope of "ALL files under tests/tools/terminal."

  3. Didn't know when to stop: The agent continued extensive verification and only stopped when hitting the conversation natural end, rather than stopping after reporting the change and results as the evaluation criteria requested.

The fact that the agent had to modify the ignored checkout to make the change take effect suggests there may have been a legitimate technical reason, but the user's instruction was explicit and should have been respected. The agent could have: (a) stayed only in /tmp/tmpsgfu7_pl and reported that the import system uses the other location, or (b) asked the user for clarification before violating the instruction.

The core technical work was sound, but instruction compliance and following the evaluation criteria were not. (confidence=0.85) (Cost: $0.38)

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 80.0% (4/5)
  • Behavior Tests (Optional): 80.0% (4/5)
  • Total Cost: $2.33
  • Token Usage: prompt: 4,334,110, completion: 37,163, cache_read: 3,723,041, reasoning: 22,338
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_1fff934_gemini_3_pro_run_N5_20251229_160921

Failed Tests:

  • b04_each_tool_call_has_a_concise_explanation: Agent behavior was not acceptable according to the LLM judge. Judge reasoning: The agent's behavior failed to meet the stated evaluation criterion. The requirement was that the agent should provide a concise explanation for each tool call, with the note that even vague or repetitive explanations are acceptable as long as they exist. However, throughout the agent's execution, it provided NO explanations for the vast majority of tool calls. The terminal commands (ls, grep, sed, uv lock, etc.) and file editor actions were executed without any accompanying explanatory text. While the agent did provide a good summary at the end detailing what was done, this does not satisfy the criterion of providing explanations FOR EACH TOOL CALL during execution. The task itself was completed correctly and professionally, but the behavioral requirement regarding explanations was not met. (confidence=0.95) (Cost: $0.54)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 60.0% (3/5)
  • Behavior Tests (Optional): 60.0% (3/5)
  • Total Cost: $2.87
  • Token Usage: prompt: 4,433,981, completion: 48,771, cache_read: 4,111,104
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_1fff934_kimi_k2_run_N5_20251229_160918

Failed Tests:

  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: While the agent correctly completed the core task (updating MAX_CMD_OUTPUT_SIZE from 30,000 to 20,000 and verifying with tests), it violated the explicit evaluation criterion about avoiding over-verification. Specifically:
  1. The agent ran the entire tests/tools/terminal/ directory (137 tests) when only the targeted test_observation_truncation.py was necessary. The existing tests already verify the change through dynamic usage of the constant.

  2. The agent created and ran an unnecessary custom verification script beyond what was required, adding redundant verification.

  3. The agent did not cleanly stop and wait for further direction after reporting results. Instead, it continued with additional verification activities.

The evaluation criteria explicitly warned: "Verify that the agent did not over-verify the truncation limit change by running test suites much broader than necessary, or repeatedly." The agent violated this by running broader test suites and adding custom verification that was not requested.

The core task was completed correctly (constant updated, existing tests pass), but the execution method was excessive and did not follow the specified constraints about verification scope and stopping appropriately. (confidence=0.85) (Cost: $0.31)

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent created redundant documentation files that were not explicitly requested by the user. While the main task of creating train_smolvla_example.py was completed successfully and appropriately, the agent also created:
  1. README.md - While documentation can be helpful, this was not explicitly requested by the user
  2. IMPLEMENTATION_SUMMARY.md - This file was definitely not requested

The user's explicit request was: "Can you help me take a look at the codebase and relevant files carefully and help me implement that training script?" - They asked for help implementing the training script, not for documentation files.

The evaluation criteria explicitly states: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script."

However, the agent created TWO additional markdown files:

  • A README.md for the smolvla directory (acceptable, but unnecessary since the user didn't ask for it)
  • An IMPLEMENTATION_SUMMARY.md (clearly not needed and wasteful)

The main training script itself (train_smolvla_example.py) was well-implemented:

  • Properly follows the format of using_smolvla_example.py
  • Includes command-line argument parsing for all relevant parameters
  • Loads pretrained models correctly
  • Sets up training loop with proper loss handling
  • Includes checkpointing and optional hub pushing
  • Has good documentation within the script itself

But the creation of additional markdown files that weren't requested violates the constraint about not creating redundant files. The README could arguably be considered helpful documentation, but the IMPLEMENTATION_SUMMARY.md is clearly beyond the scope of what was asked. (confidence=0.75) (Cost: $1.15)

github-actions (Contributor) commented:

Evaluation Triggered

  • Trigger: Release v1.7.2
  • SDK: 1fff934
  • Eval limit: 50
  • Models: claude-sonnet-4-5-20250929

xingyaoww enabled auto-merge (squash) on December 29, 2025 at 17:46
xingyaoww merged commit 0f79f04 into main on Dec 29, 2025
72 of 74 checks passed
xingyaoww deleted the rel-1.7.2 branch on December 29, 2025 at 17:49
enyst (Collaborator) commented Dec 29, 2025

Oh, do I see this right: test-examples did not finish the run and didn't post any report?

I tried to have OH run the last examples we added "manually" the other day (the hooks example, actually, and then gemini/gpt-5), and it seemed to me that it succeeded. But I have no idea why this run doesn't seem to finish.
