Skip to content

Conversation

@enyst
Copy link
Owner

@enyst enyst commented Sep 12, 2025

Fixes OpenHands#172.

Summary

  • Preserve LiteLLM typed thinking_blocks and provider_specific_fields end-to-end (Message ⇄ ActionEvent ⇄ combine ⇄ non-native FC).
  • Maintain backward compatibility; fields are optional and omitted if None.
  • Tests cover mapping from LiteLLM, multi-action combine, and non-native FC path.
  • Linting and Pyright clean.

Notes

  • Full suite passes except 4 pre-existing unrelated tool tests (execute_bash complex commands; str_replace_editor memory).

Summary by CodeRabbit

  • New Features

    • Preserves and exposes provider-specific fields, thinking blocks, and reasoning content across messages, multi-action responses, and non-native function-calling flows for improved downstream rendering.
    • Ensures the primary tool action carries these provider-specific metadata for accurate display.
  • Tests

    • Added tests validating propagation of provider-specific fields and thinking blocks through conversions, serialization, aggregation, and non-native function-call handling.
  • Documentation

    • Updated README and examples (including an async example and TaskTrackerTool) to reflect new SDK layout and tooling.

@coderabbitai
Copy link

coderabbitai bot commented Sep 12, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds provider-specific thinking support across the LLM pipeline: new Message fields (thinking_blocks, provider_specific_fields), propagation through ActionEvent creation (first event only), combination of batched ActionEvents, non-native function-calling post-processing, serialization/deserialization, and tests validating preservation across conversions and flows.

Changes

Cohort / File(s) Summary of Changes
Agent propagation (first ActionEvent)
openhands/sdk/agent/agent.py
In step(), attach provider_specific_fields and thinking_blocks from the LLM message to the first generated ActionEvent only; existing thought_content distribution unchanged.
ActionEvent batching -> Message
openhands/sdk/event/base.py
_combine_action_events now includes reasoning_content, thinking_blocks, and provider_specific_fields (sourced from events[0]) in the aggregated Message.
ActionEvent schema & conversion
openhands/sdk/event/llm_convertible.py
Adds public fields thinking_blocks and provider_specific_fields to ActionEvent; to_llm_message forwards them into Message.
Message model & (de)serialization
openhands/sdk/llm/message.py
Adds public fields thinking_blocks and provider_specific_fields; to_llm_dict emits them when present; from_litellm_message reads them from LiteLLM messages.
Non-native FC post-processing
openhands/sdk/llm/mixins/non_native_fc.py
post_response_prompt_mock now also propagates thinking_blocks (in addition to reasoning_content and provider_specific_fields) when converting messages.
Tests: provider_specific_fields
tests/sdk/llm/test_provider_specific_fields.py
New tests ensuring provider_specific_fields are preserved through from_litellm_message, to_llm_dict, and events_to_messages aggregation.
Tests: thinking_blocks
tests/sdk/llm/test_thinking_blocks.py
New tests verifying thinking_blocks are preserved via from_litellm_message, to_llm_dict, events_to_messages, and non-native function-calling flows.
Diff util signature
openhands/tools/str_replace_editor/utils/diff.py
get_edit_groups now accepts `old_content: str
Tests: diff typing
tests/tools/str_replace_editor/test_visualize_diff.py
Removed # type: ignore occurrences where get_edit_groups(None, ...) is called so tests exercise runtime behavior unchanged.
Repository docs & layout
README.md, examples, tools, tests/*
README updated to include new example 11_async.py, reorganized SDK layout (io/, llm/, mcp/, tool/, utils/), added TaskTrackerTool and examples adjustments; test/fixtures updates described.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Provider as LLM Provider
  participant LiteLLM as LiteLLM
  participant OHMsg as OpenHands Message
  participant Agent as Agent
  participant AE as ActionEvent(s)
  participant Comb as Event Combiner
  participant Serializer as to_llm_dict

  Provider->>LiteLLM: ChatCompletion response (includes thinking_blocks, provider_specific_fields)
  LiteLLM-->>OHMsg: Message.from_litellm_message()
  note right of OHMsg: Message contains content, reasoning_content,\nthinking_blocks, provider_specific_fields

  OHMsg->>Agent: step()
  Agent->>AE: create ActionEvent(s)
  note right of AE: First ActionEvent receives thinking_blocks\nand provider_specific_fields

  AE->>Comb: events_to_messages()
  Comb-->>OHMsg: Combined Message (fields from first event)

  OHMsg->>Serializer: to_llm_dict()
  Serializer-->>Provider: Dict includes thinking_blocks,\nprovider_specific_fields (if present)
Loading
sequenceDiagram
  autonumber
  participant Orig as Original Message
  participant NonNative as NonNativeToolCallingMixin
  participant Final as Final Message

  Orig->>NonNative: post_response_prompt_mock(orig_msg)
  NonNative-->>Final: produce message copying\nreasoning_content, thinking_blocks, provider_specific_fields
  note right of Final: Fields preserved for downstream use
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble thoughts in tidy blocks,
Little hops through provider docks.
The first hop keeps the secret keys—
Then batches bundle all with ease.
I thump my foot: "Fields safe and sound!" 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning The diff contains several changes that appear unrelated to Issue OpenHands#172 and broaden the PR scope, including README and examples updates, a reorganization of sdk modules (removal of config/ and new io/, llm/, mcp/ modules), addition of a TaskTrackerTool and tools/task_tracker, and a signature change in openhands/tools/str_replace_editor/utils/diff.py with corresponding test edits; these refactors and new tooling are not required to implement thinking_blocks support and increase review surface and risk. Please split the unrelated documentation/refactor/tool additions into separate PR(s) or explicitly justify and document their inclusion here and ensure CI covers them; otherwise revert those unrelated changes so this PR focuses solely on thinking_blocks/provider_specific_fields implementation.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "sdk: preserve Anthropic/LiteLLM thinking blocks across tool use; add tests" succinctly and accurately summarizes the primary change — end-to-end preservation of Anthropic/LiteLLM thinking_blocks and provider_specific_fields — and notes the addition of tests; it is specific, on-topic, and readable.
Linked Issues Check ✅ Passed This PR addresses Issue [OpenHands#172] by adding optional thinking_blocks and provider_specific_fields to Message and ActionEvent, updating LiteLLM↔Message mapping (from_litellm_message / to_llm_dict), propagating these fields in the multi-action combine path and in the agent step that builds ActionEvents, and adding tests that exercise mapping, multi-action combine, and the non-native function-calling flow, thereby satisfying the linked issue's coding objectives while preserving backward compatibility.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sonnet-thinking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @enyst, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the OpenHands Agent SDK by improving its ability to manage and persist complex LLM interactions, particularly by preserving detailed thinking processes from models like Anthropic. It introduces a robust conversation persistence mechanism, allowing users to save and resume agent sessions seamlessly. Furthermore, the changes standardize tool integration and enhance the visual clarity of agent activities, contributing to a more reliable and developer-friendly experience.

Highlights

  • Enhanced LLM Interaction: Improved handling and preservation of Anthropic/LiteLLM thinking blocks and provider-specific fields throughout the SDK, ensuring end-to-end data integrity for complex LLM responses.
  • Conversation Persistence: Introduced robust conversation state persistence using a new FileStore abstraction, allowing conversations to be saved and resumed seamlessly across sessions.
  • Tooling Refinements: Standardized tool initialization with a new .create() method across various tools and improved tool serialization/deserialization using Pydantic's discriminated unions for better type safety and extensibility.
  • Improved Visualization: Added Rich text visualization capabilities for various event types and tool actions/observations, enhancing clarity and readability of agent activities in conversation logs.
  • Documentation & Examples Overhaul: Restructured documentation and updated examples to reflect new SDK patterns and features, including new persistence and asynchronous usage examples.
  • Codebase Modernization: Significant refactoring across agent, LLM, and tool modules for better modularity, type safety, and maintainability, including new utilities for async operations and Pydantic diffing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link

github-actions bot commented Sep 12, 2025

Coverage

Coverage Report
FileStmtsMissCoverMissing
openhands
   __init__.py10100% 
openhands/sdk
   __init__.py15286%23–24
   logger.py732171%33, 57, 64–67, 69–71, 124, 130–132, 135–136, 142–144, 151, 156–157
openhands/sdk/agent
   __init__.py30100% 
   agent.py1663380%67, 74, 81, 85, 102, 119–121, 174–175, 177–179, 181–183, 218, 232, 254, 292, 326–328, 332–334, 341–342, 346, 350–351, 381, 388
   base.py49393%46, 67, 87
openhands/sdk/context
   __init__.py40100% 
   agent_context.py57296%146, 152
   manager.py330%1, 4–5
   view.py48687%47–51, 53
openhands/sdk/context/condenser
   __init__.py30100% 
   condenser.py17382%68–69, 73
   no_op_condenser.py60100% 
openhands/sdk/context/microagents
   __init__.py40100% 
   exceptions.py50100% 
   microagent.py1432582%130, 133–136, 218–221, 229, 251–252, 257–258, 260, 264, 271–273, 281–283, 337, 339–340
   types.py210100% 
openhands/sdk/context/prompts
   __init__.py20100% 
   prompt.py30583%12, 15, 24, 44–45
openhands/sdk/conversation
   __init__.py60100% 
   conversation.py1071189%112, 120–122, 126–127, 183, 246–247, 255–256
   event_store.py101892%50–51, 60, 67, 72–73, 129, 142
   persistence_const.py50100% 
   serialization_diff.py00100% 
   state.py92792%115, 125, 127, 148, 184–186
   types.py30100% 
   visualizer.py91594%88, 145, 167, 210, 212
openhands/sdk/event
   __init__.py50100% 
   base.py742072%51, 55, 75, 79–81, 83–88, 90–91, 93–95, 97–98, 100
   condenser.py25580%29, 33–35, 49
   llm_convertible.py1831790%41, 54, 62–63, 68–69, 260, 296–297, 302, 310, 351–352, 357, 390–391, 396
   types.py30100% 
   user_action.py12191%21
   utils.py120100% 
openhands/sdk/io
   __init__.py40100% 
   base.py14471%7, 11, 15, 19
   local.py491765%17, 23, 33–34, 50–62
   memory.py44490%16, 20, 54–55
openhands/sdk/llm
   __init__.py60100% 
   exceptions.py360100% 
   llm.py38611669%224, 229, 242–244, 248–249, 281, 340, 346–347, 442, 455–456, 461–462, 464–465, 468–470, 475–477, 481–483, 504–507, 512–517, 521–522, 531–533, 536–537, 561, 567–568, 614, 663, 669, 672, 681–682, 691, 698, 701, 705–707, 711, 713–718, 720–737, 740–744, 746–747, 753–762, 766–777, 790, 804, 809
   llm_registry.py380100% 
   message.py118595%117, 120, 143, 253–254
   metadata.py150100% 
openhands/sdk/llm/mixins
   fn_call_converter.py34310170%74, 343, 345, 349, 367, 369, 375, 381, 383, 422, 424, 426, 428, 433–434, 518–520, 522, 524, 545–547, 553, 575, 601–602, 610–613, 615, 617, 639, 648, 656, 701–704, 708–711, 723, 727, 738, 748, 797–798, 800, 819–821, 823–826, 829, 833, 844–845, 859, 867, 870–871, 876, 905–908, 912–913, 918–919, 924, 973–974, 980, 994, 1006, 1008–1009, 1012–1014, 1016–1017, 1023–1025, 1027–1028, 1030, 1032, 1036, 1038, 1043, 1045–1046, 1049
   non_native_fc.py39294%64, 75
openhands/sdk/llm/utils
   metrics.py111298%17, 117
   model_features.py400100% 
   retry_mixin.py501178%47, 50, 64, 86, 90, 94–95, 105, 110–111, 116
   telemetry.py1361588%71, 94, 99–100, 112–113, 120, 134, 199, 216, 222, 229, 232, 234, 241
openhands/sdk/mcp
   __init__.py50100% 
   client.py631280%41, 56–57, 59, 78, 82, 94–95, 105–106, 112–113
   definition.py27196%54
   tool.py401367%36–39, 43, 46, 49–52, 101–102, 107
   utils.py30486%23–24, 27, 30
openhands/sdk/tool
   __init__.py40100% 
   schema.py1341787%25–27, 29, 38, 190–192, 195–196, 198, 243–246, 266, 281
   security_prompt.py30100% 
   tool.py951089%58, 99, 169, 172–178
openhands/sdk/tool/builtins
   __init__.py40100% 
   finish.py25196%31
   think.py311358%22, 25–26, 29, 31–35, 37, 49, 55, 72
openhands/sdk/utils
   __init__.py30100% 
   async_utils.py120100% 
   discriminated_union.py63592%173, 212, 217, 224, 227
   json.py28280%1–3, 5, 7–8, 11, 14–21, 25, 28, 30–31, 34, 37–38, 40, 43, 45–48
   protocol.py30100% 
   pydantic_diff.py571573%36, 44, 50–58, 60–62, 65
   truncate.py100100% 
openhands/tools
   __init__.py9277%49–50
openhands/tools/execute_bash
   __init__.py40100% 
   constants.py90100% 
   definition.py914550%36, 39, 42–43, 45, 48–50, 52–54, 56, 104, 107–109, 112, 114–116, 118, 122–123, 126–128, 130–131, 134–137, 141–143, 148, 152–154, 157–159, 163–164, 166
   impl.py130100% 
   metadata.py50394%95–96, 100
openhands/tools/execute_bash/terminal
   __init__.py60100% 
   factory.py491177%24–25, 30, 32, 35, 37–38, 44–46, 97
   interface.py691578%43, 52, 62, 71, 76, 85, 94, 99, 145, 157, 162, 171, 180, 191, 193
   subprocess_terminal.py2365975%68, 98–99, 125, 131, 138, 145–146, 156–157, 163–164, 178, 180, 184–186, 192, 208, 217–221, 256–258, 263, 275, 289, 313, 315, 324, 345, 361, 366, 372–374, 382–383, 387–388, 390–396, 400–401, 404–405, 407–408, 410–412
   terminal_session.py178895%92, 96–98, 235, 281, 297, 317
   tmux_terminal.py782173%35, 41, 102, 113, 127, 139–146, 154–155, 157–158, 160, 162–164
openhands/tools/execute_bash/utils
   command.py81495%48, 64–66
openhands/tools/str_replace_editor
   __init__.py30100% 
   definition.py61985%86, 98, 118, 121, 124, 131, 133, 135, 137
   editor.py2281195%131, 264, 340, 350, 401–402, 641, 648–649, 663, 668
   exceptions.py220100% 
   impl.py26292%31–32
openhands/tools/str_replace_editor/utils
   __init__.py00100% 
   config.py20100% 
   constants.py50100% 
   diff.py64198%115
   encoding.py54198%81
   file_cache.py95990%44–46, 49–50, 54, 59, 151, 154
   history.py66198%79
   shell.py230100% 
openhands/tools/task_tracker
   __init__.py20100% 
   definition.py1319428%47, 50–52, 54–55, 58–59, 61, 77, 82, 84, 86–87, 90, 93–95, 97–98, 101–107, 109–111, 114, 116–119, 121, 124, 127–128, 130–131, 133–134, 136, 149–150, 153–154, 158, 160, 162–164, 170, 172–173, 178–179, 183, 192–193, 195–197, 201–202, 204–207, 209, 213–214, 216–218, 220–224, 228, 232–233, 235–236, 238, 240–244, 406, 409
openhands/tools/utils
   __init__.py00100% 
TOTAL509086982% 

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a substantial and high-quality pull request that introduces a robust persistence layer for conversations, refactors the tool creation API for consistency, and significantly improves the visualization of agent activities. The core feature of preserving provider-specific LLM data like thinking_blocks is well-implemented throughout the stack. The code is well-structured, and the addition of comprehensive tests for the new features is commendable. My review found one minor area for improvement regarding code clarity in the conversation state creation logic.

Comment on lines 122 to 131
if agent.model_dump(exclude_none=True) != resolved.model_dump(
exclude_none=True
):
from openhands.sdk.utils.pydantic_diff import pretty_pydantic_diff

raise ValueError(
"The agent provided is different from the one in persisted state. "
"Please use the same agent instance to resume the conversation.\n"
f"Diff: {pretty_pydantic_diff(agent, state.agent)}"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This check for differences between the runtime agent and the reconciled agent is redundant. The agent.resolve_diff_from_deserialized() method, called on the preceding line, already performs an equivalent check internally and will raise a ValueError with a more informative diff if the configurations are incompatible (after accounting for sensitive fields like API keys). Removing this block will make the code cleaner and avoid duplicating logic.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
tests/cross/test_hello_world.py (1)

245-259: Make content extraction robust to dict vs model instances.

Model dumps turn TextContent into dicts; prefer key access over hasattr to avoid brittle str(dict) parsing.

-        if user_content:
-            # Extract text from TextContent objects
-            for content in user_content:
-                if hasattr(content, "text"):
-                    user_text += content.text.lower()
-                else:
-                    user_text += str(content).lower()
+        if user_content:
+            for content in user_content:
+                if isinstance(content, dict) and "text" in content:
+                    user_text += str(content["text"]).lower()
+                elif hasattr(content, "text"):
+                    user_text += str(content.text).lower()
+                else:
+                    user_text += str(content).lower()
openhands/tools/task_tracker/definition.py (1)

142-149: Docstring says TASKS.md but code uses TASKS.json.

Fix to prevent user confusion.

-            save_dir: Optional directory to save tasks to. If provided, tasks will be
-                     persisted to save_dir/TASKS.md
+            save_dir: Optional directory to save tasks to. If provided, tasks will be
+                     persisted to save_dir/TASKS.json
openhands/tools/str_replace_editor/editor.py (2)

111-121: Guard against empty old_str in str_replace

Empty needle matches every position; fail fast with a clear message.

         elif command == "str_replace":
             if old_str is None:
                 raise EditorToolParameterMissingError(command, "old_str")
+            if old_str == "":
+                raise EditorToolParameterInvalidError(
+                    "old_str", old_str, "`old_str` must be non-empty."
+                )
             if new_str == old_str:
                 raise EditorToolParameterInvalidError(

271-287: Shell injection risk: quote path in find commands

The formatted shell uses raw path; quote it (and avoid regex confusion). Also quote in both invocations.

+            import shlex
@@
-            _, hidden_stdout, _ = run_shell_cmd(
-                rf"find -L {path} -mindepth 1 -maxdepth 1 -name '.*'"
-            )
+            path_q = shlex.quote(str(path))
+            _, hidden_stdout, _ = run_shell_cmd(
+                f"find -L {path_q} -mindepth 1 -maxdepth 1 -name '.*'"
+            )
@@
-            _, stdout, stderr = run_shell_cmd(
-                rf"find -L {path} -maxdepth 2 -not \( -path '{path}/\.*' -o "
-                rf"-path '{path}/*/\.*' \) | sort",
+            _, stdout, stderr = run_shell_cmd(
+                f"find -L {path_q} -maxdepth 2 -not \\( -path {path_q!s}/\\.* -o "
+                f"-path {path_q!s}/*/\\.* \\) | sort",
                 truncate_notice=DIRECTORY_CONTENT_TRUNCATED_NOTICE,
             )
README.md (1)

176-214: Doc bug: incorrect Observation return type and missing import in example

agent_observation must return a list; Field not imported. Fix snippet to prevent copy/paste errors.

-from openhands.sdk.tool import ActionBase, ObservationBase, ToolExecutor
+from openhands.sdk.tool import ActionBase, ObservationBase, ToolExecutor
+from pydantic import Field
@@
 class GrepObservation(ObservationBase):
     output: str = Field(default='')
 
     @property
     def agent_observation(self) -> list[TextContent | ImageContent]:
-        return TextContent(text=self.output)
+        return [TextContent(text=self.output)]
openhands/sdk/conversation/visualizer.py (1)

228-239: Double “$” when cost==0

cost_str includes “$0.00” but you add another “$ ” prefix; show “$0.00” once.

-        # Cost
-        cost_str = f"{cost:.4f}" if cost > 0 else "$0.00"
+        # Cost
+        cost_str = f"{cost:.4f}" if cost > 0 else "0.00"
@@
-        parts.append(f"[green]$ {cost_str}[/green]")
+        parts.append(f"[green]$ {cost_str}[/green]")
openhands/sdk/agent/agent.py (1)

378-395: Call tools via tool.call() to enforce I/O coercion.

Directly invoking executor bypasses output coercion and schema guarantees in Tool.call().

-        # Execute actions!
-        if tool.executor is None:
-            raise RuntimeError(f"Tool '{tool.name}' has no executor")
-        observation: ObservationBase = tool.executor(action_event.action)
-        assert isinstance(observation, ObservationBase), (
-            f"Tool '{tool.name}' executor must return an ObservationBase"
-        )
+        # Execute with schema validation/coercion
+        observation: ObservationBase = tool.call(action_event.action)

Comment on lines 20 to 30
uses: All-Hands-AI/openhands-github-action@v1
with:
prompt: .github/prompts/update-documentation.md
repository: ${{ github.repository }}
selected-branch: main
base-url: https://app.all-hands.dev
poll: "true"
timeout-seconds: 1800
poll-interval-seconds: 30
github-token: ${{ secrets.GITHUB_TOKEN }}
openhands-api-key: ${{ secrets.OPENHANDS_API_KEY }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify action input names for openhands-github-action.

Inputs used here (prompt, selected-branch, base-url, poll, timeout-seconds, poll-interval-seconds) may not match the action’s expected inputs (the summary mentions prompt_file, main_branch, base_url, polling, polling_interval_seconds, timeout_seconds). Mismatches will be silently ignored.


🌐 Web query:

What are the valid input names for All-Hands-AI/openhands-github-action@v1?

💡 Result:

Valid inputs: prompt, llm_api_key, llm_model, log_all_events, runtime_image, openhands_image, additional_env. [1][2]


Fix invalid inputs for All-Hands-AI/openhands-github-action@v1

The action only accepts: prompt, llm_api_key, llm_model, log_all_events, runtime_image, openhands_image, additional_env. In .github/workflows/oh-update-documentation.yml.back (lines 20–30) only prompt is valid; the other keys will be ignored.

  • prompt — OK.
  • Replace openhands-api-keyllm_api_key: ${{ secrets.OPENHANDS_API_KEY }}.
  • Remove or migrate these unsupported keys: repository, selected-branch, base-url, poll, timeout-seconds, poll-interval-seconds, github-token — pass needed values via additional_env or the action's supported inputs/mechanism.
🤖 Prompt for AI Agents
In .github/workflows/oh-update-documentation.yml.back around lines 20 to 30, the
action inputs are invalid: only prompt is a supported top-level input for
All-Hands-AI/openhands-github-action@v1, so keep prompt as-is; rename
openhands-api-key to llm_api_key and set its value to ${{
secrets.OPENHANDS_API_KEY }}; remove the unsupported keys repository,
selected-branch, base-url, poll, timeout-seconds, poll-interval-seconds, and
github-token (or if needed, move their values into the action’s additional_env
input or otherwise pass them via supported mechanisms) so that only supported
inputs remain.

Comment on lines 15 to 25
def __init__(self, root: str):
if root.startswith("~"):
root = os.path.expanduser(root)
self.root = root
os.makedirs(self.root, exist_ok=True)

def get_full_path(self, path: str) -> str:
if path.startswith("/"):
path = path[1:]
return os.path.join(self.root, path)

Copy link

@coderabbitai coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Sandbox escape: normalize and enforce root to prevent path traversal

get_full_path() allows "../" to escape root (e.g., root/../../etc/passwd). Canonicalize root and resolved paths and reject escapes.

Apply:

@@
-    def __init__(self, root: str):
-        if root.startswith("~"):
-            root = os.path.expanduser(root)
-        self.root = root
-        os.makedirs(self.root, exist_ok=True)
+    def __init__(self, root: str):
+        if root.startswith("~"):
+            root = os.path.expanduser(root)
+        root = os.path.abspath(os.path.normpath(root))
+        self.root = root
+        os.makedirs(self.root, exist_ok=True)
@@
-    def get_full_path(self, path: str) -> str:
-        if path.startswith("/"):
-            path = path[1:]
-        return os.path.join(self.root, path)
+    def get_full_path(self, path: str) -> str:
+        # strip leading slash to keep relative under root
+        if path.startswith("/"):
+            path = path[1:]
+        full = os.path.abspath(os.path.normpath(os.path.join(self.root, path)))
+        # ensure sandboxing
+        if os.path.commonpath([self.root, full]) != self.root:
+            raise ValueError(f"path escapes filestore root: {path}")
+        return full
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, root: str):
if root.startswith("~"):
root = os.path.expanduser(root)
self.root = root
os.makedirs(self.root, exist_ok=True)
def get_full_path(self, path: str) -> str:
if path.startswith("/"):
path = path[1:]
return os.path.join(self.root, path)
def __init__(self, root: str):
if root.startswith("~"):
root = os.path.expanduser(root)
root = os.path.abspath(os.path.normpath(root))
self.root = root
os.makedirs(self.root, exist_ok=True)
def get_full_path(self, path: str) -> str:
# strip leading slash to keep relative under root
if path.startswith("/"):
path = path[1:]
full = os.path.abspath(os.path.normpath(os.path.join(self.root, path)))
# ensure sandboxing
if os.path.commonpath([self.root, full]) != self.root:
raise ValueError(f"path escapes filestore root: {path}")
return full
🤖 Prompt for AI Agents
In openhands/sdk/io/local.py around lines 15 to 25, get_full_path currently
permits path traversal via "../" which can escape the configured root; normalize
and enforce the root to prevent this by expanding and canonicalizing the root in
__init__ (expanduser + os.path.abspath/os.path.realpath) and store that
canonical root (create the directory if needed), then in get_full_path join the
incoming path to the stored root, canonicalize the resulting path with
os.path.abspath/os.path.realpath, and validate it stays inside the canonical
root (e.g., resolved_path == root or resolved_path.startswith(root + os.sep));
if it escapes, raise a ValueError (or custom exception) rejecting the path.
Ensure leading slashes are stripped before join and keep behavior consistent for
absolute/relative inputs.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist Read the code and where the local filestore is used. Understand this issue. WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a critical security issue. The suggested changes from coderabbitai[bot] effectively address the path traversal vulnerability by properly normalizing and validating file paths against the defined root. Using os.path.abspath, os.path.normpath, and os.path.commonpath is a robust approach to prevent sandbox escapes.

I strongly recommend adopting the suggested code to enhance the security of the local file store.

Suggested change
def __init__(self, root: str):
if root.startswith("~"):
root = os.path.expanduser(root)
self.root = root
os.makedirs(self.root, exist_ok=True)
def get_full_path(self, path: str) -> str:
if path.startswith("/"):
path = path[1:]
return os.path.join(self.root, path)
def __init__(self, root: str):
if root.startswith("~"):
root = os.path.expanduser(root)
root = os.path.abspath(os.path.normpath(root))
self.root = root
os.makedirs(self.root, exist_ok=True)
def get_full_path(self, path: str) -> str:
# strip leading slash to keep relative under root
if path.startswith("/"):
path = path[1:]
full = os.path.abspath(os.path.normpath(os.path.join(self.root, path)))
# ensure sandboxing
if os.path.commonpath([self.root, full]) != self.root:
raise ValueError(f"path escapes filestore root: {path}")
return full

Comment on lines 221 to 223
with open(tasks_file, "r", encoding="utf-8") as f:
self._task_list = json.load(f)
self._task_list = [TaskItem.model_validate(d) for d in json.load(f)]
except (OSError, json.JSONDecodeError) as e:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

JSON load path: good switch to model_validate. Add TypeError catch.

Defensive against non-list JSON or wrong shapes.

-        except (OSError, json.JSONDecodeError) as e:
+        except (OSError, json.JSONDecodeError, TypeError) as e:
             logger.warning(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with open(tasks_file, "r", encoding="utf-8") as f:
self._task_list = json.load(f)
self._task_list = [TaskItem.model_validate(d) for d in json.load(f)]
except (OSError, json.JSONDecodeError) as e:
with open(tasks_file, "r", encoding="utf-8") as f:
self._task_list = [TaskItem.model_validate(d) for d in json.load(f)]
except (OSError, json.JSONDecodeError, TypeError) as e:
🤖 Prompt for AI Agents
In openhands/tools/task_tracker/definition.py around lines 221 to 223, the JSON
loading block assumes json.load returns an iterable of dicts and only catches
OSError and JSONDecodeError; update the except clause to also catch TypeError
and handle cases where the loaded JSON is not a list by validating the loaded
value before iterating (e.g., check isinstance(data, list) and raise or log a
clear error), and then keep using TaskItem.model_validate for each element;
ensure the new exception handling logs or raises with a descriptive message
including the problematic data type.

enyst and others added 2 commits September 13, 2025 02:54
…ic extended thinking blocks; preserve on ActionEvent batches and non-native FC conversions

- Message: support provider_specific_fields field; include in to_llm_dict and from_litellm_message
- ActionEvent: carry provider_specific_fields; to_llm_message exports it
- events_to_messages: preserve provider_specific_fields from first ActionEvent in batch
- Agent.step: copy provider_specific_fields from LLM message to first ActionEvent only (avoid double-processing)
- NonNativeToolCallingMixin: preserve reasoning_content and provider_specific_fields after conversion

Co-authored-by: openhands <openhands@all-hands.dev>

- Message: support provider_specific_fields field; include in to_llm_dict and from_litellm_message
- ActionEvent: carry provider_specific_fields; to_llm_message exports it
- events_to_messages: preserve provider_specific_fields from first ActionEvent in batch
- Agent.step: copy provider_specific_fields from LLM message to first ActionEvent only (avoid double-processing)
- NonNativeToolCallingMixin: preserve reasoning_content and provider_specific_fields after conversion

Co-authored-by: openhands <openhands@all-hands.dev>

- Message: add thinking_blocks, provider_specific_fields; round-trip in to_llm_dict/from_litellm_message
- Agent: attach provider_specific_fields and thinking_blocks to first ActionEvent in batch
- Events: ActionEvent carries thinking_blocks, combine path preserves them
- NonNative FC: restore reasoning_content, thinking_blocks, provider_specific_fields to final assistant message
- Tests: add thinking_blocks tests; use LiteLLM typed dict keys; pyright-safe

Co-authored-by: openhands <openhands@all-hands.dev>
…ecific_fields end-to-end; tests for mapping, combine, non-native FC; lint+types clean

- Message: add thinking_blocks, provider_specific_fields; round-trip in to_llm_dict/from_litellm_message
- Agent: attach provider_specific_fields and thinking_blocks to first ActionEvent in batch
- Events: ActionEvent carries thinking_blocks, combine path preserves them
- NonNative FC: restore reasoning_content, thinking_blocks, provider_specific_fields to final assistant message
- Tests: add thinking_blocks tests; use LiteLLM typed dict keys; pyright-safe

Co-authored-by: openhands <openhands@all-hands.dev>

- Message: support provider_specific_fields field; include in to_llm_dict and from_litellm_message
- ActionEvent: carry provider_specific_fields; to_llm_message exports it
- events_to_messages: preserve provider_specific_fields from first ActionEvent in batch
- Agent.step: copy provider_specific_fields from LLM message to first ActionEvent only (avoid double-processing)
- NonNativeToolCallingMixin: preserve reasoning_content and provider_specific_fields after conversion

Co-authored-by: openhands <openhands@all-hands.dev>

- Message: add thinking_blocks, provider_specific_fields; round-trip in to_llm_dict/from_litellm_message
- Agent: attach provider_specific_fields and thinking_blocks to first ActionEvent in batch
- Events: ActionEvent carries thinking_blocks, combine path preserves them
- NonNative FC: restore reasoning_content, thinking_blocks, provider_specific_fields to final assistant message
- Tests: add thinking_blocks tests; use LiteLLM typed dict keys; pyright-safe

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
openhands/sdk/event/llm_convertible.py (1)

37-54: Do not mutate tool specs during visualization.

Reiterating a prior review: the truncation loop mutates self.tools in-place. Please switch to a display-only copy to avoid corrupting event data.

🧹 Nitpick comments (4)
openhands/sdk/llm/mixins/non_native_fc.py (1)

87-95: Don’t drop empty containers; check for None explicitly.

if not val: skips valid empty list/dict/empty-string values, contrary to “omit only when None.” Use an identity check.

-        for name in (
+        for name in (
             "reasoning_content",
             "thinking_blocks",
             "provider_specific_fields",
         ):
             val = getattr(orig_msg, name, None)
-            if not val:
+            if val is None:
                 continue
             last[name] = val
openhands/sdk/event/base.py (1)

147-164: Assert multi-action invariants for reasoning fields.

You already assert empty thought on subsequent events; do the same for reasoning_content/thinking_blocks/provider_specific_fields to prevent accidental duplication.

     for e in events[1:]:
         assert len(e.thought) == 0, (
             "Expected empty thought for multi-action events after the first one"
         )
+        assert (
+            e.reasoning_content is None
+            and e.thinking_blocks is None
+            and e.provider_specific_fields is None
+        ), "Only the first ActionEvent should carry reasoning/provider-specific fields"
openhands/sdk/event/llm_convertible.py (2)

6-9: Guard imports for wider litellm compatibility.

These types may not exist on older litellm versions, causing ImportError at runtime. Either pin a minimum litellm version or add a safe fallback for typing.

-from litellm.types.llms.openai import (
-    ChatCompletionRedactedThinkingBlock,
-    ChatCompletionThinkingBlock,
-)
+from typing import TYPE_CHECKING, Any as _Any
+try:
+    from litellm.types.llms.openai import (
+        ChatCompletionRedactedThinkingBlock,
+        ChatCompletionThinkingBlock,
+    )
+except Exception:  # pragma: no cover
+    # Fallback keeps runtime working; static type-checkers still see real types if available.
+    ChatCompletionThinkingBlock = _Any  # type: ignore
+    ChatCompletionRedactedThinkingBlock = _Any  # type: ignore

83-92: Clarify field semantics; avoid overlap with thinking_blocks.

The description for provider_specific_fields mentions “Anthropic thinking blocks,” which are already modeled by thinking_blocks. Reword to “pass-through provider metadata not modeled elsewhere.”

-    provider_specific_fields: dict[str, Any] | None = Field(
-        default=None,
-        description="Provider-specific fields (e.g., Anthropic thinking blocks)",
-    )
+    provider_specific_fields: dict[str, Any] | None = Field(
+        default=None,
+        description="Provider-specific metadata not otherwise modeled (pass-through).",
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9193c39 and d8eb774.

📒 Files selected for processing (7)
  • openhands/sdk/agent/agent.py (1 hunks)
  • openhands/sdk/event/base.py (1 hunks)
  • openhands/sdk/event/llm_convertible.py (3 hunks)
  • openhands/sdk/llm/message.py (4 hunks)
  • openhands/sdk/llm/mixins/non_native_fc.py (1 hunks)
  • tests/sdk/llm/test_provider_specific_fields.py (1 hunks)
  • tests/sdk/llm/test_thinking_blocks.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/sdk/llm/test_provider_specific_fields.py
  • openhands/sdk/agent/agent.py
  • tests/sdk/llm/test_thinking_blocks.py
  • openhands/sdk/llm/message.py
🧰 Additional context used
🧬 Code graph analysis (2)
openhands/sdk/event/base.py (1)
openhands/sdk/event/llm_convertible.py (1)
  • reasoning_content (247-248)
openhands/sdk/event/llm_convertible.py (1)
openhands/sdk/utils/json.py (1)
  • default (14-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: sdk-tests
🔇 Additional comments (3)
openhands/sdk/llm/mixins/non_native_fc.py (1)

97-98: Confirm LiteLLMMessage preserves extra fields (unable to verify locally)

Local probe failed: repo search errored and importing litellm raised "No module named 'litellm'". Verify that LiteLLMMessage.model_validate()/model_dump() preserves unknown fields (thinking_blocks, provider_specific_fields) for the litellm version pinned in this repo. If it does not, preserve those fields in a side-channel or convert them into your own Message (keeping extras) before assigning resp.choices[0].message.

Location: openhands/sdk/llm/mixins/non_native_fc.py lines 97–98

openhands/sdk/event/base.py (1)

161-164: LGTM on preserving reasoning/provider fields in combined messages.

openhands/sdk/event/llm_convertible.py (1)

154-156: Forwarding new fields looks correct.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
openhands/tools/str_replace_editor/utils/diff.py (1)

15-22: Docstring still omits params and None-semantics—document inputs.

Prior feedback requested this; the summary/Args should mention old_content, new_content, and that either being None returns [].

 def get_edit_groups(
     old_content: str | None, new_content: str | None, n_context_lines: int = 2
 ) -> list[EditGroup]:
-    """Get the edit groups showing changes between old and new content.
+    """Compute structured edit groups between two strings.
 
-    Args:
-        n_context_lines: Number of context lines to show around each change.
+    Args:
+        old_content: Original text. If None, returns an empty list.
+        new_content: Updated text. If None, returns an empty list.
+        n_context_lines: Number of context lines to show around each change.
 
     Returns:
         A list of edit groups, where each group contains before/after edits.
     """
🧹 Nitpick comments (5)
README.md (5)

81-81: Import includes TaskTrackerTool but Hello World tools list doesn't use it

Either include TaskTrackerTool in the tools list or drop it from imports to avoid confusion. Prefer including it to mirror later examples.

Apply one of the following:

Option A — use TaskTrackerTool (recommended):

-from openhands.tools import BashTool, FileEditorTool, TaskTrackerTool
+from openhands.tools import BashTool, FileEditorTool, TaskTrackerTool

and below (see lines 93-97):

 tools = [
-    BashTool.create(working_dir=cwd),
-    FileEditorTool.create(),
+    BashTool.create(working_dir=cwd),
+    FileEditorTool.create(),
+    TaskTrackerTool.create(save_dir=cwd),
 ]

Option B — remove unused import:

-from openhands.tools import BashTool, FileEditorTool, TaskTrackerTool
+from openhands.tools import BashTool, FileEditorTool

93-97: Mirror imports by adding TaskTrackerTool in Hello World (consistency with simplified example)

Add TaskTrackerTool.create(save_dir=cwd) here for consistency; otherwise drop the import above.

(See diff in prior comment.)


158-165: Tighten example import: drop unused ImageContent

ImageContent isn’t used in this snippet. Removing it reduces noise.

-from openhands.sdk import TextContent, ImageContent
+from openhands.sdk import TextContent

84-90: Parameterize model/base_url; improve key check; pin Anthropic Sonnet

File: README.md lines 84-90

  • Read LITELLM_API_KEY, LLM_MODEL, and LLM_BASE_URL from env; replace the assert with an explicit check and raise ValueError("Missing LITELLM_API_KEY. Set it to your LiteLLM proxy API key.") when absent.
  • Default the model to "anthropic/claude-3-7-sonnet-20250219" (recommended). Anthropic publishes a "-latest" alias ("claude-3-7-sonnet-latest") but pin a versioned id for production.

26-28: Examples present — add Quickstart one-liner for async example

examples/10_persistence.py and examples/11_async.py are present; examples/11_async.py defines async def main() with an entrypoint (if name == "main": asyncio.run(main())), so it can be run directly. Add a one-liner under README.md → "Installation & Quickstart", e.g. uv run python examples/11_async.py.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8eb774 and 609441d.

📒 Files selected for processing (3)
  • README.md (4 hunks)
  • openhands/tools/str_replace_editor/utils/diff.py (1 hunks)
  • tests/tools/str_replace_editor/test_visualize_diff.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/tools/str_replace_editor/test_visualize_diff.py (1)
openhands/tools/str_replace_editor/utils/diff.py (1)
  • get_edit_groups (12-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: sdk-tests
  • GitHub Check: tools-tests
🔇 Additional comments (7)
openhands/tools/str_replace_editor/utils/diff.py (1)

13-14: Good: Optional[str] matches runtime None-handling.

Signature now reflects actual behavior and removes the need for type-ignores.

tests/tools/str_replace_editor/test_visualize_diff.py (1)

259-266: Nice: direct None calls now type-safe and asserted.

Removing the type-ignores and asserting empty lists exercises the new signature well.

README.md (5)

34-34: Repo tree update looks good

io/ addition is reflected accurately in the structure.


36-36: MCP directory listing matches intent

mcp/ is correctly called out in the SDK layout.


44-44: Task tracker tool addition acknowledged

Listing for tools/task_tracker/ aligns with examples below.


49-49: Tests/fixtures move acknowledged

fixtures/ replaces integration tests as described; test commands later reflect this.


277-277: Docs pointer LGTM

Pointing readers to examples/ is clear and aligns with the repo shift away from architecture docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add reasoning with Anthropic models

2 participants