
Conversation

@enyst (Owner) commented Nov 7, 2025

Experimental branch to observe what the agent does when run programmatically with a goal across successive conversations.

enyst and others added 30 commits November 6, 2025 02:35
- examples/llm_profiles_tui: REPL with /model, /profile, /list, /show, /save
- tests/examples/test_llm_profiles_tui: no-network tests for commands and run_loop
- Fix minor lint/type issues; make env var parsing more robust

Co-authored-by: openhands <openhands@all-hands.dev>
- Add --inline flag and default to profile references so /profile works
- Keep conversation persistence rules intact
- Document TUI quickstart and env hints in docs/llm_profiles.md

Co-authored-by: openhands <openhands@all-hands.dev>
- Implement /delete in examples/llm_profiles_tui/cli.py
- Extend unit tests to cover deletion path and tweak dummy registry
- Loosen AppContext typing to support dummy fakes in tests

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- Pass llm_registry via pydantic context in ConversationState.create
- Remove manual traversal from resume path; keep inline-mode guard
- Verified by conversation serialization tests

Co-authored-by: openhands <openhands@all-hands.dev>
- Avoid duplicate-usage logs by not preloading all profiles into the usage map
- Switching continues to work via LLMRegistry.switch_profile (on-demand load)

Co-authored-by: openhands <openhands@all-hands.dev>
On macOS, /var is a symlink to /private/var. Using Path.resolve() returned
canonical paths under /private, causing test expectations to fail.
Use Path.absolute() to preserve the non-resolved path while walking up to
.git, which also matches typical user-facing paths.
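A portable sketch of the difference, using a throwaway symlink to simulate macOS's /var -> /private/var layout:

```python
import os
import tempfile
from pathlib import Path

# Simulate the macOS situation (/var is a symlink to /private/var)
# with a throwaway symlink under a temp directory.
base = Path(tempfile.mkdtemp())
(base / "private_var").mkdir()
os.symlink(base / "private_var", base / "var")

p = base / "var" / "folders"

# resolve() follows the symlink, returning the canonical path:
print(p.resolve())   # ends with .../private_var/folders

# absolute() leaves symlinks alone, preserving the user-facing path:
print(p.absolute())  # ends with .../var/folders
```

This is why walking up to .git with `Path.absolute()` matches the paths users (and tests) actually typed, while `Path.resolve()` does not.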

Co-authored-by: openhands <openhands@all-hands.dev>
This file is a local artifact and should not be versioned.

Co-authored-by: openhands <openhands@all-hands.dev>
…s + Switch work and macOS path fix

Co-authored-by: openhands <openhands@all-hands.dev>
- Add examples/llm_profiles_tui/cli.py to Examples section
- Add examples/01_standalone_sdk/26_runtime_llm_switch.py reference

Co-authored-by: openhands <openhands@all-hands.dev>
- Update investigation doc to note no eager register_profiles in LocalConversation
- Clarify usage_id handling in profiles; switch_profile assigns runtime usage

Co-authored-by: openhands <openhands@all-hands.dev>
…nce is handled by LLM model serializer/validator

Keep ConversationState/Agent Pydantic-driven; no bespoke traversal.
Remote APIs remain fully inlined by default.

Co-authored-by: openhands <openhands@all-hands.dev>
…r approach for profile refs

Replace references to compact_llm_profiles/resolve_llm_profiles with the implemented LLM model serializer/validator + context flow.

Co-authored-by: openhands <openhands@all-hands.dev>
- Continue accumulating metrics under the same usage_id
- Reflect current implementation via ConversationStats.register_llm restoring metrics

Co-authored-by: openhands <openhands@all-hands.dev>
- In examples/llm_profiles_tui/cli.py, use OPENHANDS_LLM_PROFILE or LLM_PROFILE_NAME when --profile is not provided
- Add unit test to verify main() passes the env-derived profile to build_conversation without entering REPL

Co-authored-by: openhands <openhands@all-hands.dev>
- Mention OPENHANDS_LLM_PROFILE and LLM_PROFILE_NAME can be used when --profile is not provided

Co-authored-by: openhands <openhands@all-hands.dev>
- Investigated docs vs implementation and verified registry usage_id handling
- TUI/CLI for LLM Profiles present under examples/llm_profiles_tui/ with /model and /profile
- Added and verified unit tests for registry, conversation switching, and TUI
- Ran tests under uv; core LLM profile switch tests are green
- Filed follow-up: ExecuteBash test_complex_commands no-change-timeout flake (agent-sdk-37)

Co-authored-by: openhands <openhands@all-hands.dev>
- Load profile JSON lacking usage_id → LLM defaults to usage_id='default'
- switch_profile assigns runtime usage_id for active slot and continues
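In Pydantic v2 terms, the defaulting and switch-time reassignment look roughly like this (`LLMConfig` is a simplified stand-in for the SDK's LLM model, not its actual definition):

```python
from pydantic import BaseModel

class LLMConfig(BaseModel):  # simplified stand-in for the SDK's LLM model
    model: str
    usage_id: str = "default"  # omitted in profile JSON -> schema default

# Profile JSON that lacks usage_id falls back to the schema default:
loaded = LLMConfig.model_validate({"model": "gpt-4o-mini"})
print(loaded.usage_id)  # "default"

# At switch time the registry assigns the runtime slot name:
switched = loaded.model_copy(update={"usage_id": "agent"})
print(switched.usage_id)  # "agent"
```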

Add pyright suppression for this test file to avoid local import warnings during pre-commit.

Co-authored-by: openhands <openhands@all-hands.dev>
Rely on LLM.model_validate to expand profile references via registry context.
This aligns implementation with docs and reduces dead code in ConversationState.
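A minimal sketch of expanding profile references through validation context, assuming Pydantic v2 (`PROFILES` and `_expand_profile_ref` are illustrative names, not the SDK's):

```python
from typing import Any
from pydantic import BaseModel, model_validator

PROFILES = {"fast": {"model": "gpt-4o-mini", "temperature": 0.0}}  # stand-in registry

class LLM(BaseModel):
    model: str
    temperature: float = 1.0

    @model_validator(mode="before")
    @classmethod
    def _expand_profile_ref(cls, data: Any, info) -> Any:
        # A {"profile": "<name>"} reference is expanded from the registry
        # passed in via validation context; inlined dicts pass through.
        if isinstance(data, dict) and "profile" in data:
            registry = (info.context or {}).get("llm_registry", {})
            return registry[data["profile"]]
        return data

llm = LLM.model_validate({"profile": "fast"}, context={"llm_registry": PROFILES})
print(llm.model)  # gpt-4o-mini
```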

Co-authored-by: openhands <openhands@all-hands.dev>
…tialized

Guard access to tools_map in close() so conversation cleanup in atexit does not raise if a custom agent skipped initialization or during late interpreter shutdown.

Add unit test to ensure close handles uninitialized agent gracefully.
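The guard could look roughly like this (a simplified stand-in, not the SDK's actual LocalConversation):

```python
class LocalConversation:
    def __init__(self):
        self._agent = None  # a custom agent may never initialize this

    def close(self):
        # Guard attribute access so atexit cleanup never raises if the
        # agent skipped initialization or the interpreter is shutting down.
        agent = getattr(self, "_agent", None)
        tools_map = getattr(agent, "tools_map", None) if agent else None
        if not tools_map:
            return
        for tool in tools_map.values():
            tool.close()
```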

Co-authored-by: openhands <openhands@all-hands.dev>
- Mention that usage_id defaults to 'default' if omitted
- Note that LLMRegistry.switch_profile assigns the runtime slot usage_id at switch time

Co-authored-by: openhands <openhands@all-hands.dev>
…g in interactive shells

- Add `set +H` during terminal init for both SubprocessTerminal and TmuxTerminal
- Fixes intermittent failures in test_bash_session::test_complex_commands (exit_code -1)
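The failure mode comes from bash history expansion, which stays active inside double quotes in interactive shells:

```shell
# With history expansion enabled (the default in interactive bash), an
# unescaped '!' inside double quotes can abort the whole command:
#   $ echo "done!"
#   bash: !": event not found
# Disabling it during terminal init avoids these intermittent failures:
set +H
echo "done!"
```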

Co-authored-by: openhands <openhands@all-hands.dev>
- Lazy import FastMCP to avoid bringing in browser_use.mcp at server startup
- Move MCP config validation import into validator only

This resolves the circular import that caused the wsproto tests to fail.
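The lazy-import pattern in question, sketched with the stdlib `json` module standing in for FastMCP:

```python
# Lazy-import pattern (sketch): the heavy module is imported inside the
# function that needs it, so importing this module stays cheap and any
# import cycle through the heavy dependency is broken.

def build_server_config(data: dict) -> str:
    import json  # deferred until first call, not at module import time
    return json.dumps(data, sort_keys=True)
```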

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
enyst and others added 15 commits November 6, 2025 23:49
Ensure demo starts with usage_id='agent' instead of schema default 'default' so runtime profile switching uses the expected slot name. No behavior change to tests.

Co-authored-by: openhands <openhands@all-hands.dev>
Clarify in LLM Profiles guide that the example TUI starts with usage_id='agent' when not explicitly set via env, so /profile switches the expected slot.

Co-authored-by: openhands <openhands@all-hands.dev>
- Exported .beads/issues.jsonl
- Closed agent-sdk-57 and agent-sdk-73 as non-reproducible after full green run

Co-authored-by: openhands <openhands@all-hands.dev>
- Aligns TUI behavior with docs so the main agent slot uses 'agent' even if the profile omits usage_id or uses 'default'.
- Adds a unit test asserting the defaulting behavior for initial profiles.

Co-authored-by: openhands <openhands@all-hands.dev>
- Recorded validation and docs alignment tasks (85–87)

Co-authored-by: openhands <openhands@all-hands.dev>
Add run_loop test that simulates inline-mode rejection for /profile
so users see a clear error when OPENHANDS_INLINE_CONVERSATIONS=true.

Co-authored-by: openhands <openhands@all-hands.dev>
…ted files

These files are local test run artifacts and should not be versioned.

Co-authored-by: openhands <openhands@all-hands.dev>
Add tests_exit_code*.txt to .gitignore so CI artifacts don't pollute the repo.

Co-authored-by: openhands <openhands@all-hands.dev>
…switch in run_loop

- Ensure /model validates presence of model field
- Verify run_loop prints a clear error when /profile targets missing profile

Co-authored-by: openhands <openhands@all-hands.dev>
- Move imports to top of file per E402

Co-authored-by: openhands <openhands@all-hands.dev>
- Created and closed issues for TUI demo (/model, /profile), registry usage_id decoupling, docs alignment, and full test run
- Exported .beads/issues.jsonl for git tracking

Co-authored-by: openhands <openhands@all-hands.dev>
…eric design (--llm)

Also mention LLM_PROFILE_NAME env fallback for the demo.

Co-authored-by: openhands <openhands@all-hands.dev>
…tests

- Closed LLM Profiles TUI e2e demo task
- Closed LLMRegistry usage_id loading/switch semantics task
- Closed docs review/update task

Co-authored-by: openhands <openhands@all-hands.dev>
@coderabbitai bot commented Nov 7, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@gemini-code-assist

Summary of Changes

Hello @enyst, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a significant enhancement by introducing dynamic LLM profile management and runtime switching capabilities. The primary goal is to allow users to easily define and switch between different LLM configurations during an ongoing conversation, facilitating experimentation and adaptability. This is showcased through a new interactive TUI example, alongside underlying architectural changes for robust profile persistence, dependency optimization, and improved terminal interaction.

Highlights

  • LLM Profile Management TUI: Introduced a new interactive Text User Interface (TUI) located at examples/llm_profiles_tui/cli.py for creating, managing, and switching LLM profiles within a conversation. This TUI supports commands like /model, /profile, /list, /show, /save, /edit, and /delete.
  • Runtime LLM Profile Switching: Implemented the core functionality for switching LLM profiles at runtime within a conversation. This includes updates to LLMRegistry to handle profile loading and assignment of usage_id during a switch, and modifications to ConversationState to orchestrate the swap, ensuring persistence and proper handling of secrets.
  • Lazy Loading of Dependencies: Optimized dependency loading for MCP (Model Context Protocol) related modules (FastMCP) by implementing lazy imports. This avoids loading heavy dependencies unless MCP functionality is explicitly configured or used, improving startup performance.
  • Terminal Robustness Improvements: Addressed issues with terminal prompt detection in subprocess_terminal.py and tmux_terminal.py by adding set +H to disable history expansion, which was causing intermittent test failures with complex commands.
  • Conversation Cleanup Robustness: Enhanced the LocalConversation.close method to be more robust, preventing RuntimeError when dealing with partially initialized or uninitialized agents during cleanup, especially in late interpreter shutdown scenarios.
  • Documentation and Testing: Updated documentation (docs/llm_profiles.md, docs/llm_runtime_switch_investigation.md, docs/llm_runtime_switch_plan.md) to reflect the new LLM profile and switching features. Comprehensive unit tests were added for the TUI, LLM registry, and conversation state to ensure correct behavior and persistence.

@gemini-code-assist bot left a comment

Code Review

This is a substantial experimental pull request that introduces runtime LLM profile switching, a significant new capability. The changes are comprehensive, including a new TUI example, extensive documentation updates, and core SDK modifications to support the feature. The refactoring to enable lazy loading of MCP dependencies and the improvements to the robustness of the conversation cleanup logic are excellent. My review focuses on a few areas for improvement, primarily concerning project structure and exception handling.


openhands-ai bot commented Nov 9, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Check Documented Examples

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #17 at branch `agent-sdk-26-llm-switch-ralph`

Feel free to include any additional details that might help me get this PR into a better state.


@enyst (Owner, Author) commented Nov 10, 2025

@OpenHands Do a /codereview-roasted on this PR diff.

Always respond to the user in markdown, and use headings and bold for readability.


openhands-ai bot commented Nov 10, 2025

I'm on it! enyst can track my progress at all-hands.dev


openhands-ai bot commented Nov 10, 2025

Summary of /codereview-roasted on PR #17

Scope reviewed

  • Compared branch against main: adds LLM profile persistence, runtime profile switching, a TUI demo, and comprehensive tests.
  • Key files: LLM/LLMRegistry, ConversationState/LocalConversation, examples TUI, tests, and persistence settings.

Overall Verdict

🔴 Needs rework — Strong feature direction, but a few critical architectural and maintainability issues must be fixed before merge.

What’s Good

  • Clean profile persistence using JSON under ~/.openhands/llm-profiles with safe IDs and secret redaction.
  • Serializer/validator pattern with Pydantic context is well-designed and extensible.
  • Useful, realistic TUI for creating/switching profiles; tests avoid network calls and cover core mechanics.
  • Sensible carry-over of secrets on runtime switch.
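The persistence scheme praised in the first bullet could look roughly like this (hypothetical helper names and secret-key set; the actual SDK code differs):

```python
import json
import re
from pathlib import Path

PROFILE_DIR = Path.home() / ".openhands" / "llm-profiles"  # per the review notes
SECRET_KEYS = {"api_key"}  # assumption: which fields count as secrets

def safe_id(name: str) -> str:
    # Restrict profile IDs to a filesystem-safe alphabet.
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)

def save_profile(name: str, config: dict, root: Path = PROFILE_DIR) -> Path:
    root.mkdir(parents=True, exist_ok=True)
    # Redact secret fields before anything touches disk.
    redacted = {k: ("**REDACTED**" if k in SECRET_KEYS else v) for k, v in config.items()}
    path = root / f"{safe_id(name)}.json"
    path.write_text(json.dumps(redacted, indent=2))
    return path
```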

Critical Issues (must fix)

  1. Top-level packaging hack

    • File: openhands/__init__.py
    • Problem: sys.path and __path__ manipulation to stitch subpackages. Violates repo rules (“Avoid hacky trick like sys.path.insert”) and breaks namespace/package semantics.
    • Fix: Remove this file entirely. Rely on uv workspace + PEP 420 namespace packages.
  2. Deprecated API in examples

    • File: examples/01_standalone_sdk/26_runtime_llm_switch.py
    • Problem: Uses service_id despite it being deprecated in code.
    • Fix: Replace with usage_id everywhere in examples/docs.
  3. Fragile test teardown using private attribute

    • File: tests/agent_server/test_agent_server_wsproto.py
    • Problem: Uses process._parent_pid (private). Risky across Python/platforms.
    • Fix: Adopt a robust teardown pattern (e.g., function-scoped process, multiprocessing.parent_process(), or safer liveness checks).
  4. Policy enforcement duplication

    • Files: conversation/state.py and llm/llm.py
    • Problem: Inline/profile-reference mode is enforced in both places (base-state scan and LLM validator).
    • Fix: Make the LLM validator the single source of truth; remove state-level scanning to prevent drift and mixed errors.
  5. Type checking workaround

    • File: agent/base.py
    • Problem: Inline import guarded with # type: ignore.
    • Fix: Use if TYPE_CHECKING for typing and a runtime import without type: ignore, or a minimal protocol for typing.
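For item 5, the standard `if TYPE_CHECKING` pattern looks like this (sketched with `decimal.Decimal` standing in for the SDK type that caused the cycle):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so it
    # cannot introduce an import cycle and needs no "# type: ignore".
    from decimal import Decimal  # stands in for the cyclic SDK type

class AgentBase:
    def spend(self, amount: str) -> "Decimal":
        # Runtime import deferred to call time; the quoted return
        # annotation keeps the checker happy without a module-level import.
        from decimal import Decimal
        return Decimal(amount)
```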

Improvement Opportunities (should fix)

  • Clarify/document what carries over on switch (currently secrets only). Consider whether non-secret runtime params (e.g., headers) should or should not carry; make it explicit.
  • Add logging or an explicit event around LLM switch and previous status for easier debugging.
  • In TUI, clearly document the global env side-effect (OPENHANDS_INLINE_CONVERSATIONS)—you already hint at this in help; good to keep explicit.
  • Extract profile-ref expansion logic from _coerce_inputs into a small helper for readability.
  • Add a short comment in persistence/settings.py pointing to where serializer/validator rely on INLINE_CONTEXT_KEY.

Minor Style Notes

  • In examples/25_llm_profiles.py, either set OPENHANDS_INLINE_CONVERSATIONS=false explicitly or explain that profile refs won’t appear in base_state.json when inlining is on.
  • Keep # noqa: BLE001 localized to tests.
  • MCP lazy exports in sdk/__init__.py are fine; add a one-liner noting the lazy pattern for future maintainers.

Final Recommendation

  • Address the five critical items above, then merge. The core architecture for profiles and switching is sound—don’t undercut it with packaging hacks or dupe logic.


Repository owner deleted a comment from openhands-ai bot Dec 1, 2025
@enyst force-pushed the agent-sdk-26-llm-switch branch 2 times, most recently from 6eb0d14 to dba84fb, December 30, 2025 00:31