Skip to content

Add thinking prefix and tool response for empty assistant messages#13

Draft
AlexCuadron wants to merge 112 commits intomainfrom
openhands-workspace-af5oj22b
Draft

Add thinking prefix and tool response for empty assistant messages#13
AlexCuadron wants to merge 112 commits intomainfrom
openhands-workspace-af5oj22b

Conversation

@AlexCuadron
Copy link
Owner

This PR adds a feature to the LLM class that automatically adds a thinking prefix and tool response when the first assistant message is empty. This makes the model believe that certain Python libraries (sympy, numpy, scipy, matplotlib) are already installed and available for use.

Changes

  • Modified llm.py to check for empty assistant messages and insert the thinking prefix and tool response
  • Added documentation in README_thinking_prefix.md
  • Added a test script to verify the functionality

Testing

The changes have been tested with a mock LLM completion function to verify that the thinking prefix and tool response are correctly added.

openhands-agent and others added 30 commits February 25, 2025 04:45
- Added update_llm_config_for_completions_logging to imports
- Modified get_config to accept instance parameter
- Updated llm_config to enable completions logging
- Updated process_instance to pass instance to get_config

This change makes aider_bench save llm_completions in the same way as swe_bench,
with completions being saved in {eval_output_dir}/llm_completions/{instance_id}/
…tions-fork

feat: Enable llm_completions logging in aider_bench
Add polyglot benchmark implementation
- Added update_llm_config_for_completions_logging to imports
- Modified get_config to accept instance parameter
- Updated llm_config to enable completions logging
- Updated process_instance to pass instance to get_config

This change makes aider_bench save llm_completions in the same way as swe_bench,
with completions being saved in {eval_output_dir}/llm_completions/{instance_id}/
Merge AIME2024 benchmark into main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants