
adding test reasoning levels script #337

Draft: ajspig wants to merge 33 commits into main from abigail/dev-1355

Conversation

ajspig (Contributor) commented Jan 22, 2026

No description provided.

dr-frmr and others added 30 commits January 14, 2026 17:26
fix: use PeerContextResponse in peer.ts
…o use `configuration` not `config` for consistency with API
…or workspace, session, and peer configurations
- Introduced `is_deriver_flush_enabled` function to check if flush mode is active.
- Updated `QueueManager` to conditionally apply batch token thresholds based on flush mode.
- Enhanced `UnifiedTestExecutor` to enable flush mode via Redis.
- Added `flush` parameter to test cases to facilitate testing of flush mode behavior.
- Updated various test cases to utilize the new flush functionality.
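The flush-mode change above makes the batch token threshold conditional. Below is a minimal sketch of that decision, assuming a plain dict stands in for Redis. The key name `deriver:flush_mode`, the helper `is_flush_enabled` (a hypothetical stand-in for `is_deriver_flush_enabled`, whose real signature is not shown here), and `should_process_batch` are all illustrative, not taken from the PR.

```python
from collections.abc import Mapping

# Hypothetical key name; a plain dict stands in for Redis in this sketch.
FLUSH_KEY = "deriver:flush_mode"


def is_flush_enabled(store: Mapping[str, str]) -> bool:
    """Hypothetical stand-in for is_deriver_flush_enabled: True if the flag is "1"."""
    return store.get(FLUSH_KEY) == "1"


def should_process_batch(
    pending_tokens: int, batch_token_threshold: int, store: Mapping[str, str]
) -> bool:
    """Decide whether queued work for one batch is ready to process.

    With flush mode on, the token threshold is bypassed so any pending work is
    processed right away; otherwise the batch waits until it reaches the threshold.
    """
    if is_flush_enabled(store):
        return pending_tokens > 0
    return pending_tokens >= batch_token_threshold


# Usage: a test enables flush mode so small batches are not held back.
store = {FLUSH_KEY: "1"}
print(should_process_batch(pending_tokens=10, batch_token_threshold=1_000, store=store))
```

Bypassing the threshold only in flush mode keeps production batching unchanged while letting tests observe processed work without padding messages up to the token limit.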
…test runner

- Added `schedule_dream` method to both Python and TypeScript SDKs for scheduling dream tasks.
- Updated HTTP routes to include endpoint for scheduling dreams.
- Enhanced test runner to utilize the new `schedule_dream` method for scheduling actions.
- Updated TypeScript client to support the new scheduling functionality with appropriate parameters.
- Changed the `observer` parameter to `observers` as a list in multiple functions across the deriver module.
- Updated the processing logic to handle multiple observers for representation tasks.
- Adjusted related payload and queue management functions to accommodate the new observers structure.
- Modified tests to reflect changes in the representation task handling and ensure proper functionality.
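The move from a single `observer` to an `observers` list can be illustrated with a small sketch: one queue item per message that carries every observing peer, with duplicates dropped. The `RepresentationTask` class, the `enqueue_representation` helper, and the field names are hypothetical and do not come from the deriver module.

```python
from dataclasses import dataclass, field


@dataclass
class RepresentationTask:
    """Hypothetical queue payload: one item per message, listing all observers."""

    message_id: int
    observed: str
    observers: list[str] = field(default_factory=list)


def enqueue_representation(
    message_id: int, sender: str, observers: list[str]
) -> RepresentationTask:
    # dict.fromkeys drops duplicate peers while keeping first-seen order, so each
    # message yields exactly one task instead of one task per observer.
    unique = list(dict.fromkeys(observers))
    return RepresentationTask(message_id=message_id, observed=sender, observers=unique)


task = enqueue_representation(42, "alice", ["alice", "bob", "bob", "carol"])
print(task.observers)
```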
…s with multiple observers

- Modified tests in `test_enqueue.py` to reflect changes in the queue item structure, where each message now results in a single queue item containing a list of observers.
- Updated assertions to validate that the `observers` field correctly includes all relevant peers, ensuring proper functionality of the deduplication logic.
- Removed redundant payload matching logic to streamline test cases and improve clarity.
- Adjusted LLM and dialectic settings in `.env.template`, `config.toml.example`, and `src/config.py` to reduce maximum tool output characters and session history tokens for cost efficiency.
- Implemented a new `dialectic_cost_calculator.py` script to estimate costs based on reasoning levels and model pricing.
- Enhanced `DialecticAgent` to utilize minimal tools and adjusted output token settings based on reasoning level to optimize performance and reduce costs.
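A cost estimate per reasoning level reduces to token counts multiplied by per-token prices. The sketch below shows that arithmetic under stated assumptions: every price and token budget is a placeholder, not real model pricing or a Honcho setting, and the level names are assumed. It is not the contents of `dialectic_cost_calculator.py`.

```python
# Placeholder numbers for illustration only: not real model prices or Honcho settings.
PRICE_PER_MTOK = {"input": 1.00, "output": 4.00}  # USD per million tokens

# Hypothetical per-level budgets as (input tokens, output tokens) for one query.
LEVEL_BUDGETS = {
    "minimal": (2_000, 250),
    "low": (4_000, 500),
    "medium": (8_000, 1_000),
    "high": (16_000, 2_000),
    "max": (32_000, 4_000),
}


def estimate_cost(level: str, prices: dict[str, float] = PRICE_PER_MTOK) -> float:
    """Estimated USD cost of a single query at the given reasoning level."""
    input_tokens, output_tokens = LEVEL_BUDGETS[level]
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


for level in LEVEL_BUDGETS:
    print(f"{level:<8} ${estimate_cost(level):.4f}")
```

Swapping in real prices and measured token usage for each level would turn this into a usable comparison.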
- Enhanced the `UnifiedTestExecutor` to include a `reasoning_level` parameter in the chat method call.
- Updated the `QueryAction` model to support the new `reasoning_level` attribute, allowing for more nuanced chat interactions.
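A test-runner action that carries a `reasoning_level` could look like the sketch below. It assumes the level names `minimal`, `low`, `medium`, `high`, and `max` (check Honcho's API for the authoritative list) and defaults to `low`, which the review on this PR notes is Honcho's default. The dataclass shape is hypothetical; the PR's actual `QueryAction` model may differ.

```python
from dataclasses import dataclass

# Assumed set of levels; not confirmed by this PR.
REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")


@dataclass
class QueryAction:
    """Hypothetical shape of a test-runner query action."""

    query: str
    reasoning_level: str = "low"  # matches Honcho's default per the review

    def __post_init__(self) -> None:
        # Reject typos early instead of sending an invalid level to the chat call.
        if self.reasoning_level not in REASONING_LEVELS:
            raise ValueError(f"unknown reasoning_level: {self.reasoning_level!r}")


action = QueryAction("What does alice like?", reasoning_level="high")
print(action)
```

The executor would then pass `action.reasoning_level` through to its chat call.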

coderabbitai bot (Contributor) commented Jan 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

dr-frmr (Collaborator) left a comment

Nice. A few nits and feature suggestions:

  • The default reasoning level in Honcho is low; this script should use the same default.
  • It would be nice to have a mode that runs every reasoning level and outputs each response in a table, which makes the responses easy to compare directly.
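The suggested compare-all-levels mode could be sketched as below. `chat` is any callable you supply that takes a query and a level; wiring it to the real Honcho chat call is an assumption left open here, as are the level names. `compare_levels` and `fake_chat` are hypothetical helpers, and the output is a Markdown table so it renders in PR comments or terminals alike.

```python
from typing import Callable

REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")  # assumed level names


def compare_levels(
    query: str,
    chat: Callable[[str, str], str],
    levels: tuple[str, ...] = REASONING_LEVELS,
) -> str:
    """Run `query` at every level and return a Markdown table of the responses."""
    rows = ["| Reasoning level | Response |", "| --- | --- |"]
    for level in levels:
        response = chat(query, level)
        # Flatten newlines and escape pipes so each response fits in one table row.
        cell = response.replace("\n", " ").replace("|", "\\|")
        rows.append(f"| {level} | {cell} |")
    return "\n".join(rows)


def fake_chat(query: str, level: str) -> str:
    """Stand-in for a real chat call, used only to demonstrate the table."""
    return f"[{level}] answer to {query}"


print(compare_levels("What does alice like?", fake_chat))
```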

Base automatically changed from ben/excise-stainless to main January 22, 2026 20:16
dr-frmr changed the title from "adding test reasoing levels script" to "adding test reasoning levels script" on Jan 23, 2026
