adding test reasoning levels script by ajspig · Pull Request #337 · plastic-labs/honcho

ajspig · 2026-01-22T16:47:26Z

No description provided.

- Introduced `is_deriver_flush_enabled` function to check if flush mode is active.
- Updated `QueueManager` to conditionally apply batch token thresholds based on flush mode.
- Enhanced `UnifiedTestExecutor` to enable flush mode via Redis.
- Added `flush` parameter to test cases to facilitate testing of flush mode behavior.
- Updated various test cases to utilize the new flush functionality.
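The flush-mode change above can be pictured with a minimal sketch. It assumes that flush mode simply bypasses the batch token threshold so queued work runs immediately. `flush_mode_enabled`, `should_process`, and `FLUSH_KEY` are hypothetical names, a plain dict stands in for Redis, and this is not the actual `is_deriver_flush_enabled` or `QueueManager` code.

```python
from typing import Mapping

# Hypothetical flag key; the key Honcho actually stores in Redis is not shown here.
FLUSH_KEY = "deriver:flush_enabled"


def flush_mode_enabled(store: Mapping[str, str]) -> bool:
    """Return True if the flush flag is set to "1" in a key-value store.

    A plain mapping stands in for Redis so the sketch runs without a server.
    """
    return store.get(FLUSH_KEY) == "1"


def should_process(pending_tokens: int, batch_threshold: int, flush: bool) -> bool:
    """Decide whether queued work should be processed now.

    With flush mode on, the batch token threshold is skipped and any pending
    work is processed immediately; otherwise work waits for the threshold.
    """
    if flush:
        return pending_tokens > 0
    return pending_tokens >= batch_threshold


if __name__ == "__main__":
    store = {FLUSH_KEY: "1"}
    print(should_process(50, 1000, flush_mode_enabled(store)))
    print(should_process(50, 1000, flush=False))
```

Keeping the flag lookup separate from the threshold decision lets tests toggle flush mode without touching the batching rule itself.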

…test runner
- Added `schedule_dream` method to both the Python and TypeScript SDKs for scheduling dream tasks.
- Updated HTTP routes to include an endpoint for scheduling dreams.
- Enhanced the test runner to use the new `schedule_dream` method for scheduling actions.
- Updated the TypeScript client to support the new scheduling functionality with appropriate parameters.

- Changed the `observer` parameter to `observers` (a list) in multiple functions across the deriver module.
- Updated the processing logic to handle multiple observers for representation tasks.
- Adjusted related payload and queue management functions to accommodate the new observers structure.
- Modified tests to reflect changes in the representation task handling and ensure proper functionality.

…s with multiple observers
- Modified tests in `test_enqueue.py` to reflect changes in the queue item structure: each message now results in a single queue item containing a list of observers.
- Updated assertions to validate that the `observers` field includes all relevant peers, confirming that the deduplication logic works.
- Removed redundant payload-matching logic to streamline the test cases and improve clarity.
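A minimal sketch of the "one queue item per message, with a deduplicated list of observers" shape described above. `RepresentationQueueItem`, `build_queue_item`, and `observers_from_payload` are hypothetical names, and the payload keys are assumed from the `observer`/`observers` parameter names; the real deriver payload may differ. The legacy-key branch is one possible reading of the backwards-compatibility commit in the list below, not its actual code.

```python
from dataclasses import dataclass, field


@dataclass
class RepresentationQueueItem:
    """Hypothetical queue item: one per message, carrying every observer."""

    message_id: str
    observed: str
    observers: list[str] = field(default_factory=list)


def build_queue_item(message_id: str, sender: str, observers: list[str]) -> RepresentationQueueItem:
    """Build a single queue item for a message, deduplicating observers.

    dict.fromkeys keeps the first occurrence of each peer, so order is stable.
    """
    unique = list(dict.fromkeys(observers))
    return RepresentationQueueItem(message_id=message_id, observed=sender, observers=unique)


def observers_from_payload(payload: dict) -> list[str]:
    """Read observers from a payload, accepting an assumed legacy `observer` key."""
    if "observers" in payload:
        return list(dict.fromkeys(payload["observers"]))
    if "observer" in payload:
        return [payload["observer"]]
    return []
```

Deduplicating at enqueue time means a peer listed twice still produces only one representation task for that message.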

- Adjusted LLM and dialectic settings in `.env.template`, `config.toml.example`, and `src/config.py` to reduce the maximum tool output characters and session history tokens for cost efficiency.
- Implemented a new `dialectic_cost_calculator.py` script to estimate costs based on reasoning levels and model pricing.
- Enhanced `DialecticAgent` to use minimal tools and to adjust output token settings based on reasoning level, optimizing performance and reducing costs.
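To illustrate what estimating cost per reasoning level involves, here is a sketch that assumes cost is simply token count times a per-million-token price, with each level given an input size and an output-token cap. `ModelPricing`, `estimate_cost`, and `estimate_by_level` are hypothetical and are not the contents of `dialectic_cost_calculator.py`; every number in the demo is a placeholder, not a Honcho setting or a real provider price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Prices in USD per million tokens, supplied by the caller."""

    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: ModelPricing) -> float:
    """Estimate the USD cost of one call from token counts and per-million pricing."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def estimate_by_level(
    token_budgets: dict[str, tuple[int, int]], pricing: ModelPricing
) -> dict[str, float]:
    """Map each reasoning level to an estimated cost.

    `token_budgets` maps a level name to (input_tokens, max_output_tokens).
    Output is treated as its cap, so each figure is a worst-case estimate.
    """
    return {
        level: estimate_cost(inp, out, pricing)
        for level, (inp, out) in token_budgets.items()
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    pricing = ModelPricing(input_per_million=1.0, output_per_million=4.0)
    budgets = {"low": (2_000, 500), "high": (8_000, 4_000)}
    for level, cost in estimate_by_level(budgets, pricing).items():
        print(f"{level}: ${cost:.4f}")
```

Because output tokens are usually priced higher than input, capping output tokens per level (as the `DialecticAgent` change describes) tends to dominate the savings.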

- Enhanced the `UnifiedTestExecutor` to include a `reasoning_level` parameter in the chat method call.
- Updated the `QueryAction` model to support the new `reasoning_level` attribute, allowing for more nuanced chat interactions.
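A hypothetical sketch of a test-case action carrying an optional `reasoning_level`, assuming the executor forwards it to the chat call only when it is set so the server default applies otherwise. The level names in `REASONING_LEVELS` are an assumption, and this dataclass is not the real `QueryAction` model.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed level names; check the Honcho API reference for the authoritative list.
REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")


@dataclass
class QueryAction:
    """Hypothetical shape of a test-runner query action."""

    query: str
    reasoning_level: Optional[str] = None  # None lets the server apply its default

    def __post_init__(self) -> None:
        # Reject typos early instead of failing inside the chat request.
        if self.reasoning_level is not None and self.reasoning_level not in REASONING_LEVELS:
            raise ValueError(f"unknown reasoning_level: {self.reasoning_level!r}")

    def chat_kwargs(self) -> dict:
        """Build keyword arguments for a chat call, omitting unset options."""
        kwargs = {"query": self.query}
        if self.reasoning_level is not None:
            kwargs["reasoning_level"] = self.reasoning_level
        return kwargs
```

Omitting the key when unset keeps existing test cases unchanged and avoids hard-coding a default that could drift from the server's.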

coderabbitai · 2026-01-22T16:47:38Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

dr-frmr

Nice -- a few nits and feature suggestions:

- The default reasoning level in Honcho is low; the script should use the same default.
- It would be nice to have a mode where all reasoning levels are run and each response is output in a table, which makes it easy to compare them directly.
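Both suggestions can be sketched as a small CLI: default to low, and add an `--all` flag that asks the same query at every level and prints a comparison table. `placeholder_ask` is a hypothetical stand-in for the real Honcho chat call, the flag names are illustrative, and only "low" in `REASONING_LEVELS` comes from the review; the other level names are assumptions.

```python
import argparse
from typing import Callable, Optional, Sequence

# Assumed level names; only "low" (the default) comes from the review comment.
REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")

Ask = Callable[[str, str], str]


def format_table(rows: Sequence[tuple[str, str]]) -> str:
    """Render (level, response) pairs as a plain-text two-column table."""
    width = max([len("level")] + [len(level) for level, _ in rows])
    lines = [f"{'level'.ljust(width)} | response", f"{'-' * width}-+-{'-' * 8}"]
    for level, response in rows:
        # Collapse whitespace (including newlines) so each response fits on one row.
        flat = " ".join(response.split())
        lines.append(f"{level.ljust(width)} | {flat}")
    return "\n".join(lines)


def run(query: str, levels: Sequence[str], ask: Ask) -> str:
    """Ask the same query at each level and return a comparison table."""
    return format_table([(level, ask(query, level)) for level in levels])


def placeholder_ask(query: str, level: str) -> str:
    """Hypothetical stand-in for the real Honcho chat call."""
    return f"[{level}] answer to: {query}"


def main(argv: Optional[Sequence[str]] = None, ask: Ask = placeholder_ask) -> str:
    parser = argparse.ArgumentParser(description="Compare dialectic reasoning levels.")
    parser.add_argument("query", help="question to send at each reasoning level")
    parser.add_argument("--reasoning-level", choices=REASONING_LEVELS, default="low")
    parser.add_argument("--all", action="store_true", help="run every reasoning level")
    args = parser.parse_args(argv)
    levels = REASONING_LEVELS if args.all else (args.reasoning_level,)
    table = run(args.query, levels, ask)
    print(table)
    return table
```

Saved as a script, `main()` would normally be called under `if __name__ == "__main__":`, e.g. `python compare_levels.py "What does Alice enjoy?" --all` (script name hypothetical). Injecting `ask` keeps the table logic testable without a running Honcho instance.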

dr-frmr and others added 30 commits January 14, 2026 17:26

chore: 3.0 honcho and 2.0 sdks changelog

cd19283

fix: use PeerContextResponse in peer.ts

chore: move docs to /v3/, build SDKs

b9b92b9

chore: code review

00144cc

feat: [WIP] migrate away from stainless in typescript sdk

eeb26a3

chore: move api from /v2/ to /v3/

eb56f8a

Merge branch 'ben/honcho-3.0-sdks-2.0' into ben/excise-stainless

a7bca21

feat: no-stainless typescript with real tests

32880d6

feat: migrate python sdk off of stainless

fd56523

feat: clean typescript sdk

9ac4a19

chore: add tests for ts http client

21c841f

fix: rewrite entire python sdk in new format, update typescript sdk t…

8f70773

…o use `configuration` not `config` for consistency with API

fix: clean up SDKs, synchronize

76d7932

chore: update sdk examples

0b9be3d

chore: update OpenAPI documentation and SDK examples to reflect changes

fcd9d99

Merge branch 'main' into ben/excise-stainless

937ae2a

fix: better test

2bac09c

fix: install deps in test runner, improve robustness of streaming in …

f3c682a

…sdk, coderabbit nits

fix: standardize around camelCase in TS SDK

9a01e84

refactor: update configuration handling in SDKs to use typed models f…

f667cab

…or workspace, session, and peer configurations

docs: clarify queue status usage and remove polling methods from SDKs

65eab9d

add claude skills for migrations

chore: fix links in docs

51ec6b3

fix: add backwards compatibility for representation work unit keys an…

1b8929d

…d payload observers

add: results

0490629

add: adoption journey

5d89254

ajspig added 3 commits January 21, 2026 15:39

Merge remote-tracking branch 'origin/ben/reasoning-rates-readjusted' …

b1020ba

…into abigail/dev-1355

add: adding script for testing reasoning levels

00451fe

fix: moving script

23aba39

dr-frmr reviewed Jan 22, 2026

Base automatically changed from ben/excise-stainless to main January 22, 2026 20:16

dr-frmr changed the title ~~adding test reasoing levels script~~ adding test reasoning levels script Jan 23, 2026

ajspig wants to merge 33 commits into main from abigail/dev-1355

2 participants