feat: smart routing v3, provider-locked selection, and NanoGPT endpoint policy #6
Daltonganger wants to merge 39 commits into feat/omos-pr2-manual-backend from
* feat: restrict subagent delegation based on agent type (alvinunreal#116)

  Add SUBAGENT_DELEGATION_RULES to control which agents can spawn subagents:
  - orchestrator: can spawn all subagents (full delegation)
  - fixer/designer: can spawn explorer only (for research)
  - explorer/librarian/oracle: cannot spawn any subagents (leaf nodes)

  Update BackgroundTaskManager to:
  - Track agent type per session via agentBySessionId map
  - Calculate tool permissions based on parent agent's delegation rules
  - Apply appropriate background_task/task tool permissions when spawning

  Add comprehensive tests for all delegation restrictions.

  Fixes alvinunreal#116

* fix: enforce agent-type validation in background_task tool delegation

  The background_task tool accepted any agent string without checking SUBAGENT_DELEGATION_RULES, allowing agents like fixer to delegate to any agent despite being restricted to explorer only. Add isAgentAllowed() and getAllowedSubagents() methods to BackgroundTaskManager and validate the requested agent in the tool before launching.

* fix: treat untracked sessions as root orchestrator in delegation checks

  The root orchestrator session is created by OpenCode, not by BackgroundTaskManager, so it is never registered in agentBySessionId. This caused isAgentAllowed(), getAllowedSubagents(), and calculateToolPermissions() to reject all delegation from the root session, completely blocking the primary agent from launching tasks. Default untracked sessions to 'orchestrator' instead of rejecting them.

* feat: default unknown agent types to explorer-only delegation

  New background agent types no longer need explicit entries in SUBAGENT_DELEGATION_RULES. Agents not listed default to ['explorer'] access via a centralized getSubagentRules() helper.

* fix: base tool permissions on spawned agent's own delegation rules, not parent's

* test: add multi-layered delegation chain tests for orchestrator→fixer→explorer paths

---------

Co-authored-by: vllm-user <vllm-user@example.com>
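The delegation rules described in these commits can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `DelegationTracker` class and its `register()` method are hypothetical stand-ins for the relevant part of BackgroundTaskManager, and the orchestrator's list is an assumed reading of "all subagents" based on the agent names mentioned above.

```typescript
// Hypothetical sketch: names follow the commit messages, not the actual source.
const DEFAULT_RULES: readonly string[] = ['explorer'];

// Assumed agent roster; the orchestrator entry is a guess at "all subagents".
const SUBAGENT_DELEGATION_RULES = new Map<string, readonly string[]>([
  ['orchestrator', ['explorer', 'librarian', 'oracle', 'designer', 'fixer']],
  ['fixer', ['explorer']],
  ['designer', ['explorer']],
  ['explorer', []],
  ['librarian', []],
  ['oracle', []],
]);

// Agent types without an explicit entry fall back to explorer-only access.
function getSubagentRules(agent: string): readonly string[] {
  return SUBAGENT_DELEGATION_RULES.get(agent) ?? DEFAULT_RULES;
}

// Hypothetical stand-in for the session tracking in BackgroundTaskManager.
class DelegationTracker {
  private readonly agentBySessionId = new Map<string, string>();

  register(sessionId: string, agent: string): void {
    this.agentBySessionId.set(sessionId, agent);
  }

  getAllowedSubagents(parentSessionId: string): readonly string[] {
    // Untracked sessions are treated as the root orchestrator.
    const parent = this.agentBySessionId.get(parentSessionId) ?? 'orchestrator';
    return getSubagentRules(parent);
  }

  isAgentAllowed(parentSessionId: string, requested: string): boolean {
    return this.getAllowedSubagents(parentSessionId).includes(requested);
  }
}
```

A Map is used here instead of a plain object so that a lookup for an unknown agent name can never hit an inherited property; whether the real implementation does the same is not stated in the source.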
Code Review Summary
Status: 1 Issue Found | Recommendation: Address before merge
Files Reviewed (12 files)
@@ -0,0 +1,2 @@
+import { getConfigDir } from './paths';
WARNING: Unused imports in stub file
This file imports getConfigDir and ConfigMergeResult but never uses them. The file has no exports, making the export * from './commands' in config-manager.ts a no-op. This appears to be an incomplete stub that should either be completed or removed.
Either export functions that use these imports, or remove the file and the re-export from config-manager.ts.
for (const opencodePath of paths) {
  try {
    // Check if we can execute it
    const proc = Bun.spawn([opencodePath, '--version'], {
WARNING: Unused variable proc
The proc variable from Bun.spawn() is never used. While the intent is to check if spawn succeeds, the variable should either be used (e.g., to check the result) or removed with a comment explaining why we don't await the process.
Suggested change:
- const proc = Bun.spawn([opencodePath, '--version'], {
+ // Check if spawn succeeds - don't await, just verify no throw
+ Bun.spawn([opencodePath, '--version'], {
+   stdout: 'pipe',
+   stderr: 'pipe',
+ });
Note: This approach may cache an invalid path since Bun.spawn() might not throw synchronously for non-existent executables on all platforms.
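One way to sidestep the caveat in this note is to check the filesystem instead of spawning a process. The sketch below is only an illustration under that assumption: `createPathResolver` is a hypothetical name, and the `exists` predicate is injected so the example stays self-contained (in a Bun or Node installer it could plausibly be `existsSync` from `node:fs`).

```typescript
// Hypothetical helper: returns the first candidate path that exists, with caching.
// `exists` is injected; in practice it might be `existsSync` from 'node:fs'.
function createPathResolver(
  candidates: readonly string[],
  exists: (path: string) => boolean,
): () => string | null {
  let cached: string | null = null;
  return () => {
    if (cached !== null) return cached;
    for (const candidate of candidates) {
      if (exists(candidate)) {
        cached = candidate; // only a path that was actually found is cached
        return cached;
      }
    }
    return null; // a miss is not cached, so a later install can still be detected
  };
}
```

Because only a confirmed hit is cached, a non-existent path can never be stored. Note that an existence check does not prove the file is executable.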
…nreal#122)
- ast-grep: Log background init errors instead of silently swallowing them
- grep: Add caseSensitive, wholeWord, fixedStrings options to tool definition

These fixes improve debuggability and expose more functionality to users.
…nunreal#123)

* feat: add reusable OpenCode free-model selection flow
* docs: keep antigravity guide focused on antigravity setup
* feat: add Chutes provider with adaptive model selection
* fix: remove request-tier bias from Chutes model scoring
* feat: add provider-aware 6-agent fallback chains with 15s failover
* feat: add subscription prompts and provider-aware chains for anthropic/copilot/zai
* refactor: rebalance mixed provider defaults toward Kimi and GPT-5.3
* feat: add dynamic provider-aware model planning from live catalog
* docs: add 5 provider-combination config scenarios
* feat: blend Artificial Analysis and OpenRouter signals into dynamic model planning
* feat: prompt API keys during install for external ranking signals
* refactor: rebalance dynamic planner for provider diversity and non-flash depth
* refactor: boost explorer speed signals and map chutes aliases
* refactor: prefer Gemini 3 Pro over 2.5 in dynamic ranking
* feat: add version-aware recency scoring across model families
* refactor: enforce provider-balanced primaries and richer fallback bundles
* Address code review comments
* Fix formatting issues
* refactor: remove model-name and provider bonus heuristics
* fix: check if chutes alias exists before adding
* revert: restore heuristic bonuses in dynamic model planner
* Merge pull request 3: Add external ranking signals from Artificial Analysis and OpenRouter

* fix: harden fallback chains and external signal handling

  Prevent key leakage in installer prompts, isolate external ranking aliasing, and make free-model tails deterministic before Big Pickle fallback. Also relax fallback chain schema to preserve custom agent keys and align tests.

* feat: add scoring v2 foundation with precedence resolver

  Introduce modular V2 scoring components, shadow-mode plumbing, and deterministic precedence resolution with provenance while keeping V1 as the applied default. Adds schema flag support and regression tests for determinism and fallback compilation.

* feat: add OMOS manual preference backend operations

  Introduce a validated manual plan schema and new omos_preferences tool with show/plan/apply/reset-agent operations, including atomic writes with backups. Wire the tool into plugin registration and add tests for config loading and manual plan compilation.

* feat: ship /omos command UX and rollout hardening

  Install the /omos command entry, add diff-first confirm flow for omos_preferences, and handle global/project target precedence warnings. Include tests for command install, diff behavior, and precedence plus rollout gate documentation updates.

* test: add dynamic scoring matrix scenarios

  Add a deterministic provider-combination matrix test that compares v1, v2-shadow, and v2 outputs across curated mixes plus three random mixes. Refresh the provider matrix doc with the new scenario results and scoring-mode behavior.

* feat: expose manual scoring preview and harden apply flow

  Add score-plan output for manual model plans, including per-agent ranked scores for v1/v2/shadow comparisons. Also align plan/apply diff hashes to target config, ensure target directories exist before writes, and update provider-coverage provenance after swap logic.

* docs: add /omos score-plan guidance in English

  Update README and rollout docs with an English how-to for viewing model scores during manual planning, and refresh the installed /omos command template to include score-plan and diffHash-safe apply flow.

* docs: add copy-paste /omos scoring workflow example

  Include an English command sequence for show -> score-plan -> plan -> apply using operation-based omos_preferences calls, with target selection and diffHash-safe apply guidance.

* feat: move chutes to auth flow and full catalog refresh

  Remove CHUTES_API_KEY env dependency from installer/config wiring and align Chutes with OpenCode auth provider flow. Switch Chutes discovery to all provider models from opencode refresh output while keeping OpenCode free-model filtering unchanged.

* fix: parse multi-segment provider model headers

  Accept provider/model headers with additional path segments so chutes catalog entries like chutes/vendor/model are discovered during opencode verbose parsing.

* fix: support multi-segment model ids in /omos scoring

  Allow manual plan model ids like provider/vendor/model and keep them in score-plan ranking. Add regressions for loader validation and /omos scoring with multi-segment Chutes model identifiers.

* fix: strip provider prefix for nested model signal matching

  Use model ids after the first slash for external signal lookup so chutes/vendor/model names map correctly in v1 and v2 scoring. Add regressions for multi-segment chutes ids in dynamic and scoring engine tests.

* feat: normalize model keys and rebalance v2 capability scoring

  Apply canonical model alias normalization across external signal ingestion and v1/v2 lookups (strip provider prefix, remove TEE/FP* tokens, lowercase, and normalize slash/space/hyphen variants). Also switch V2 capability scoring to one-sided bonuses, add K2.5 version preference, and update regression matrices/tests for the new ranking behavior.

* feat: add balanced subscription mode for install and /omos

  Add an installer question for subscription/pay-per-API usage and persist balanceProviderUsage in config. When enabled, dynamic planning rebalances provider assignments toward even distribution using a max score-loss tolerance. Also expose the setting in omos_preferences show/plan/apply flows and update docs/tests.

* tune: reduce chutes qwen3 over-ranking

  Add targeted Chutes role priors that down-rank Qwen3 and prefer Kimi K2.5/Minimax M2.1 by role in both v1 and v2 scoring. Update matrix and regression tests to reflect the calibrated ranking behavior.

* tune: remove Gemini bonus from v1 scoring

* feat: add guided click-through /omos command flow

* feat: remove /omos command, add manual model selection to CLI install

  - Remove omos_preferences tool and /omos command flow
  - Add --dry-run flag for testing install without OpenCode
  - Add manual setup mode to choose models per agent
  - Exclude already-selected models from fallback choices
  - Limit model list display to 5 items with option to type any model ID
  - Support primary + 3 fallbacks configuration per agent
  - Add ManualAgentConfig type for storing manual selections

* fix: auto-detect opencode installation in common paths

  - Add OPENCODE_PATHS with common installation locations
  - Add resolveOpenCodePath() helper with caching
  - Update all opencode commands to use resolved path
  - Show found path in success message
  - Add helpful instructions when opencode not found in PATH

* chore: remove accidentally added submodule

* fix: expand opencode path detection for macOS and Linux

  Add comprehensive path detection including:
  - macOS: Applications, Homebrew, Library paths
  - Linux: /usr/bin, snap, flatpak, nix paths
  - Package managers: Homebrew, Cargo, npm, yarn, pnpm
  - More system-wide and user-local locations

* chore: remove accidentally added submodule

* fix: enforce selected provider models in dynamic assignments

  Respect user-selected primary/secondary models during dynamic planning, lock forced assignments from rebalance overrides, and update tests to match manual-plan precedence behavior. Also include robust OpenCode path resolution updates for cross-platform installs.

* feat: add models-only update flow and refine model selection UX

  - add command and mode for updating assignments without reinstalling plugins/skills
  - keep full provider model lists available for primary/support choices
  - improve large-list selection with expansion option
  - refine dynamic planner provider candidate narrowing and pinned model handling
  - update matrix/planner tests and remove residual internal placeholder wording

* chore: polish installer menu copy and provider prompt consistency

  Clarify quick/manual setup descriptions, standardize provider prompts, and improve models-only messaging.

* fix: address PR review for OpenCode path resolution and alias regex

  Avoid spawning processes in resolveOpenCodePath by selecting only existing filesystem paths and simplify slash normalization regex for clarity.

* fix: remove stale commands exports and placeholder test

---------

Co-authored-by: Ruben Beuker <rubenbeuker@MacBook-Pro-van-Ruben.local>
Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
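The model-key handling described in the commits above (split at the first slash, then strip TEE/FP* tokens, lowercase, and unify slash/space/hyphen variants) might look roughly like this. It is a sketch under assumptions: `splitModelId` and `normalizeModelKey` are hypothetical names, the choice of a hyphen as the canonical separator is a guess, and the sample id in the usage note is made up for illustration.

```typescript
// Hypothetical sketch of multi-segment id parsing and alias normalization.
interface ModelId {
  provider: string;
  model: string;
}

// Everything after the FIRST slash is the model, so 'chutes/vendor/model'
// yields provider 'chutes' and model 'vendor/model'.
function splitModelId(id: string): ModelId {
  const slash = id.indexOf('/');
  if (slash === -1) return { provider: '', model: id };
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Assumed canonical form: lowercase, hyphen-separated, without TEE/FP* tokens.
function normalizeModelKey(id: string): string {
  const { model } = splitModelId(id);
  return model
    .toLowerCase()
    .replace(/[\/\s]+/g, '-') // treat slashes and spaces like hyphens
    .split('-')
    .filter((token) => token !== '' && token !== 'tee' && !/^fp\d+$/.test(token))
    .join('-');
}
```

With these definitions, `normalizeModelKey('chutes/vendor/Model FP8-TEE')` evaluates to `'vendor-model'`, so the same key is produced whether a signal source writes the name with spaces, slashes, or hyphens.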
Updated the project description to reflect the open status of the Multi Agent Suite.
Co-authored-by: Claw <claw@Claws-Virtual-Machine.local>
…unreal#127)

* Fix: Tmux Session Leak - Complete Session Lifecycle Management

  ## Problem
  When background tasks complete, tmux panes remained open and opencode attach processes became orphaned, accumulating over time.

  ## Root Cause
  Missing session lifecycle management:
  1. No session.abort() on task completion
  2. No session.abort() on cancellation
  3. No session.deleted event handler
  4. No graceful shutdown (Ctrl+C before kill)

  ## Solution
  - Add session.abort() in completeTask() (single point of responsibility)
  - Add handleSessionDeleted() for cleanup on session deletion
  - Add onSessionDeleted() in TmuxSessionManager for pane cleanup
  - Implement graceful shutdown: Ctrl+C before kill-pane
  - Wire up event handlers in main dispatcher

  ## Code Quality
  - Removed duplicate code in completeTask()
  - Eliminated redundant session.abort() calls
  - All 43 tests pass

  ## Documentation
  - Updated AGENTS.md with session lifecycle section
  - Updated docs/tmux-integration.md with troubleshooting

  ## Testing
  - @explorer and @librarian tasks complete and close panes automatically
  - Zero orphaned processes after task completion

  Inspired by oh-my-opencode session management implementation.

* fix: Address Greptile review feedback

  - Remove redundant resolver deletion in handleSessionDeleted()
  - Add session tracking cleanup in completeTask() as fallback
  - Prevents memory leak if session.deleted event doesn't fire

  Fixes issues identified in latest Greptile review.

* chore: Add .aim/ to .gitignore for AI memory directory

* fix: disable timeout when fallback is disabled

  - Remove max(120000) constraint on timeoutMs
  - Allow timeoutMs: 0 to disable timeout entirely
  - Auto-disable timeout when fallback.enabled = false
  - Default remains 15000ms (unchanged)
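The timeout rules in the last commit above can be summarised in a small resolver. This is a sketch, not the PR's code: the `FallbackConfig` shape and the `resolveTimeoutMs` name are assumptions, as is the choice to treat a missing `enabled` flag as enabled.

```typescript
// Hypothetical config shape; field names follow the commit message.
interface FallbackConfig {
  enabled?: boolean;
  timeoutMs?: number;
}

const DEFAULT_TIMEOUT_MS = 15000;

// Returns the effective timeout in milliseconds; 0 means "no timeout".
function resolveTimeoutMs(fallback: FallbackConfig): number {
  // With fallback disabled there is nothing to fail over to, so no timeout applies.
  if (fallback.enabled === false) return 0;
  // No upper bound is enforced; an explicit 0 disables the timeout.
  return fallback.timeoutMs ?? DEFAULT_TIMEOUT_MS;
}
```

Using `??` rather than `||` matters here: an explicit `timeoutMs: 0` must survive as 0 instead of falling back to the default.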
  },
);

return hasExplicitOpenCodeAgent || input.hasOpencodeZen;
WARNING: Dead-code logic — || input.hasOpencodeZen makes hasExplicitOpenCodeAgent irrelevant
Because the function already returns false early on line 97 when !input.hasOpencodeZen, execution only reaches line 110 when input.hasOpencodeZen is true. The || input.hasOpencodeZen term is therefore always true, meaning the function always returns true whenever hasOpencodeZen is set — regardless of whether any agent actually uses an OpenCode model.
If the intent is to return true only when there is an explicit OpenCode agent assignment, the condition should be:
Suggested change:
- return hasExplicitOpenCodeAgent || input.hasOpencodeZen;
+ return hasExplicitOpenCodeAgent;
If the intent is to always enable OpenCode free models when hasOpencodeZen is true, the hasExplicitOpenCodeAgent check and the entire some() loop above are dead code and should be removed.
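The two possible intents in this review comment can be written side by side. The sketch below is illustrative only: the `InferInput` shape, the `agentModels` field, and the `opencode/` prefix check are assumptions, since the reviewed function's body is not shown in full.

```typescript
// Hypothetical input shape; only hasOpencodeZen appears in the reviewed code.
interface InferInput {
  hasOpencodeZen: boolean;
  agentModels: readonly string[];
}

// Intent A: enable only when some agent is explicitly assigned an OpenCode model.
function inferExplicitOnly(input: InferInput): boolean {
  if (!input.hasOpencodeZen) return false;
  // Assumed convention: OpenCode model ids start with 'opencode/'.
  return input.agentModels.some((model) => model.startsWith('opencode/'));
}

// Intent B: enable whenever OpenCode Zen is available; no per-agent scan is needed.
function inferAlwaysWhenZen(input: InferInput): boolean {
  return input.hasOpencodeZen;
}
```

The reviewed code behaves like intent B while still paying for the scan from intent A, which is why the comment asks the author to pick one.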
hasAntigravity: detected.hasAntigravity,
hasChutes: detected.hasChutes ?? false,
hasNanoGpt: hasNanoGptForRun,
hasOpencodeZen: true,
WARNING: hasOpencodeZen is hardcoded to true in refreshConfig, ignoring the actual detected value.
Line 209 correctly passes detected.hasOpencodeZen to inferOpenCodeFreeEnabled, but then refreshConfig unconditionally sets hasOpencodeZen: true. This means filterCatalogToEnabledProviders and buildDynamicModelPlan will always treat the user as having OpenCode Zen, potentially selecting models the user cannot access.
Suggested change:
- hasOpencodeZen: true,
+ hasOpencodeZen: detected.hasOpencodeZen,
Summary
Why
This aligns model selection behavior with explicit provider and billing constraints while keeping routing current as catalogs and usage change.
Validation
Notes