
feat(voice): add voice recognition service and audio pipeline #16

Closed
veithly wants to merge 18 commits into master from feat/voice-recognition-v2

Conversation


veithly (Contributor) commented Feb 13, 2026

Summary

  • Audio Service (spoon_bot/services/audio/): Pluggable audio transcription with OpenAI Whisper API, featuring abstract base class, factory pattern, and lazy-init async client
  • Audio Pipeline: Middleware that auto-routes audio — STT transcription for providers like Claude, native passthrough for GPT-4o/Gemini
  • WebSocket Audio Streaming: Real-time protocol (audio.stream.start → binary chunks → audio.stream.end) with buffering, size/duration limits, and auto-format detection
  • REST Voice Endpoints: /v1/agent/voice/transcribe (STT-only) + /v1/agent/voice/chat (multipart file upload) + audio fields in existing /v1/agent/chat
  • Audio Utilities: Magic-byte format detection, base64/data-URL decode, WAV duration estimation, MIME-to-format mapping, 25MB validation
  • Configuration: AudioConfig + BudgetConfig dataclasses with full env var support (GATEWAY_AUDIO_*, GATEWAY_TIMEOUT_*)
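The transcribe-or-passthrough rule in the Audio Pipeline bullet can be sketched roughly as follows. This is an illustrative sketch only: the class shape, field names, and provider set are assumptions, not the PR's actual API.

```python
from dataclasses import dataclass

# Assumption: providers with native audio input, per the PR summary (GPT-4o, Gemini).
NATIVE_AUDIO_PROVIDERS = {"openai", "gemini"}


@dataclass
class AudioInput:
    data: bytes
    fmt: str  # e.g. "wav", "mp3"


class AudioPipeline:
    """Route audio to STT or pass it through, depending on the target provider."""

    def __init__(self, transcriber):
        # Duck-typed transcriber: anything with .transcribe(data, fmt) -> str,
        # e.g. a Whisper-backed implementation.
        self.transcriber = transcriber

    def route(self, provider: str, audio: AudioInput) -> dict:
        if provider in NATIVE_AUDIO_PROVIDERS:
            # Native passthrough: hand the raw audio to the provider untouched.
            return {"type": "audio", "data": audio.data, "format": audio.fmt}
        # STT fallback (e.g. Claude): transcribe first, send text instead.
        text = self.transcriber.transcribe(audio.data, audio.fmt)
        return {"type": "text", "text": text}
```

A stub transcriber is enough to exercise both branches in tests, since the routing decision depends only on the provider name.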

New Files

File Description
spoon_bot/services/audio/__init__.py Package exports
spoon_bot/services/audio/base.py Abstract AudioTranscriber, TranscriptionResult, AudioSegment
spoon_bot/services/audio/whisper.py WhisperTranscriber — OpenAI Whisper API integration
spoon_bot/services/audio/factory.py create_transcriber() factory
spoon_bot/services/audio/pipeline.py AudioPipeline — transcribe-or-passthrough middleware
spoon_bot/services/audio/streaming.py AudioStreamManager — WS audio buffering
spoon_bot/services/audio/utils.py Format detection, validation, base64 decode

Modified Files

File Changes
gateway/config.py Added AudioConfig, BudgetConfig
gateway/api/v1/agent.py Audio processing in /chat, new /voice/* endpoints
gateway/models/requests.py Audio fields on ChatRequest
gateway/models/responses.py TranscriptionInfo model
gateway/websocket/handler.py Audio stream handlers
gateway/websocket/protocol.py Audio client methods + server events

Test Plan

  • Verify audio transcription with Whisper API (requires OPENAI_API_KEY)
  • Test native audio passthrough for OpenAI/Gemini providers
  • Test STT fallback for Anthropic provider
  • Verify WebSocket audio streaming protocol
  • Run tests/e2e_voice_input.py against live gateway
  • Confirm 172 existing tests pass (verified locally)
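For manual testing of the audio fields on /v1/agent/chat, a request body can be built like this. The `audio_data`/`audio_format`/`audio_language` field names come from the PR summary; the plain-base64 encoding (rather than a data URL) and the empty `message` field are assumptions, since the utils support both encodings.

```python
import base64
import json


def build_audio_chat_payload(audio_bytes: bytes, fmt: str = "wav", language: str = "en") -> str:
    """Sketch of a /v1/agent/chat request body carrying inline audio."""
    payload = {
        "message": "",  # assumed: empty text when the audio carries the user turn
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "audio_format": fmt,
        "audio_language": language,
    }
    return json.dumps(payload)
```

The resulting JSON string can then be POSTed to the gateway with any HTTP client.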


veithly and others added 17 commits February 11, 2026 15:42
feat(core): delegate provider config to spoon-core, fix streaming and WS cancellation

- Remove manual API key / base_url resolution from SpoonBotConfig.from_env()
  and server.py lifespan; delegate entirely to spoon-core's ConfigurationManager
  which natively supports openrouter, openai, anthropic, gemini, deepseek, ollama
- Fix SpoonBot.stream() to use ChatBot.astream() for token-level streaming
  (BaseAgent.stream() signature mismatch was silently dropping all content)
- Fix agent status endpoint with safe hasattr checks for tools/skills/sessions
- Run WS chat requests as background asyncio tasks so cancel/status requests
  can be processed during streaming
- Add 13 comprehensive capability tests (instruction, JSON, reasoning, streaming,
  multi-turn, code gen, translation, summarization, math, REST API, error handling)

fix: resolve 9 bugs from QA report (2026-02-12)

Bug #1: SkillManager signature mismatch — add runtime signature
check before passing include_default_paths to SkillManager.__init__

Bug #2: Skills activate/deactivate wrong registry — resolve real
SkillManager via _skill_manager/skill_manager instead of agent.skills
(list[str]); return structured 503 AGENT_NOT_READY on init failure

Bug #3: .env.example OpenRouter guidance — add explicit Option A
(openrouter provider) vs Option B (openai+base_url) examples

Bug #4: Chat ignored session_key — pass session_key through to
agent.process() and agent.stream() in both streaming and non-streaming paths

Bug #5: MCP filesystem tool mapping — add error handling for MCP tool
creation; patch missing _map_mcp_tool_name on SpoonReactSkill agents

Bug #6: ShellTool security false positive for format — remove bare
"format" from DANGEROUS_COMMANDS; use word-boundary-aware matching so
URL parameters like ?format=3 are not incorrectly blocked
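The Bug #6 fix combines two changes: word-boundary matching (so a dangerous token is not matched inside a longer word) and dropping the overly broad bare "format" entry entirely. A minimal sketch of the boundary-aware check, with an illustrative token list rather than the project's actual DANGEROUS_COMMANDS:

```python
import re

# Illustrative subset only — not the project's actual DANGEROUS_COMMANDS list.
# Note "format" is absent: even with word boundaries, it would still match
# URL parameters like ?format=3, which is why the fix removed it outright.
DANGEROUS_COMMANDS = ["mkfs", "shutdown", "dd"]


def is_dangerous(command: str) -> bool:
    """Match dangerous tokens only at word boundaries, not as substrings."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", command) is not None
        for word in DANGEROUS_COMMANDS
    )
```

With `\b` anchors, "dd" matches the standalone command but not "add" or "ddrescue".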

Bug #7: Async chat/task APIs — implement in-process async task queue
with TaskStatus enum, background coroutine execution, cancellation
support, and task status polling

Bug #8: ScriptTool parameter contract — add script_tool_patch.py that
derives OpenAI tool schema from skill input_schema and serializes
tool-call kwargs to JSON stdin for script skills

Bug #9: Workspace skills SKILL.md format — add required YAML
frontmatter to git_helper and code_review skill definitions

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: streaming chat, session_key handling, and SSE error propagation

- Fix AgentLoop.stream() to use process() fallback instead of broken
  base agent streaming infrastructure (ThreadSafeOutputQueue missing
  put_nowait, stream() while-loop condition bug)
- Fix _switch_session helper: manage session_key via SessionManager
  instead of passing it as kwarg to AgentLoop.process()
- Fix _stream_sse to propagate error chunks to SSE clients instead of
  silently swallowing them
- Fix async task to use session_key parameter correctly

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: use spoon-core run+stream pattern for real streaming

Replace the process() fallback with the proper run+stream pattern now
that spoon-core streaming infrastructure is fixed:

- Reset task_done + drain output_queue before each request
- Spawn agent.run() as background task, read chunks from output_queue
- Handle dict, object, and string chunk formats
- Remove debug print() statements from _run_and_signal

Also adds e2e gateway test suite.

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: e2e test — correct BASE_URL, body serialization, and import path

- Change BASE_URL from port 8080 to 9090
- Fix empty dict body serialization (bool({}) is False in Python)
- Add sys.path setup so ShellTool can be imported when running standalone

All 30 tests pass.
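The empty-dict serialization bullet above refers to a classic Python truthiness pitfall: `bool({})` is `False`, so a bare `if body:` silently drops an explicit empty JSON object. A minimal sketch of the bug and the fix (helper names here are hypothetical, for illustration only):

```python
import json


def post_json_buggy(body=None):
    """Buggy: `if body` treats an explicit empty dict the same as no body at all."""
    return json.dumps(body) if body else None


def post_json_fixed(body=None):
    """Fixed: compare against None, so {} is still serialized as a real body."""
    return json.dumps(body) if body is not None else None
```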

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: add web_search/web_fetch to CORE_TOOLS and fix ToolCall streaming

1. Add web_search and web_fetch to CORE_TOOLS so the agent uses search
   tools instead of generating code when asked about real-time info
   (e.g. BTC price).

2. Fix ToolCall chunk handling in stream() — chunks from spoon-core
   contain ToolCall pydantic objects, not dicts.  Use getattr() to
   access .id, .function.name, .function.arguments safely.

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: implement Tavily web search and real httpx web fetch

Replace stub implementations with working tools:

1. WebSearchTool: add Tavily as default search provider with real API
   calls via httpx.  Returns structured results with answer + snippets.

2. WebFetchTool: implement real HTTP fetch with httpx, HTML text
   extraction via BeautifulSoup, JSON formatting, and SSRF protection.

3. ToolkitAdapter: broaden exception handling from ImportError to
   Exception so toolkit load failures (e.g. fastmcp version mismatch,
   missing auth tokens) don't crash the gateway.

4. .env.example: document TAVILY_API_KEY configuration.

Tested: BTC/ETH price queries return real-time data via Tavily search.

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: dynamic tool loading via activate_tool

Add ActivateToolTool that allows the LLM to load inactive tools on
demand at runtime:

1. New ActivateToolTool class with 'activate' and 'list' actions
2. Added to CORE_TOOLS (8 core tools total)
3. System prompt updated to instruct LLM to use activate_tool instead
   of asking the user to load tools manually

Flow: LLM sees "Dynamically Loadable Tools" in system prompt →
calls activate_tool(action='activate', tool_name='get_token_price') →
tool is injected into agent's ToolManager → LLM uses it next step.

Tested: agent successfully activates and calls get_token_price tool.
E2E 30/30 pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: resolve 17 gateway/WS/core bugs with 61 regression tests

Bug report fixes (all 19 items from 2026-02-12/13):
- #5: MCP tool expansion in agent initialization
- #7: Async task API (submit/status/cancel)
- #8: ScriptTool structured parameters (via core)
- #9: SKILL.md BOM/CRLF loader (via core)
- #10: WS streaming empty content (dict chunk delta/content)
- #11: WS session binding in chat.send
- #12: WS session.import message persistence
- #13: WS cancel interrupts non-stream tasks
- #14: WS params type validation + error masking
- #15: WS session.switch rejects non-string keys
- #16: WS subscribe/unsubscribe validates events list
- #17: WS auth failure returns close code 4001
- #18: JWT session claim type validation
- #19: Auth rate limiting for WS connections

Tests: 61 in-memory + 18 live E2E, zero regressions
Co-authored-by: Cursor <cursoragent@cursor.com>
refactor: consolidate test files and add observability/error infrastructure

- Merge 9 small test files into unified test_gateway_tracing.py
  (tracing utils, meta response, QA fixes, smart fallback, execution
  budget, WS tracing/cancel/timeout, REST tracing, cancellation, toolkit
  adapter timeout)
- Add gateway error codes, observability tracing & budget modules
- Improve tool configs (self_config, web, shell, toolkit adapter)
- Fix SKILL.md frontmatter in workspace skills
- Test count: 23 files → 14 files, zero test loss

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: pluggable session persistence (SQLite / PostgreSQL)

Implement a SessionStore abstraction with three backends:
- FileSessionStore: JSONL files (existing behavior, default)
- SQLiteSessionStore: zero-dependency local database
- PostgresSessionStore: production-grade remote DB

Changes:
- Add session/store.py with abstract SessionStore + 3 implementations
- Refactor SessionManager to accept injected SessionStore via DI
- Add create_session_store() factory with config-driven backend selection
- Extend AgentLoopConfig with session_store_backend/dsn/db_path fields
- Wire gateway server.py to read SESSION_STORE_* env vars
- Update .env.example with configuration guidance
- Add 48 tests covering all backends, round-trip serialization,
  factory dispatch, and manager integration

Co-authored-by: Cursor <cursoragent@cursor.com>
docs: update README with latest models, session persistence, and new features

- Update all model IDs to current versions (Claude 4.6/Sonnet 4.5,
  GPT-5.2, DeepSeek V3.2, Gemini 2.5, Qwen3, etc.)
- Add OpenRouter popular models table with pricing
- Document session persistence (File/SQLite/PostgreSQL)
- Document web search (Tavily) integration
- Document dynamic tool loading
- Add WebSocket protocol details and authentication guide
- Add complete environment variables reference table
- Update architecture diagram with new tool categories
- Fix OpenRouter dual-configuration guidance (Option A vs B)

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: model-aware context window + Gemini 3 + per-provider docs

- Add MODEL_CONTEXT_WINDOWS lookup (40+ models) with auto-resolution
- Add resolve_context_window() helper (exact → prefix → suffix → 128K default)
- Add context_window param to AgentLoopConfig, AgentLoop, create_agent
- Inject context budget hint into system prompt
- Wire CONTEXT_WINDOW env var in gateway server
- Update README:
  - Add Gemini 3 Flash/Pro Preview models
  - Add per-provider configuration sections (Anthropic/OpenAI/DeepSeek/Gemini/OpenRouter)
  - Add Context Window documentation
  - Expand OpenRouter models table (21 models with pricing)
  - Add CONTEXT_WINDOW to env vars reference
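The exact → prefix → suffix → default resolution order described above can be sketched as follows. The table entries here are a small illustrative subset, not the PR's full 40-plus-model MODEL_CONTEXT_WINDOWS lookup.

```python
# Illustrative subset only; the real table covers 40+ models.
MODEL_CONTEXT_WINDOWS = {
    "claude-sonnet-4-5": 200_000,
    "gpt-4o": 128_000,
    "gemini-2.5-pro": 1_000_000,
}
DEFAULT_CONTEXT_WINDOW = 128_000  # the 128K fallback named above


def resolve_context_window(model_id: str) -> int:
    """Resolve a model's context window: exact -> prefix -> suffix -> default."""
    if model_id in MODEL_CONTEXT_WINDOWS:                   # exact match
        return MODEL_CONTEXT_WINDOWS[model_id]
    for known, window in MODEL_CONTEXT_WINDOWS.items():     # prefix: dated variants
        if model_id.startswith(known):
            return window
    for known, window in MODEL_CONTEXT_WINDOWS.items():     # suffix: routed IDs
        if model_id.endswith(known):
            return window
    return DEFAULT_CONTEXT_WINDOW
```

The prefix pass catches dated variants like `gpt-4o-2024-08-06`; the suffix pass catches provider-prefixed IDs like `openrouter/gemini-2.5-pro`.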

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: E2E test with Gemini 3 Flash + fix SkillManager init

- Add e2e_gemini3_test.py: 9 tests covering health, agent status,
  REST chat (stream/non-stream), session persistence, WS chat, WS
  session ops — all passing against live gateway with Gemini 3 Flash
- Fix SkillManager init: use inspect.signature to check
  include_default_paths support at runtime (cross-version compat)

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: consolidate test files (22 → 16)

Merge 10 small test files into 4 logical groups:
- test_security.py (shell + path security)
- test_agent_unit.py (dynamic tools, document, perf)
- e2e_gateway.py (basic + gemini E2E)
- test_integration_live.py (crypto, scenarios, agent real)

All 124 merged tests pass. No regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Core audio service (spoon_bot/services/audio/):
- AudioTranscriber abstract base + WhisperTranscriber (OpenAI Whisper API)
- AudioPipeline: auto-routes audio — STT for non-native providers,
  passthrough for GPT-4o/Gemini
- AudioStreamManager: WebSocket real-time audio buffering + transcription
- Utils: format detection (magic bytes), base64 decode, WAV duration estimation

Gateway integration:
- REST: /v1/agent/voice/transcribe (STT-only) + /v1/agent/voice/chat (multipart)
- REST: /v1/agent/chat now accepts audio_data/audio_format/audio_language fields
- WebSocket: audio.stream.start/end methods for real-time streaming
- Config: AudioConfig + BudgetConfig dataclasses with env var support
- Models: TranscriptionInfo response model, ChatResponse.transcription field

Protocol: ClientMethod.AUDIO_STREAM_START/END, ServerEvent audio events
Co-authored-by: Cursor <cursoragent@cursor.com>

fix: lifecycle, security & session hardening for PR#15 review

- web.py: add close_shared_http_client() for explicit AsyncClient shutdown
- gateway/app.py, gateway/server.py: call close on lifespan shutdown
- shell.py: block newline/CR multi-command injection bypass
- session/manager.py: add RLock thread-safety + max_cached_sessions eviction
- tests/test_security.py: add newline injection regression tests
- tests/test_session_persistence.py: add cache capacity boundary tests
veithly added a commit that referenced this pull request Mar 3, 2026
feat(core): delegate provider config to spoon-core, fix streaming and WS cancellation (#10)

veithly added a commit that referenced this pull request Mar 3, 2026

veithly commented Mar 9, 2026

The voice recognition feature (feat/voice-recognition-v2) was already merged into the dev branch and is now part of master via the dev→master merge (commit 04ca701). Closing this PR as redundant.

veithly closed this Mar 9, 2026
veithly deleted the feat/voice-recognition-v2 branch March 9, 2026 03:15
