
feat(voice): add voice recognition service and audio pipeline #16

Closed
veithly wants to merge 18 commits into master from feat/voice-recognition-v2

Conversation


veithly (Contributor) commented Feb 13, 2026

Summary

  • Audio Service (spoon_bot/services/audio/): Pluggable audio transcription with OpenAI Whisper API, featuring abstract base class, factory pattern, and lazy-init async client
  • Audio Pipeline: Middleware that auto-routes audio — STT transcription for providers like Claude, native passthrough for GPT-4o/Gemini
  • WebSocket Audio Streaming: Real-time protocol (audio.stream.start → binary chunks → audio.stream.end) with buffering, size/duration limits, and auto-format detection
  • REST Voice Endpoints: /v1/agent/voice/transcribe (STT-only) + /v1/agent/voice/chat (multipart file upload) + audio fields in existing /v1/agent/chat
  • Audio Utilities: Magic-byte format detection, base64/data-URL decode, WAV duration estimation, MIME-to-format mapping, 25MB validation
  • Configuration: AudioConfig + BudgetConfig dataclasses with full env var support (GATEWAY_AUDIO_*, GATEWAY_TIMEOUT_*)
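The transcribe-or-passthrough rule in the Audio Pipeline bullet can be sketched roughly as follows. This is an illustrative sketch only: the class shape, field names, and provider set are assumptions, not the PR's actual API.

```python
from dataclasses import dataclass

# Assumption: providers with native audio input, per the PR summary (GPT-4o, Gemini).
NATIVE_AUDIO_PROVIDERS = {"openai", "gemini"}


@dataclass
class AudioInput:
    data: bytes
    fmt: str  # e.g. "wav", "mp3"


class AudioPipeline:
    """Route audio to STT or pass it through, depending on the target provider."""

    def __init__(self, transcriber):
        # Duck-typed transcriber: anything with .transcribe(data, fmt) -> str,
        # e.g. a Whisper-backed implementation.
        self.transcriber = transcriber

    def route(self, provider: str, audio: AudioInput) -> dict:
        if provider in NATIVE_AUDIO_PROVIDERS:
            # Native passthrough: hand the raw audio to the provider untouched.
            return {"type": "audio", "data": audio.data, "format": audio.fmt}
        # STT fallback (e.g. Claude): transcribe first, send text instead.
        text = self.transcriber.transcribe(audio.data, audio.fmt)
        return {"type": "text", "text": text}
```

A stub transcriber is enough to exercise both branches in tests, since the routing decision depends only on the provider name.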

New Files

File Description
spoon_bot/services/audio/__init__.py Package exports
spoon_bot/services/audio/base.py Abstract AudioTranscriber, TranscriptionResult, AudioSegment
spoon_bot/services/audio/whisper.py WhisperTranscriber — OpenAI Whisper API integration
spoon_bot/services/audio/factory.py create_transcriber() factory
spoon_bot/services/audio/pipeline.py AudioPipeline — transcribe-or-passthrough middleware
spoon_bot/services/audio/streaming.py AudioStreamManager — WS audio buffering
spoon_bot/services/audio/utils.py Format detection, validation, base64 decode

Modified Files

File Changes
gateway/config.py Added AudioConfig, BudgetConfig
gateway/api/v1/agent.py Audio processing in /chat, new /voice/* endpoints
gateway/models/requests.py Audio fields on ChatRequest
gateway/models/responses.py TranscriptionInfo model
gateway/websocket/handler.py Audio stream handlers
gateway/websocket/protocol.py Audio client methods + server events

Test Plan

  • Verify audio transcription with Whisper API (requires OPENAI_API_KEY)
  • Test native audio passthrough for OpenAI/Gemini providers
  • Test STT fallback for Anthropic provider
  • Verify WebSocket audio streaming protocol
  • Run tests/e2e_voice_input.py against live gateway
  • Confirm 172 existing tests pass (verified locally)
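For manual testing of the audio fields on /v1/agent/chat, a request body can be built like this. The `audio_data`/`audio_format`/`audio_language` field names come from the PR summary; the plain-base64 encoding (rather than a data URL) and the empty `message` field are assumptions, since the utils support both encodings.

```python
import base64
import json


def build_audio_chat_payload(audio_bytes: bytes, fmt: str = "wav", language: str = "en") -> str:
    """Sketch of a /v1/agent/chat request body carrying inline audio."""
    payload = {
        "message": "",  # assumed: empty text when the audio carries the user turn
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "audio_format": fmt,
        "audio_language": language,
    }
    return json.dumps(payload)
```

The resulting JSON string can then be POSTed to the gateway with any HTTP client.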


veithly and others added 17 commits February 11, 2026 15:42
feat(core): delegate provider config to spoon-core, fix streaming and WS cancellation

- Remove manual API key / base_url resolution from SpoonBotConfig.from_env()
  and server.py lifespan; delegate entirely to spoon-core's ConfigurationManager
  which natively supports openrouter, openai, anthropic, gemini, deepseek, ollama
- Fix SpoonBot.stream() to use ChatBot.astream() for token-level streaming
  (BaseAgent.stream() signature mismatch was silently dropping all content)
- Fix agent status endpoint with safe hasattr checks for tools/skills/sessions
- Run WS chat requests as background asyncio tasks so cancel/status requests
  can be processed during streaming
- Add 13 comprehensive capability tests (instruction, JSON, reasoning, streaming,
  multi-turn, code gen, translation, summarization, math, REST API, error handling)

fix: resolve 9 bugs from QA report (2026-02-12)

Bug #1: SkillManager signature mismatch — add runtime signature
check before passing include_default_paths to SkillManager.__init__

Bug #2: Skills activate/deactivate wrong registry — resolve real
SkillManager via _skill_manager/skill_manager instead of agent.skills
(list[str]); return structured 503 AGENT_NOT_READY on init failure

Bug #3: .env.example OpenRouter guidance — add explicit Option A
(openrouter provider) vs Option B (openai+base_url) examples

Bug #4: Chat ignored session_key — pass session_key through to
agent.process() and agent.stream() in both streaming and non-streaming paths

Bug #5: MCP filesystem tool mapping — add error handling for MCP tool
creation; patch missing _map_mcp_tool_name on SpoonReactSkill agents

Bug #6: ShellTool security false positive for format — remove bare
"format" from DANGEROUS_COMMANDS; use word-boundary-aware matching so
URL parameters like ?format=3 are not incorrectly blocked
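The Bug #6 fix combines two changes: word-boundary matching (so a dangerous token is not matched inside a longer word) and dropping the overly broad bare "format" entry entirely. A minimal sketch of the boundary-aware check, with an illustrative token list rather than the project's actual DANGEROUS_COMMANDS:

```python
import re

# Illustrative subset only — not the project's actual DANGEROUS_COMMANDS list.
# Note "format" is absent: even with word boundaries, it would still match
# URL parameters like ?format=3, which is why the fix removed it outright.
DANGEROUS_COMMANDS = ["mkfs", "shutdown", "dd"]


def is_dangerous(command: str) -> bool:
    """Match dangerous tokens only at word boundaries, not as substrings."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", command) is not None
        for word in DANGEROUS_COMMANDS
    )
```

With `\b` anchors, "dd" matches the standalone command but not "add" or "ddrescue".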

Bug #7: Async chat/task APIs — implement in-process async task queue
with TaskStatus enum, background coroutine execution, cancellation
support, and task status polling

Bug #8: ScriptTool parameter contract — add script_tool_patch.py that
derives OpenAI tool schema from skill input_schema and serializes
tool-call kwargs to JSON stdin for script skills

Bug #9: Workspace skills SKILL.md format — add required YAML
frontmatter to git_helper and code_review skill definitions

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: streaming chat, session_key handling, and SSE error propagation

- Fix AgentLoop.stream() to use process() fallback instead of broken
  base agent streaming infrastructure (ThreadSafeOutputQueue missing
  put_nowait, stream() while-loop condition bug)
- Fix _switch_session helper: manage session_key via SessionManager
  instead of passing it as kwarg to AgentLoop.process()
- Fix _stream_sse to propagate error chunks to SSE clients instead of
  silently swallowing them
- Fix async task to use session_key parameter correctly

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: use spoon-core run+stream pattern for real streaming

Replace the process() fallback with the proper run+stream pattern now
that spoon-core streaming infrastructure is fixed:

- Reset task_done + drain output_queue before each request
- Spawn agent.run() as background task, read chunks from output_queue
- Handle dict, object, and string chunk formats
- Remove debug print() statements from _run_and_signal

Also adds e2e gateway test suite.

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: e2e test — correct BASE_URL, body serialization, and import path

- Change BASE_URL from port 8080 to 9090
- Fix empty dict body serialization (bool({}) is False in Python)
- Add sys.path setup so ShellTool can be imported when running standalone

All 30 tests pass.
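The empty-dict serialization bullet above refers to a classic Python truthiness pitfall: `bool({})` is `False`, so a bare `if body:` silently drops an explicit empty JSON object. A minimal sketch of the bug and the fix (helper names here are hypothetical, for illustration only):

```python
import json


def post_json_buggy(body=None):
    """Buggy: `if body` treats an explicit empty dict the same as no body at all."""
    return json.dumps(body) if body else None


def post_json_fixed(body=None):
    """Fixed: compare against None, so {} is still serialized as a real body."""
    return json.dumps(body) if body is not None else None
```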

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: add web_search/web_fetch to CORE_TOOLS and fix ToolCall streaming

1. Add web_search and web_fetch to CORE_TOOLS so the agent uses search
   tools instead of generating code when asked about real-time info
   (e.g. BTC price).

2. Fix ToolCall chunk handling in stream() — chunks from spoon-core
   contain ToolCall pydantic objects, not dicts.  Use getattr() to
   access .id, .function.name, .function.arguments safely.

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: implement Tavily web search and real httpx web fetch

Replace stub implementations with working tools:

1. WebSearchTool: add Tavily as default search provider with real API
   calls via httpx.  Returns structured results with answer + snippets.

2. WebFetchTool: implement real HTTP fetch with httpx, HTML text
   extraction via BeautifulSoup, JSON formatting, and SSRF protection.

3. ToolkitAdapter: broaden exception handling from ImportError to
   Exception so toolkit load failures (e.g. fastmcp version mismatch,
   missing auth tokens) don't crash the gateway.

4. .env.example: document TAVILY_API_KEY configuration.

Tested: BTC/ETH price queries return real-time data via Tavily search.

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: dynamic tool loading via activate_tool

Add ActivateToolTool that allows the LLM to load inactive tools on
demand at runtime:

1. New ActivateToolTool class with 'activate' and 'list' actions
2. Added to CORE_TOOLS (8 core tools total)
3. System prompt updated to instruct LLM to use activate_tool instead
   of asking the user to load tools manually

Flow: LLM sees "Dynamically Loadable Tools" in system prompt →
calls activate_tool(action='activate', tool_name='get_token_price') →
tool is injected into agent's ToolManager → LLM uses it next step.

Tested: agent successfully activates and calls get_token_price tool.
E2E 30/30 pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: resolve 17 gateway/WS/core bugs with 61 regression tests

Bug report fixes (all 19 items from 2026-02-12/13):
- #5: MCP tool expansion in agent initialization
- #7: Async task API (submit/status/cancel)
- #8: ScriptTool structured parameters (via core)
- #9: SKILL.md BOM/CRLF loader (via core)
- #10: WS streaming empty content (dict chunk delta/content)
- #11: WS session binding in chat.send
- #12: WS session.import message persistence
- #13: WS cancel interrupts non-stream tasks
- #14: WS params type validation + error masking
- #15: WS session.switch rejects non-string keys
- #16: WS subscribe/unsubscribe validates events list
- #17: WS auth failure returns close code 4001
- #18: JWT session claim type validation
- #19: Auth rate limiting for WS connections

Tests: 61 in-memory + 18 live E2E, zero regressions
Co-authored-by: Cursor <cursoragent@cursor.com>
refactor: consolidate test files and add observability/error infrastructure

- Merge 9 small test files into unified test_gateway_tracing.py
  (tracing utils, meta response, QA fixes, smart fallback, execution
  budget, WS tracing/cancel/timeout, REST tracing, cancellation, toolkit
  adapter timeout)
- Add gateway error codes, observability tracing & budget modules
- Improve tool configs (self_config, web, shell, toolkit adapter)
- Fix SKILL.md frontmatter in workspace skills
- Test count: 23 files → 14 files, zero test loss

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: pluggable session persistence (SQLite / PostgreSQL)

Implement a SessionStore abstraction with three backends:
- FileSessionStore: JSONL files (existing behavior, default)
- SQLiteSessionStore: zero-dependency local database
- PostgresSessionStore: production-grade remote DB

Changes:
- Add session/store.py with abstract SessionStore + 3 implementations
- Refactor SessionManager to accept injected SessionStore via DI
- Add create_session_store() factory with config-driven backend selection
- Extend AgentLoopConfig with session_store_backend/dsn/db_path fields
- Wire gateway server.py to read SESSION_STORE_* env vars
- Update .env.example with configuration guidance
- Add 48 tests covering all backends, round-trip serialization,
  factory dispatch, and manager integration

Co-authored-by: Cursor <cursoragent@cursor.com>
docs: update README with latest models, session persistence, and new features

- Update all model IDs to current versions (Claude 4.6/Sonnet 4.5,
  GPT-5.2, DeepSeek V3.2, Gemini 2.5, Qwen3, etc.)
- Add OpenRouter popular models table with pricing
- Document session persistence (File/SQLite/PostgreSQL)
- Document web search (Tavily) integration
- Document dynamic tool loading
- Add WebSocket protocol details and authentication guide
- Add complete environment variables reference table
- Update architecture diagram with new tool categories
- Fix OpenRouter dual-configuration guidance (Option A vs B)

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: model-aware context window + Gemini 3 + per-provider docs

- Add MODEL_CONTEXT_WINDOWS lookup (40+ models) with auto-resolution
- Add resolve_context_window() helper (exact → prefix → suffix → 128K default)
- Add context_window param to AgentLoopConfig, AgentLoop, create_agent
- Inject context budget hint into system prompt
- Wire CONTEXT_WINDOW env var in gateway server
- Update README:
  - Add Gemini 3 Flash/Pro Preview models
  - Add per-provider configuration sections (Anthropic/OpenAI/DeepSeek/Gemini/OpenRouter)
  - Add Context Window documentation
  - Expand OpenRouter models table (21 models with pricing)
  - Add CONTEXT_WINDOW to env vars reference
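The exact → prefix → suffix → default resolution order described above can be sketched as follows. The table entries here are a small illustrative subset, not the PR's full 40-plus-model MODEL_CONTEXT_WINDOWS lookup.

```python
# Illustrative subset only; the real table covers 40+ models.
MODEL_CONTEXT_WINDOWS = {
    "claude-sonnet-4-5": 200_000,
    "gpt-4o": 128_000,
    "gemini-2.5-pro": 1_000_000,
}
DEFAULT_CONTEXT_WINDOW = 128_000  # the 128K fallback named above


def resolve_context_window(model_id: str) -> int:
    """Resolve a model's context window: exact -> prefix -> suffix -> default."""
    if model_id in MODEL_CONTEXT_WINDOWS:                   # exact match
        return MODEL_CONTEXT_WINDOWS[model_id]
    for known, window in MODEL_CONTEXT_WINDOWS.items():     # prefix: dated variants
        if model_id.startswith(known):
            return window
    for known, window in MODEL_CONTEXT_WINDOWS.items():     # suffix: routed IDs
        if model_id.endswith(known):
            return window
    return DEFAULT_CONTEXT_WINDOW
```

The prefix pass catches dated variants like `gpt-4o-2024-08-06`; the suffix pass catches provider-prefixed IDs like `openrouter/gemini-2.5-pro`.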

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: E2E test with Gemini 3 Flash + fix SkillManager init

- Add e2e_gemini3_test.py: 9 tests covering health, agent status,
  REST chat (stream/non-stream), session persistence, WS chat, WS
  session ops — all passing against live gateway with Gemini 3 Flash
- Fix SkillManager init: use inspect.signature to check
  include_default_paths support at runtime (cross-version compat)

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: consolidate test files (22 → 16)

Merge 10 small test files into 4 logical groups:
- test_security.py (shell + path security)
- test_agent_unit.py (dynamic tools, document, perf)
- e2e_gateway.py (basic + gemini E2E)
- test_integration_live.py (crypto, scenarios, agent real)

All 124 merged tests pass. No regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Core audio service (spoon_bot/services/audio/):
- AudioTranscriber abstract base + WhisperTranscriber (OpenAI Whisper API)
- AudioPipeline: auto-routes audio — STT for non-native providers,
  passthrough for GPT-4o/Gemini
- AudioStreamManager: WebSocket real-time audio buffering + transcription
- Utils: format detection (magic bytes), base64 decode, WAV duration estimation

Gateway integration:
- REST: /v1/agent/voice/transcribe (STT-only) + /v1/agent/voice/chat (multipart)
- REST: /v1/agent/chat now accepts audio_data/audio_format/audio_language fields
- WebSocket: audio.stream.start/end methods for real-time streaming
- Config: AudioConfig + BudgetConfig dataclasses with env var support
- Models: TranscriptionInfo response model, ChatResponse.transcription field

Protocol: ClientMethod.AUDIO_STREAM_START/END, ServerEvent audio events
Co-authored-by: Cursor <cursoragent@cursor.com>

fix: lifecycle, security & session hardening for PR#15 review

- web.py: add close_shared_http_client() for explicit AsyncClient shutdown
- gateway/app.py, gateway/server.py: call close on lifespan shutdown
- shell.py: block newline/CR multi-command injection bypass
- session/manager.py: add RLock thread-safety + max_cached_sessions eviction
- tests/test_security.py: add newline injection regression tests
- tests/test_session_persistence.py: add cache capacity boundary tests
veithly added a commit that referenced this pull request Mar 3, 2026
feat(core): delegate provider config to spoon-core, fix streaming and WS cancellation (#10)

veithly added a commit that referenced this pull request Mar 3, 2026

veithly commented Mar 9, 2026

The voice recognition feature (feat/voice-recognition-v2) was already merged into the dev branch and is now part of master via the dev→master merge (commit 04ca701). Closing this PR as redundant.

veithly closed this Mar 9, 2026
veithly deleted the feat/voice-recognition-v2 branch March 9, 2026 03:15
