This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
NeuroVox is an Obsidian plugin that enhances note-taking with voice transcription and AI-powered insights. It supports multiple AI providers (OpenAI, Groq, Deepgram) for audio transcription and processing, with additional features for video-to-audio conversion and real-time recording.
- Voice recording with transcription via multiple AI services
- Video file processing (extracts audio for transcription)
- Streaming transcription with real-time chunk processing
- Smart note formatting with timestamps and organization
- Floating button and toolbar integration for quick access
- Real-time recording timer display
- Multiple output modes (new note, clipboard, current note)
- TypeScript (ES6 target, ESNext modules)
- Obsidian API (min version 0.15.0)
- Build Tool: esbuild with hot reload
- AI Services: OpenAI, Groq, Deepgram APIs
- Audio/Video: RecordRTC, FFmpeg.wasm
- State Management: Custom state system in utils/state
- Version: 0.3.1 (manifest.json)
- AI Adapters (`src/adapters/`)
  - Base `AIAdapter` interface for consistent API
  - Provider-specific implementations (OpenAI, Groq, Deepgram)
  - Each adapter handles transcription and optional AI processing
- Audio Processing (`src/utils/audio/`)
  - `AudioProcessor`: Handles recording and file management
  - `RecordingProcessor`: Orchestrates transcription workflow
  - `StreamingTranscriptionService`: Real-time streaming transcription
  - Configurable bitrate and MIME type support
- State Management (`src/utils/state/`)
  - `StateManager`: Global state tracking
  - `SegmentTracker`: Tracks processing segments
  - Persistent state across operations
- UI Components
  - `FloatingButton`: Draggable recording button
  - `TimerModal`: Real-time recording interface
  - `RecordingUI`: Recording controls and status
  - Settings tab with accordions for organization
- Command System (`src/commands/`)
  - Modular command registration
  - Supports various input methods (recording, files, clipboard)
- Adapter Pattern: Unified interface for multiple AI providers
- Streaming Mode: All platforms use real-time streaming transcription with time-sliced chunks
- Event-Driven: Uses Obsidian's event system for UI updates
- Async/Await: Throughout for API calls and file operations
- Error Boundaries: Comprehensive error handling with user feedback
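
The adapter pattern listed above can be illustrated with a minimal sketch. The `AIAdapter` shape shown here (a `name` field and a `transcribe` method), the `FakeAdapter` stand-in, and the `AdapterRegistry` class are all assumptions made for illustration; the actual interface in `src/adapters/` may look different.

```typescript
// Hypothetical shape: the real AIAdapter in src/adapters/ may differ.
interface AIAdapter {
  readonly name: string;
  transcribe(audio: ArrayBuffer): Promise<string>;
}

// Stand-in adapter that makes no network call.
// A real adapter would send the audio to its provider's API here.
class FakeAdapter implements AIAdapter {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  async transcribe(audio: ArrayBuffer): Promise<string> {
    return `[${this.name}] ${audio.byteLength} bytes`;
  }
}

// Registry keyed by provider id, so callers never depend on a concrete class.
class AdapterRegistry {
  private readonly adapters = new Map<string, AIAdapter>();

  register(id: string, adapter: AIAdapter): void {
    this.adapters.set(id, adapter);
  }

  get(id: string): AIAdapter {
    const adapter = this.adapters.get(id);
    if (!adapter) {
      throw new Error(`No adapter registered for provider "${id}"`);
    }
    return adapter;
  }
}

const registry = new AdapterRegistry();
registry.register("openai", new FakeAdapter("openai"));
registry.register("groq", new FakeAdapter("groq"));
```

Calling code asks the registry for a provider id and only ever sees the interface, which is what lets a new provider be added without touching the recording or transcription workflow.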
src/
├── adapters/ # AI service integrations
├── commands/ # Obsidian command implementations
├── main.ts # Plugin entry point
├── modals/ # UI modal components
├── prompts/ # AI prompt templates
├── settings/ # Plugin configuration
├── types.ts # TypeScript definitions
├── ui/ # UI components
└── utils/ # Utility modules
    ├── audio/ # Audio processing
    ├── document/ # Note manipulation
    ├── state/ # State management
    └── transcription/ # Transcription coordination
# Development with hot reload
npm run dev
# Production build (includes TypeScript check)
npm run build
# Update version in manifest.json and versions.json
npm run version
- Uses `esbuild.config.mjs` for bundling
- Outputs to `main.js` in root directory
- Includes banner for Obsidian compatibility
- Development mode includes sourcemaps and watch
- OpenAI
  - Whisper API for transcription
  - GPT models for content processing
  - Requires API key in settings
- Groq
  - Fast inference for transcription
  - Whisper-large-v3 model
  - Requires API key in settings
- Deepgram
  - Real-time transcription service
  - Nova-2 model support
  - Recently integrated (commit c6decb4)
- MediaRecorder API for audio capture
- Web Audio API for processing
- FileReader API for file handling
- Streaming Transcription
  - Uses real-time streaming mode on all platforms
  - Audio is processed in small time-slice chunks (5-10 seconds)
  - Memory-adaptive with configurable queue limits
- Browser Compatibility
  - Requires modern browser with MediaRecorder support
  - WebAssembly support needed for FFmpeg
- API Rate Limits
  - Respect provider-specific rate limits
  - Implements retry logic for transient failures
- Security
  - API keys stored in Obsidian settings
  - No keys in code or version control
  - Sensitive data handled securely
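
The retry behaviour mentioned under API rate limits can be sketched as follows. This is not the plugin's actual implementation: the `withRetry` helper, its parameters, and the choice of exponential backoff are assumptions made for illustration.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// The plugin's real retry strategy may differ.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500,
  isTransient: (error: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Stop on non-transient errors or once the attempts are used up.
      if (!isTransient(error) || attempt === maxAttempts) break;
      // Delay doubles with each attempt: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The `isTransient` predicate is where a caller would distinguish a rate-limit or network error (worth retrying) from an invalid API key (not worth retrying).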
- Deepgram Integration: Added new transcription provider with API key management
- Enhanced Audio Processing: Configurable bitrate and MIME type support
- Chunked Processing: Improved handling of large audio files
- Version 0.3.1: Latest release with multiple improvements
- Follow Existing Patterns
  - Use adapter pattern for new AI providers
  - Implement proper error handling with user feedback
  - Maintain TypeScript types in types.ts
- State Management
  - Use StateManager for global state
  - Clean up state after operations
  - Handle edge cases (cancellation, errors)
- UI Consistency
  - Follow Obsidian's UI patterns
  - Use modals for user interactions
  - Provide clear status feedback
- Performance
  - Implement chunking for large files
  - Use async operations for IO
  - Clean up resources (audio contexts, etc.)
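
As a rough illustration of the chunking guideline, the sketch below splits a buffer into fixed-size pieces and processes them one at a time. The function names are hypothetical, and the byte-level split is a simplification: encoded audio generally has to be split on container or time boundaries rather than at arbitrary byte offsets.

```typescript
// Splits a buffer into chunks of at most chunkSize bytes.
// Illustrative only: real encoded audio needs format-aware splitting.
function splitIntoChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray returns a view on the same memory, so no bytes are copied.
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Processes chunks strictly in order, with only one handler call in flight.
async function processSequentially<T>(
  chunks: Uint8Array[],
  handler: (chunk: Uint8Array, index: number) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  let index = 0;
  for (const chunk of chunks) {
    results.push(await handler(chunk, index));
    index++;
  }
  return results;
}
```

Processing sequentially keeps memory use and request concurrency bounded, at the cost of throughput; a bounded worker pool would be the usual next step if that trade-off becomes a problem.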
- Create adapter in `src/adapters/`
- Implement AIAdapter interface
- Add settings in `src/settings/Settings.ts`
- Update SettingTab UI
- Register in TranscriptionService
- Check `src/utils/audio/` for core logic
- Update AudioProcessor for recording changes
- Modify StreamingTranscriptionService for streaming behavior
- Settings interface in `src/settings/Settings.ts`
- UI components in `src/settings/accordions/`
- Settings saved via Obsidian's data API
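
When a new setting is added, saved data from older versions will not contain it, so loaded settings are usually merged over defaults. The sketch below shows that merge in isolation. The `PluginSettings` fields and default values are hypothetical, not the contents of `src/settings/Settings.ts`; in a real plugin the `saved` argument would typically come from the `loadData()` method on Obsidian's `Plugin` class.

```typescript
// Hypothetical settings shape; the real interface lives in src/settings/Settings.ts.
interface PluginSettings {
  provider: "openai" | "groq" | "deepgram";
  openaiApiKey: string;
  audioBitrate: number;
}

// Hypothetical defaults, chosen only for this example.
const DEFAULT_SETTINGS: PluginSettings = {
  provider: "openai",
  openaiApiKey: "",
  audioBitrate: 128000,
};

// Merge saved data over defaults so fields added in a newer version
// still receive a value when older saved data is loaded.
function mergeSettings(
  saved: Partial<PluginSettings> | null | undefined,
): PluginSettings {
  return { ...DEFAULT_SETTINGS, ...(saved ?? {}) };
}
```

Passing `null` covers the first run, when nothing has been saved yet.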
Currently no formal test suite. Test manually by:
- Loading plugin in Obsidian development vault
- Testing each transcription provider
- Verifying chunk processing with large files
- Checking all output modes (note, clipboard, etc.)
- No automated tests yet
- Version mismatch between package.json (0.2.0) and manifest.json (0.3.1)
- Desktop and mobile support (`isDesktopOnly: false` in manifest.json)
- Requires manual API key configuration for each provider
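
A version mismatch like the one noted above is easy to detect mechanically. The following is a minimal sketch, not an existing script in this repository: `findVersionMismatches` is a hypothetical helper that takes the raw JSON text of each file and reports which ones disagree with the first entry.

```typescript
// Hypothetical helper: returns the names of files whose "version" field
// differs from that of the first file given.
function findVersionMismatches(jsonByFile: Record<string, string>): string[] {
  const names = Object.keys(jsonByFile);
  if (names.length === 0) return [];
  const versionOf = (name: string): unknown =>
    JSON.parse(jsonByFile[name]).version;
  const expected = versionOf(names[0]);
  return names.filter((name) => versionOf(name) !== expected);
}

// Example input mirroring the mismatch described in this list.
const mismatches = findVersionMismatches({
  "manifest.json": '{"version": "0.3.1"}',
  "package.json": '{"version": "0.2.0"}',
});
```

With the inputs above, `mismatches` contains only `"package.json"`, because `manifest.json` is listed first and is treated as the reference.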
- Resume: `claude --resume 6dad1cd3-6fce-47ee-aaf9-f24ca6f285fc`
- Team: `pact-6dad1cd3`
- Started: 2026-03-14 16:03:15 UTC