Summary
OpenTranscribe currently uses WhisperX for transcription, which doesn't fully utilize Apple Silicon's GPU capabilities (falls back to CPU on MPS devices). Implementing a native Apple Silicon solution using MLX-Whisper or whisper.cpp would provide significant performance improvements for macOS users with M1/M2/M3 chips.
Current State
- WhisperX is configured to use CPU on Apple Silicon (MPS) devices due to compatibility issues
- Performance is suboptimal compared to native implementations
- Users with capable Apple Silicon GPUs get no benefit from them during transcription
Proposed Solutions
Option 1: MLX-Whisper (Recommended)
Pros:
- 30-40% faster than whisper.cpp on Apple Silicon
- Native MLX framework designed specifically for Apple Silicon
- Excellent GPU utilization on M1/M2/M3 chips
- Python-based, easier integration with existing codebase
- Active development and Apple support
Cons:
- Only works on Apple Silicon (need to maintain WhisperX for other platforms)
- Smaller community compared to whisper.cpp
- May require significant refactoring of transcription service
Option 2: Lightning-Whisper-MLX
Pros:
- Project claims up to 10x speedup over whisper.cpp
- Project claims ~4x speedup over standard MLX-Whisper
- Optimized specifically for Apple Silicon
- Best-in-class performance for macOS
Cons:
- Very new project, stability concerns
- Limited documentation
- May lack features compared to WhisperX
Option 3: whisper.cpp
Pros:
- Cross-platform (works on Apple Silicon, CUDA, CPU)
- Very mature and stable
- 6-7x faster than vanilla Whisper on CPU
- Good Apple Silicon support via Metal
- Could potentially replace WhisperX entirely
Cons:
- Requires C++ integration (more complex)
- 30-40% slower than MLX-Whisper on Apple Silicon
- Would need Python bindings or subprocess calls
Performance Benchmarks (2024)
Based on recent community benchmarks (figures vary by model size and hardware):
- MLX-Whisper: ~50% faster than vanilla Whisper on Apple Silicon
- Lightning-Whisper-MLX: 10x faster than whisper.cpp (claimed)
- whisper.cpp: 6-7x faster than vanilla Whisper on CPU, good Metal support
Implementation Plan
Phase 1: Research & Prototype
Phase 2: Architecture Design
Phase 3: Implementation
Phase 4: Testing & Optimization
Phase 5: Documentation & Deployment
Technical Requirements
Core Features to Maintain
- Word-level timestamps
- Speaker diarization compatibility
- Multiple language support
- Batch processing capability
- Progress callbacks for UI updates
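A hypothetical `base_transcriber.py` interface could pin down this contract so every backend preserves the same features. All names below (`WordTimestamp`, `TranscriptSegment`, `BaseTranscriber`) are illustrative, not part of the current codebase:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class WordTimestamp:
    word: str
    start: float  # seconds from start of audio
    end: float

@dataclass
class TranscriptSegment:
    text: str
    start: float
    end: float
    speaker: Optional[str] = None               # filled in later by diarization
    words: list = field(default_factory=list)   # word-level timestamps

class BaseTranscriber(ABC):
    """Contract every backend (WhisperX, MLX-Whisper, whisper.cpp) must honor."""

    @abstractmethod
    def transcribe(
        self,
        audio_path: str,
        language: Optional[str] = None,
        progress_callback: Optional[Callable[[float], None]] = None,
    ) -> list:
        """Return a list of TranscriptSegment; report progress in [0.0, 1.0]."""
```

Keeping speaker assignment out of the interface (a post-processing step over segments) lets each backend stay diarization-agnostic.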
New Requirements
- Automatic backend selection based on hardware
- Configurable backend via environment variables
- Performance metrics logging
- Graceful fallback on errors
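The fallback and metrics requirements could be combined in one wrapper. This is a minimal sketch; `backends` and `create` are hypothetical parameters (an ordered list of backend names and a function mapping a name to a transcriber object):

```python
import logging
import time

logger = logging.getLogger("transcription")

def transcribe_with_fallback(audio_path, backends, create):
    """Try each backend in order; log timing, fall back gracefully on failure."""
    last_error = None
    for name in backends:
        start = time.perf_counter()
        try:
            result = create(name).transcribe(audio_path)
            # Performance metrics logging per backend
            logger.info("backend=%s took %.1fs", name, time.perf_counter() - start)
            return result
        except Exception as exc:  # graceful fallback on any backend error
            logger.warning("backend=%s failed (%s); trying next", name, exc)
            last_error = exc
    raise RuntimeError("all transcription backends failed") from last_error
```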
Code Changes Required
Backend Changes
backend/app/tasks/transcription/
├── base_transcriber.py # New: Abstract base class
├── whisperx_transcriber.py # Refactored from whisperx_service.py
├── mlx_transcriber.py # New: MLX-Whisper implementation
├── whisper_cpp_transcriber.py # New: Optional whisper.cpp implementation
└── transcriber_factory.py # New: Factory for backend selection
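The `transcriber_factory.py` in the tree above could use a simple registry so each backend module self-registers. A sketch, assuming the registry and decorator names are ours to choose:

```python
# transcriber_factory.py (illustrative)
_REGISTRY = {}

def register(name):
    """Class decorator: each backend module registers itself under a name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

def create_transcriber(name):
    """Instantiate the backend registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(
            f"Unknown backend {name!r}; available: {sorted(_REGISTRY)}"
        ) from None
```

Each implementation module would then start with e.g. `@register("mlx")` above its transcriber class, and adding a backend never touches the factory.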
Configuration Updates
- Add TRANSCRIPTION_BACKEND environment variable
- Add APPLE_SILICON_OPTIMIZATION flag
- Update hardware detection to identify MLX availability
- Add backend-specific configuration options
Docker Updates
- Create Apple Silicon specific Dockerfile variant
- Update docker-compose with platform-specific service definitions
- Ensure proper MLX installation in containers
Acceptance Criteria
Performance Targets
- M1 Pro: < 5 minutes for 1-hour audio (currently ~10 minutes)
- M2/M3: < 4 minutes for 1-hour audio
- Memory usage: < 8GB for large model
- GPU utilization: > 80% during transcription
Related Issues
References
Labels
enhancement, performance, macos, apple-silicon, transcription, mlx
Priority
High - Significant performance improvement for growing Apple Silicon user base