feat: Implement native Apple Silicon transcription with MLX-Whisper or whisper.cpp #48

@davidamacey

Description

Summary

OpenTranscribe currently uses WhisperX for transcription, which doesn't fully utilize Apple Silicon's GPU capabilities (falls back to CPU on MPS devices). Implementing a native Apple Silicon solution using MLX-Whisper or whisper.cpp would provide significant performance improvements for macOS users with M1/M2/M3 chips.

Current State

  • WhisperX is configured to use CPU on Apple Silicon (MPS) devices due to compatibility issues
  • Performance is suboptimal compared to native implementations
  • Users with expensive Apple Silicon hardware aren't getting full value from their GPU

Proposed Solutions

Option 1: MLX-Whisper (Recommended)

Pros:

  • 30-40% faster than whisper.cpp on Apple Silicon
  • Native MLX framework designed specifically for Apple Silicon
  • Excellent GPU utilization on M1/M2/M3 chips
  • Python-based, easier integration with existing codebase
  • Active development and Apple support

Cons:

  • Only works on Apple Silicon (need to maintain WhisperX for other platforms)
  • Smaller community compared to whisper.cpp
  • May require significant refactoring of the transcription service

Option 2: Lightning-Whisper-MLX

Pros:

  • Claims a 10x speedup over whisper.cpp
  • Claims a 4x speedup over standard MLX-Whisper
  • Optimized specifically for Apple Silicon
  • Best-in-class performance for macOS

Cons:

  • Very new project, stability concerns
  • Limited documentation
  • May lack features compared to WhisperX

Option 3: whisper.cpp

Pros:

  • Cross-platform (works on Apple Silicon, CUDA, CPU)
  • Very mature and stable
  • 6-7x faster than vanilla Whisper on CPU
  • Good Apple Silicon support via Metal
  • Could potentially replace WhisperX entirely

Cons:

  • Requires C++ integration (more complex)
  • 30-40% slower than MLX-Whisper on Apple Silicon
  • Would need Python bindings or subprocess calls

Performance Benchmarks (2024)

Based on recent community benchmarks (the baselines differ, so the figures are not directly comparable):

  • MLX-Whisper: ~50% faster than vanilla Whisper on Apple Silicon
  • Lightning-Whisper-MLX: 10x faster than whisper.cpp (claimed, unverified)
  • whisper.cpp: 6-7x faster than vanilla Whisper on CPU, with good Metal support

Implementation Plan

Phase 1: Research & Prototype

  • Benchmark MLX-Whisper vs whisper.cpp on M1/M2/M3 hardware
  • Test feature parity with WhisperX (timestamps, speaker alignment)
  • Evaluate integration complexity for each option
  • Create proof-of-concept implementation

Phase 2: Architecture Design

  • Design abstraction layer for multiple transcription backends
  • Create platform detection logic (Apple Silicon vs CUDA vs CPU)
  • Plan migration strategy from WhisperX
  • Design configuration system for backend selection
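The platform detection logic above could look roughly like this sketch. The backend names (`mlx`, `whisperx`) and the function itself are hypothetical identifiers for this proposal, not existing code:

```python
import platform


def detect_backend() -> str:
    """Pick a transcription backend based on the host hardware.

    Hypothetical helper: Apple Silicon with MLX installed gets the
    native backend; everything else keeps the current WhisperX flow.
    """
    # Apple Silicon reports Darwin + arm64
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        try:
            import mlx.core  # noqa: F401  # MLX only installs on Apple Silicon
            return "mlx"
        except ImportError:
            return "whisperx"  # MLX not available; fall back
    try:
        import torch
        if torch.cuda.is_available():
            return "whisperx"  # CUDA path stays on WhisperX
    except ImportError:
        pass
    return "whisperx"  # conservative CPU default
```

The same check can later gate the proposed `APPLE_SILICON_OPTIMIZATION` flag.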

Phase 3: Implementation

  • Implement chosen solution (likely MLX-Whisper)
  • Create fallback mechanism to WhisperX for non-Apple platforms
  • Update Docker configurations for Apple Silicon
  • Implement proper error handling and logging

Phase 4: Testing & Optimization

  • Comprehensive testing on M1, M2, M3 hardware
  • Performance benchmarking vs current WhisperX implementation
  • Memory usage optimization
  • Edge case testing (long files, multiple speakers)

Phase 5: Documentation & Deployment

  • Update documentation for Apple Silicon users
  • Create migration guide
  • Update setup scripts for macOS
  • Release with clear performance expectations

Technical Requirements

Core Features to Maintain

  • Word-level timestamps
  • Speaker diarization compatibility
  • Multiple language support
  • Batch processing capability
  • Progress callbacks for UI updates

New Requirements

  • Automatic backend selection based on hardware
  • Configurable backend via environment variables
  • Performance metrics logging
  • Graceful fallback on errors

Code Changes Required

Backend Changes

backend/app/tasks/transcription/
├── base_transcriber.py          # New: Abstract base class
├── whisperx_transcriber.py      # Refactored from whisperx_service.py
├── mlx_transcriber.py           # New: MLX-Whisper implementation
├── whisper_cpp_transcriber.py   # New: Optional whisper.cpp implementation
└── transcriber_factory.py       # New: Factory for backend selection

Configuration Updates

  • Add TRANSCRIPTION_BACKEND environment variable
  • Add APPLE_SILICON_OPTIMIZATION flag
  • Update hardware detection to identify MLX availability
  • Add backend-specific configuration options

Docker Updates

  • Create Apple Silicon specific Dockerfile variant
  • Update docker-compose with platform-specific service definitions
  • Ensure proper MLX installation in containers

Acceptance Criteria

  • 2x or better performance improvement on Apple Silicon vs current implementation
  • No regression in transcription quality
  • Seamless fallback to WhisperX on non-Apple hardware
  • All existing features continue to work
  • Clear documentation for users
  • Automated backend selection based on hardware

Performance Targets

  • M1 Pro: < 5 minutes for 1-hour audio (currently ~10 minutes)
  • M2/M3: < 4 minutes for 1-hour audio
  • Memory usage: < 8GB for large model
  • GPU utilization: > 80% during transcription

Related Issues

References

Labels

enhancement, performance, macos, apple-silicon, transcription, mlx

Priority

High - Significant performance improvement for growing Apple Silicon user base
