
Optimize Whisper transcription performance for large audio files #46

@krisoye

Description


Observation

During testing with an 18MB audio file (~19 minutes of audio), we observed:

  • Transcription started at ~60 frames/sec
  • Degraded to ~19 frames/sec at 75% completion
  • Total processing time: ~2 hours for 19-minute audio
  • Model: Whisper large-v3 (1.5B parameters)

Performance Data

  • Model download: 2.88GB (one-time)
  • Audio file: 18MB m4a
  • Total frames: 286,596
  • Speed degradation: 60 fps → 19 fps
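As a sanity check on these numbers, a quick back-of-envelope (assuming the reported frame count is the decoder's own progress counter):

```python
# Purely arithmetic check on the reported figures; no Whisper involved.
frames = 286_596

minutes_at_60fps = frames / 60 / 60   # ≈ 79.6 min if speed never degraded
minutes_at_19fps = frames / 19 / 60   # ≈ 251.4 min at the degraded rate

print(minutes_at_60fps, minutes_at_19fps)
# The observed ~120 min sits between the two, consistent with a gradual
# slowdown rather than a uniformly slow run.
```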

Potential Optimizations

1. Model Selection

  • Current: large-v3 (best accuracy, slowest)
  • Consider: medium (769M params, good accuracy, faster)
  • Option: Make model configurable per-request
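A minimal sketch of per-request model selection. The names here (`WHISPER_MODEL`, `resolve_model`) are illustrative assumptions, not the service's actual config surface:

```python
import os
from typing import Optional

# Hypothetical per-request model selection: fall back to an
# env-configured default when the request does not specify a model.
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v3"}

def resolve_model(requested: Optional[str]) -> str:
    """Pick a Whisper model size for one transcription request."""
    model = requested or os.environ.get("WHISPER_MODEL", "large-v3")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model size: {model!r}")
    return model

# A request handler would then do something like:
#   model = whisper.load_model(resolve_model(request_model))
```

Validating against an allow-list matters here because model names flow into a download path; an unknown size should fail fast rather than trigger a 2.88GB fetch of the wrong checkpoint.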

2. Hardware Acceleration

  • Check if CUDA/GPU acceleration is available
  • Verify WHISPER_DEVICE=auto is selecting optimal device
  • Consider batch processing optimizations
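To check what `WHISPER_DEVICE=auto` would actually resolve to, something like this could be run in the service's environment (assumes the service uses PyTorch under the hood, as openai-whisper does):

```python
def detect_device() -> str:
    """Report the best available inference device, falling back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch installed: CPU-only environment
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"

print(detect_device())
```

If this prints `cpu` on a machine that has a GPU, the 60 → 19 fps numbers are being produced entirely on CPU and device selection is the first thing to fix.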

3. Memory Management

  • Investigate memory usage patterns
  • Check for memory leaks causing slowdown
  • Profile CPU/memory usage during transcription
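One low-cost way to test the leak hypothesis is to sample the Python heap between iterations of the transcription loop. A sketch using only the standard library (`step` is a stand-in for one slice of real work; this sees Python-side allocations only, not native/GPU buffers):

```python
import tracemalloc

def sample_heap(step, samples: int = 5):
    """Run `step` repeatedly, recording current traced heap bytes after each."""
    tracemalloc.start()
    readings = []
    for _ in range(samples):
        step()
        readings.append(tracemalloc.get_traced_memory()[0])  # current bytes
    tracemalloc.stop()
    return readings

# Demo: a step that retains data every call shows monotonic growth,
# which is the signature to look for in the real transcription loop.
retained = []
readings = sample_heap(lambda: retained.append(bytearray(1_000_000)))
print(readings)
```

Monotonically increasing readings across a long run would point at retained state (e.g. an ever-growing decode history) as the cause of the 60 → 19 fps slide.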

4. Chunking Strategy

  • Evaluate whether audio chunking could improve performance
  • Consider parallel processing of chunks
  • Balance between accuracy and speed
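The chunking idea can be sketched as fixed windows with a small overlap (to avoid cutting words at boundaries), transcribed concurrently. `transcribe_chunk` is hypothetical; threads only help if the heavy lifting releases the GIL (GPU/native code), otherwise a process pool would be needed:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_offsets(duration_s: float, window_s: float = 30.0,
                  overlap_s: float = 1.0):
    """Split [0, duration) into overlapping (start, end) windows."""
    offsets, start = [], 0.0
    while start < duration_s:
        offsets.append((start, min(start + window_s, duration_s)))
        start += window_s - overlap_s
    return offsets

def transcribe_chunk(span):
    start, end = span
    # Placeholder: a real worker would run the model on audio[start:end].
    return f"[{start:.0f}-{end:.0f}s]"

def transcribe_parallel(duration_s: float, workers: int = 4) -> str:
    spans = chunk_offsets(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = pool.map(transcribe_chunk, spans)  # preserves chunk order
    return " ".join(pieces)

print(transcribe_parallel(95.0))
```

The accuracy trade-off noted above shows up at the seams: chunking loses Whisper's cross-window context, which the overlap only partially recovers.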

5. Caching

  • Cache frequently used model weights
  • Implement audio preprocessing cache
  • Store intermediate results for retries
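A preprocessing cache could be keyed by a content hash of the upload, so a retry of the same file skips the expensive decode/resample step. Everything below is a hypothetical sketch (`CACHE_DIR` is a throwaway temp dir; a real service would use a persistent path with an eviction policy):

```python
import hashlib
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp()) / "whisper-cache"  # demo location only

def cache_key(audio_path: Path) -> str:
    """Content hash of the upload, so identical retries share one entry."""
    return hashlib.sha256(audio_path.read_bytes()).hexdigest()

def cached_preprocess(audio_path: Path, preprocess):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    entry = CACHE_DIR / cache_key(audio_path)
    if entry.exists():
        return entry.read_bytes()      # cache hit: skip decode/resample
    data = preprocess(audio_path)      # cache miss: do the expensive work
    entry.write_bytes(data)
    return data

# Demo: the second call hits the cache and never re-runs preprocessing.
calls = []
def fake_preprocess(path):
    calls.append(path)
    return b"resampled-pcm"

src = Path(tempfile.mkdtemp()) / "clip.m4a"
src.write_bytes(b"fake m4a bytes")
first = cached_preprocess(src, fake_preprocess)
second = cached_preprocess(src, fake_preprocess)
```

Note that model weights are already effectively cached after the one-time 2.88GB download; the win here is on the audio side and on retries after a failed run.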

Investigation Needed

  1. Profile transcription to identify the bottleneck
  2. Monitor resource usage during processing
  3. Test with different model sizes
  4. Check GPU availability and utilization
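For step 1, a single transcription call can be wrapped in `cProfile` to tell whether time goes to the model forward pass, audio decoding, or Python-side bookkeeping. `sorted` stands in below for the real transcribe call:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, top: int = 10) -> str:
    """Run fn(*args) under cProfile and return the top cumulative-time entries."""
    prof = cProfile.Profile()
    prof.runcall(fn, *args)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()

# Stand-in workload; in the service this would be the transcribe() call.
report = profile_call(sorted, list(range(100_000))[::-1])
print(report)
```

If the top entries are all inside the model's forward pass, the fix is model size or hardware; if they are in decoding or feature extraction, chunking and caching become more attractive.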

Success Criteria

  • Identify root cause of performance degradation
  • Document trade-offs between model sizes
  • Provide configuration options for performance tuning
  • Target: <30 minutes for 19-minute audio with the medium model
