NVIDIA GPUs are not supported on modern macOS systems (macOS 10.14 Mojave and later). Apple deprecated NVIDIA GPU support in 2018.
However, this guide covers:
- Apple Silicon (M1/M2/M3) GPU acceleration - Native support
- Intel Mac with AMD GPU - Effectively unsupported for ML (ROCm is not available on macOS); use CPU-only or a remote GPU
- Backend-only deployment - Run services on Linux/Windows with GPU
Apple Silicon Macs have integrated GPUs that can be used for machine learning workloads through Metal Performance Shaders (MPS).
Performance:
- ⚡ 5-10x faster than CPU-only
- 🔋 Power efficient (M-series chips excel at ML tasks)
- 📦 Native support in PyTorch and TensorFlow
Limitations:
- Not as fast as dedicated NVIDIA GPUs
- Docker GPU support is experimental/limited
- Best performance with native Python (not Docker)
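To make this concrete, here is a minimal, generic PyTorch sketch (not project code) of how work is placed on the Apple GPU with a CPU fallback; the model and tensor here are illustrative only:

```python
import torch

# Pick the MPS device when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Any module or tensor moved to this device runs on the Apple GPU.
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print("Forward pass ran on:", model(x).device)
```

The worker configuration later in this guide selects the same device string ("mps") for the STT and translation models.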
AMD GPUs on Intel Macs have very limited support for ML workloads:
- ROCm (AMD's CUDA alternative) is not officially supported on macOS
- Docker does not support AMD GPU passthrough on macOS
- Best option: CPU-only or remote GPU server
Run the frontend on Mac, but connect to a backend on Linux/Windows with NVIDIA GPU.
- macOS 12.3 or later (for MPS support)
- Apple Silicon (M1, M1 Pro, M1 Max, M1 Ultra, M2, M2 Pro, M2 Max, M2 Ultra, M3, M3 Pro, M3 Max)
- Xcode Command Line Tools
- Homebrew (package manager)
# Check if you have Apple Silicon
uname -m
# Should output: arm64
# Check macOS version
sw_vers
# ProductVersion should be 12.3 or higher

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python via Homebrew
brew install python@3.10
# Verify installation
python3 --version

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install PyTorch with MPS support (Apple Silicon)
pip install torch torchvision torchaudio
# Verify MPS is available
python3 -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"Expected output:
MPS available: True
cd ~/Transcription-Translation1/backend
# Install dependencies
pip install -r requirements.txt
pip install -r gateway/requirements.txt
pip install -r stt_worker/requirements.txt
pip install -r translation_worker/requirements.txt

# Install Redis via Homebrew
brew install redis
# Start Redis
brew services start redis
# Verify Redis is running
redis-cli ping
# Should output: PONG

Create backend/infra/.env from the example:
cd backend/infra
cp ../.env_example .env

Edit .env for Apple Silicon:
# === Apple Silicon GPU Configuration ===
# Note: MPS (Metal Performance Shaders) is used instead of CUDA
# STT Worker - Use MPS device
DEVICE=mps
MODEL_SIZE=medium # Start with medium, adjust based on performance
# Translation Worker - Use MPS
FORCE_CPU=false
NLLB_MODEL=facebook/nllb-200-distilled-600M
# Redis
REDIS_URL=redis://localhost:6379
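For reference, here is a minimal sketch of how these values might be read in Python. This is illustrative only; the project's actual config modules may parse them differently, and the boolean handling of FORCE_CPU is an assumption:

```python
import os

# Hypothetical sketch; the real config modules may differ.
DEVICE = os.getenv("DEVICE", "cpu")              # "mps", "cuda", or "cpu"
MODEL_SIZE = os.getenv("MODEL_SIZE", "medium")
NLLB_MODEL = os.getenv("NLLB_MODEL", "facebook/nllb-200-distilled-600M")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")

# .env values are plain strings, so "false" must be converted to a bool explicitly.
FORCE_CPU = os.getenv("FORCE_CPU", "false").lower() in ("1", "true", "yes")
```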
The worker code needs to be updated to support MPS. Add the following device-detection logic to each worker:

For the STT Worker (backend/stt_worker/config.py):
# Detect device (add MPS support)
# DEVICE comes from the environment (.env), e.g. "mps", "cuda", or "cpu"
if torch.backends.mps.is_available() and DEVICE == "mps":
    DEVICE = "mps"
    print("Using Apple Silicon GPU (MPS)")
elif torch.cuda.is_available() and DEVICE == "cuda":
    DEVICE = "cuda"
    print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}")
else:
    DEVICE = "cpu"
    print("Using CPU")

For the Translation Worker (backend/translation_worker/config.py):
# Device configuration with MPS support
if torch.backends.mps.is_available() and not FORCE_CPU:
    DEVICE = "mps"
    print("Using Apple Silicon GPU (MPS)")
elif torch.cuda.is_available() and not FORCE_CPU:
    DEVICE = "cuda"
    print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}")
else:
    DEVICE = "cpu"
    print("Using CPU (FORCE_CPU=True or no GPU available)")

Note: Docker Desktop for Mac does not support GPU passthrough well. For best performance on Apple Silicon, run services natively without Docker.
cd backend
# Terminal 1: Start Gateway
source venv/bin/activate
python -m gateway.gateway
# Terminal 2: Start STT Worker
source venv/bin/activate
python -m stt_worker.worker
# Terminal 3: Start Translation Worker
source venv/bin/activate
python -m translation_worker.worker

Or use the provided script:
cd backend
./run-services.sh dev

If you must use Docker on Mac:
- Download Docker Desktop for Mac from docker.com/products/docker-desktop
- Install and start Docker Desktop
- Note: GPU acceleration is not available in Docker on Mac
Edit backend/infra/.env:
# === Docker CPU-Only Configuration ===
DEVICE=cpu
FORCE_CPU=true
MODEL_SIZE=small  # Use smaller models for CPU

cd backend/infra
docker compose up --build

Performance Note: CPU-only Docker on Mac will be significantly slower than native Python with MPS.
The best option for Mac users is to run the backend on a remote Linux/Windows server with NVIDIA GPU and connect to it from your Mac.
┌─────────────────────────────────┐
│ Mac (Frontend Only)             │
│ - Electron App                  │
│ - WebSocket Client              │
└─────────────────────────────────┘
                 │
                 │ WebSocket over Internet
                 │
                 ▼
┌─────────────────────────────────┐
│ Linux/Windows Server (Backend)  │
│ - Gateway Service               │
│ - STT Worker (NVIDIA GPU)       │
│ - Translation Worker (GPU)      │
│ - Redis                         │
└─────────────────────────────────┘
Follow the appropriate GPU setup guide:
- Windows: GPU_SETUP_WINDOWS.md
- Linux: GPU_SETUP_LINUX.md
On the remote server, edit backend/infra/.env:
# Allow connections from any IP
GATEWAY_PORT=5026
# Optional: Add authentication/SSL

Frontend Configuration:
Edit frontend configuration to connect to remote server:
// frontend/lib/config.ts or environment variable
const GATEWAY_URL = "ws://your-server-ip:5026"
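Before wiring up the frontend, you can sanity-check that the remote gateway is reachable from the Mac. The script below is an illustrative helper using the third-party websockets package (it is not part of the repository, and the gateway may expect a particular path or handshake), so treat it purely as a reachability test:

```python
# check_gateway.py - hypothetical helper, not part of the repository
import asyncio
import websockets  # pip install websockets

GATEWAY_URL = "ws://your-server-ip:5026"  # same URL as the frontend config

async def main():
    # Open and immediately close a WebSocket connection to the gateway.
    async with websockets.connect(GATEWAY_URL) as ws:
        print(f"Connected to {GATEWAY_URL}")

asyncio.run(main())
```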
Security Note: Use a VPN or SSH tunnel for secure connections:

# SSH tunnel to remote server
ssh -L 5026:localhost:5026 user@remote-server
# Now connect to ws://localhost:5026 from your Mac

| Model | Device | Speed (RTF*) | Latency | Power |
|---|---|---|---|---|
| Whisper large-v3 | M1 Pro (CPU) | 2.0x | ~4000ms | High |
| Whisper large-v3 | M1 Pro (MPS) | 0.4x | ~800ms | Low |
| Whisper large-v3 | M2 Max (MPS) | 0.3x | ~600ms | Low |
| Whisper large-v3 | RTX 4090 (CUDA) | 0.08x | ~160ms | Very High |
| NLLB-600M | M1 Pro (MPS) | - | ~80ms | Low |
| NLLB-600M | RTX 4090 (CUDA) | - | ~40ms | Very High |
*RTF = Real-Time Factor (processing time ÷ audio duration; lower is better; 1.0 = real-time). For example, an RTF of 0.4 means 10 seconds of audio takes about 4 seconds to process.
Summary:
- ⚡ Apple Silicon MPS: 5-10x faster than CPU, though still several times slower than a high-end NVIDIA GPU (see table above)
- 🔋 Apple Silicon: Much more power efficient
- 🚀 NVIDIA GPU: Fastest, but requires separate machine
Check:
python3 -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"If False:
- Ensure you have macOS 12.3 or later
- Ensure you're on Apple Silicon (M1/M2/M3)
- Reinstall PyTorch:
pip install --upgrade torch
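If it is unclear whether the problem is the PyTorch build or the machine itself, this short check (illustrative) separates the two: is_built() returning False means the installed PyTorch wheel has no MPS support at all, while is_available() returning False on an MPS-enabled build points at the macOS version or hardware:

```python
import platform
import torch

# Distinguish "PyTorch built without MPS" from "this machine cannot use MPS".
print("macOS version:", platform.mac_ver()[0])      # should be 12.3 or later
print("Architecture:", platform.machine())          # should be arm64
print("MPS built into PyTorch:", torch.backends.mps.is_built())
print("MPS available at runtime:", torch.backends.mps.is_available())
```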
Error: RuntimeError: MPS backend out of memory
Solution:
- Use smaller models: MODEL_SIZE=small or MODEL_SIZE=base
- Close other applications to free memory
- Restart your Mac
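If memory pressure persists after these steps, recent PyTorch releases also expose a couple of MPS memory helpers. This is a hedged sketch; torch.mps.empty_cache() and torch.mps.current_allocated_memory() exist only in newer PyTorch versions (2.x), so guard the calls accordingly:

```python
import torch

if torch.backends.mps.is_available():
    # How much memory PyTorch currently holds on the Apple GPU (newer PyTorch only).
    print(f"MPS allocated: {torch.mps.current_allocated_memory() / 1e6:.1f} MB")
    # Release cached blocks back to the system between large jobs.
    torch.mps.empty_cache()
```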
This is expected. Docker Desktop for Mac does not support GPU passthrough.
Solution: Run services natively (see "Running Services Locally" above)
Check if MPS is actually being used:
# In your service logs, look for:
# "Using Apple Silicon GPU (MPS)"
# NOT: "Using CPU"Common causes:
- Code not updated to use MPS (see configuration section)
- Model fallback to CPU due to compatibility issues
- Model too large for available memory
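To rule out a silent fallback to CPU, a quick matrix-multiplication timing on both devices (illustrative, not project code) should show MPS clearly ahead; if the two numbers are similar, the GPU is not actually being used:

```python
import time
import torch

def time_matmul(device: str, size: int = 2048, repeats: int = 10) -> float:
    """Time repeated matrix multiplications on the given device."""
    x = torch.randn(size, size, device=device)
    y = torch.randn(size, size, device=device)
    torch.matmul(x, y)              # warm-up (the first MPS call compiles kernels)
    if device == "mps":
        torch.mps.synchronize()     # needs a recent PyTorch with the torch.mps module
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(x, y)
    if device == "mps":
        torch.mps.synchronize()     # wait for the GPU to finish before stopping the clock
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f} s")
if torch.backends.mps.is_available():
    print(f"MPS: {time_matmul('mps'):.3f} s")
```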
Suggested model combinations, from lightest to heaviest:

MODEL_SIZE=small
NLLB_MODEL=facebook/nllb-200-distilled-600M

MODEL_SIZE=medium
NLLB_MODEL=facebook/nllb-200-distilled-600M

MODEL_SIZE=large-v3
NLLB_MODEL=facebook/nllb-200-1.3B

M3 chips: same as M1/M2, but expect slightly better performance.
# Check Apple Silicon
uname -m # Should output: arm64
# Check MPS availability
python3 -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}')"
# Start Redis
brew services start redis
# Check Redis
redis-cli ping
# Start services natively
cd backend
source venv/bin/activate
./run-services.sh dev
# Or start individually
python -m gateway.gateway &
python -m stt_worker.worker &
python -m translation_worker.worker &
# Monitor resource usage
top -pid $(pgrep Python)

- PyTorch MPS Documentation
- Apple Metal Performance Shaders
- Faster-Whisper on Apple Silicon
- Docker Desktop for Mac
For Mac Users:

- Best Option: Apple Silicon with native Python + MPS ✅
  - Good performance (5-10x faster than CPU)
  - Power efficient
  - Runs locally

- Second Best: Remote GPU Server (Linux/Windows) ✅
  - Best performance (NVIDIA GPU)
  - Mac runs frontend only
  - Requires network connection

- Fallback: CPU-Only Mode ⚠️
  - Works but slow
  - Use smaller models
  - Acceptable for development/testing
Note: The frontend currently only works on Windows due to native keyboard integration. If you're developing on Mac, you're likely working on the backend services only.
Need Help? Open an issue on GitHub with:
- Your Mac model and chip (M1/M2/M3)
- macOS version (sw_vers)
- Output of MPS check command
- Service logs showing errors