# OpenTranscribe Offline Deployment

Complete guide for deploying OpenTranscribe on air-gapped systems with no internet access.
- Overview
- System Requirements
- Building the Offline Package
- Installing on Air-Gapped System
- Configuration
- Usage
- Troubleshooting
- Maintenance
- Uninstallation
## Overview

The OpenTranscribe offline package provides a complete, self-contained deployment solution for air-gapped environments. The package includes:
- All Docker container images
- Pre-downloaded AI models (~40GB)
  - WhisperX transcription models (large-v3-turbo default + large-v3)
  - PyAnnote speaker diarization models
  - OpenSearch neural search models (semantic search embeddings)
- Word-level timestamps natively supported for all 100+ languages
- Configuration files and templates
- Installation and management scripts
- Complete documentation
Package Size: 15-20GB compressed (tar.xz), ~65GB uncompressed (compression optional)
## System Requirements

### Build System

Required to create the offline package:
- Ubuntu 20.04+ or similar Linux distribution
- Docker 20.10+
- Docker Compose v2+
- 100GB free disk space
- Fast internet connection
- HuggingFace account and token
### Target System (Air-Gapped)

System where OpenTranscribe will be installed:
- Ubuntu 20.04+ (recommended) or compatible Linux distribution
- Docker 20.10 or later
- Docker Compose v2+
- NVIDIA GPU with CUDA support (recommended)
  - Minimum: 8GB VRAM
  - Recommended: 16GB+ VRAM
- NVIDIA GPU drivers (470.x or later)
- NVIDIA Container Toolkit
- 80GB free disk space
- 16GB RAM minimum (32GB recommended)
- CPU: 4+ cores recommended
Note: OpenTranscribe can run without a GPU in CPU mode, but transcription will be significantly slower.
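For reference, a minimal sketch of what CPU-only settings look like in `/opt/opentranscribe/.env` (configured later during installation; these variable names are documented in the Configuration section, but the exact values shown are illustrative):

```bash
USE_GPU=false
TORCH_DEVICE=cpu
COMPUTE_TYPE=int8      # int8 is the usual precision choice on CPU
WHISPER_MODEL=small    # smaller models keep CPU transcription tolerable
```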
## Building the Offline Package

1. Set HuggingFace Token:

   ```bash
   export HUGGINGFACE_TOKEN=your_token_here
   ```

   Get your token at: https://huggingface.co/settings/tokens

2. Clone Repository:

   ```bash
   git clone https://github.com/davidamacey/opentranscribe.git
   cd opentranscribe
   ```
3. Run the Build Script:

   ```bash
   ./scripts/build-offline-package.sh
   ```

   The script will:
   - Validate system requirements
   - Pull all required Docker images from DockerHub
   - Download AI models using your HuggingFace token
   - Package configuration files and scripts
   - Prompt for compression (optional - see below)
   - Create integrity checksums
4. Compression Options:

   At the end of the build, you'll be prompted:

   ```
   Do you want to compress the package now? (y/n):
   ```

   Option 1: Compress Now (recommended for transfer)
   - Takes 30-60 minutes using all CPU threads
   - Creates a `.tar.xz` file (15-20GB)
   - Best for network transfer or USB

   Option 2: Skip Compression
   - Saves time if testing locally
   - Leaves an uncompressed directory (~65GB)
   - Can compress manually later if needed
   - Useful for fast local network transfers
5. Build Duration:
   - Image pulling: 10-20 minutes
   - Model downloading: 30-60 minutes
   - Compression: 30-60 minutes (if selected)
   - Total: 1-2 hours (with compression)
   - Total: 30-90 minutes (without compression)
6. Output:

   If compressed:

   ```
   offline-package-build/
   ├── opentranscribe-offline-v{version}.tar.xz (~15-20GB)
   └── opentranscribe-offline-v{version}.tar.xz.sha256
   ```

   If uncompressed:

   ```
   offline-package-build/
   ├── opentranscribe-offline-v{version}/ (~65GB directory)
   └── opentranscribe-offline-v{version}.sha256
   ```

7. Verify Package:

   If compressed:

   ```bash
   cd offline-package-build
   sha256sum -c opentranscribe-offline-v*.tar.xz.sha256
   ```

   If uncompressed, checksums are stored in the `.sha256` file for individual file verification.
8. Manual Compression (Optional):

   If you skipped compression, you can compress later (substitute the actual version for `{version}`; a redirect target cannot be a glob):

   ```bash
   cd offline-package-build
   tar -cf - opentranscribe-offline-v* | xz -9 -T0 > opentranscribe-offline-v{version}.tar.xz
   sha256sum opentranscribe-offline-v{version}.tar.xz > opentranscribe-offline-v{version}.tar.xz.sha256
   ```
## Installing on Air-Gapped System

### Transfer the Package

Transfer the package to your air-gapped system using your preferred method:

If compressed:
- Transfer both the `.tar.xz` and `.tar.xz.sha256` files
- USB drive, network transfer, or physical media

If uncompressed:
- Transfer the entire directory, or compress first (see manual compression above)
- For directory transfer: use rsync, a network share, or an external drive
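Whatever the method, re-verify integrity on the receiving side. A self-contained sketch of the check, using a stand-in file so the example runs end-to-end (substitute the real package name in practice):

```bash
# Create a stand-in "package" and record its digest (the build script
# produces the real .sha256 file for you)
printf 'demo-package' > pkg-demo.tar.xz
sha256sum pkg-demo.tar.xz > pkg-demo.tar.xz.sha256

# On the air-gapped side: -c re-hashes the file and compares it against
# the recorded digest; any corruption in transit fails this step
sha256sum -c pkg-demo.tar.xz.sha256

# Clean up the demo files
rm pkg-demo.tar.xz pkg-demo.tar.xz.sha256
```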
### Installation Steps

1. Install Docker (if not already installed):

   ```bash
   # Follow Docker's official installation guide for your distribution
   # https://docs.docker.com/engine/install/ubuntu/
   ```
2. Install NVIDIA Drivers and Container Toolkit (for GPU support):

   ```bash
   # Install NVIDIA drivers (if not already installed)
   ubuntu-drivers devices
   sudo ubuntu-drivers autoinstall

   # Install NVIDIA Container Toolkit:
   # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
   ```
3. Verify GPU Setup:

   ```bash
   nvidia-smi
   docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
   ```
4. Extract Package (if compressed):

   If you have a compressed `.tar.xz` file:

   ```bash
   tar -xf opentranscribe-offline-v*.tar.xz
   cd opentranscribe-offline-v*/
   ```

   If you transferred the uncompressed directory:

   ```bash
   cd opentranscribe-offline-v*/
   ```
5. Run Installer:

   ```bash
   sudo ./install.sh
   ```

   The installer will:
   - Validate system requirements
   - Verify package integrity
   - Load Docker images (15-30 minutes)
   - Install files to `/opt/opentranscribe/`
   - Copy AI models (10-20 minutes)
   - Create configuration file
   - Set proper permissions
6. Installation Duration:
   - System validation: 1-2 minutes
   - Docker image loading: 15-30 minutes
   - Model installation: 10-20 minutes
   - Total: 30-60 minutes
7. Post-Installation: The installer will display next steps when complete.
## Configuration

1. Edit Environment File:

   ```bash
   sudo nano /opt/opentranscribe/.env
   ```
2. Set HuggingFace Token (REQUIRED):

   ```bash
   HUGGINGFACE_TOKEN=your_token_here
   ```

   Important: Speaker diarization requires a HuggingFace token. Get one at https://huggingface.co/settings/tokens
The .env file contains auto-detected settings. You may customize:
Security Settings:
- `POSTGRES_PASSWORD` - Database password (auto-generated)
- `MINIO_ROOT_PASSWORD` - Object storage password (auto-generated)
- `JWT_SECRET_KEY` - JWT signing key (auto-generated)
AI Model Settings:
- `WHISPER_MODEL` - Transcription model size (default: large-v3-turbo)
  - Options: tiny, base, small, medium, large-v1, large-v2, large-v3, large-v3-turbo
  - Note: `large-v3-turbo` is 6x faster than `large-v3` with excellent accuracy for English and most languages. Use `large-v3` for translation tasks or maximum accuracy on low-resource languages.
- `BATCH_SIZE` - Processing batch size (default: 16)
- `MIN_SPEAKERS` / `MAX_SPEAKERS` - Speaker detection range (default: 1-20; can be increased to 50+ for large events)
Hardware Settings (auto-detected):
- `USE_GPU` - Enable GPU acceleration
- `TORCH_DEVICE` - Device type (cuda/cpu)
- `COMPUTE_TYPE` - Precision (float16/int8)
- `GPU_DEVICE_ID` - GPU to use (default: 0)
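As an illustration, a single-GPU host might end up with values like these after auto-detection (illustrative values only, not authoritative defaults):

```bash
USE_GPU=true
TORCH_DEVICE=cuda
COMPUTE_TYPE=float16   # fp16 on NVIDIA GPUs; int8 on CPU
GPU_DEVICE_ID=0        # first GPU
```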
LLM Integration (optional): For AI summarization and speaker identification features:
- `LLM_PROVIDER` - Provider (openai, anthropic, openrouter)
- Provider-specific API keys and settings
Note: LLM features require internet access. Leave LLM_PROVIDER empty for offline transcription-only mode.
Neural Search Settings:
- OpenSearch neural search is included in the offline package with pre-downloaded embedding models
- Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic search (~80MB model)
- Supports both full-text and neural/semantic search queries
- No additional configuration needed for offline neural search support
- Requires 2GB additional memory per OpenSearch container for embeddings
Default ports (configurable in .env):
- Frontend: `80`
- Backend API: `8080`
- Flower (task monitor): `5555`
- Database: `5432`
- Redis: `6379`
- MinIO: `9000`
- MinIO Console: `9001`
- OpenSearch: `9200`
## Usage

Start the application:

```bash
cd /opt/opentranscribe
sudo ./opentr.sh start
```

Access the application: http://localhost:80
### Management Commands

All commands run from `/opt/opentranscribe/`:

Basic Operations:

```bash
sudo ./opentr.sh start           # Start all services
sudo ./opentr.sh stop            # Stop all services
sudo ./opentr.sh restart         # Restart all services
sudo ./opentr.sh status          # Show service status
sudo ./opentr.sh logs            # View all logs (Ctrl+C to exit)
sudo ./opentr.sh logs backend    # View specific service logs
```

Service Management:

```bash
sudo ./opentr.sh restart-backend   # Restart backend services only
sudo ./opentr.sh restart-frontend  # Restart frontend only
sudo ./opentr.sh shell backend     # Open shell in backend container
```

Maintenance:

```bash
sudo ./opentr.sh health   # Check health of all services
sudo ./opentr.sh backup   # Create database backup
sudo ./opentr.sh clean    # Clean up Docker resources
```

### First-Time Setup

1. Start OpenTranscribe:

   ```bash
   cd /opt/opentranscribe
   sudo ./opentr.sh start
   ```

2. Wait for services to start (~30 seconds):

   ```bash
   sudo ./opentr.sh health
   ```

3. Access the web interface: http://localhost:80

4. Create your first user account through the web interface.

5. Upload an audio or video file to test transcription.
### Monitoring

Service Status:

```bash
sudo ./opentr.sh status
```

Task Monitoring: Access the Flower dashboard at: http://localhost:5555/flower

Logs:

```bash
# All services
sudo ./opentr.sh logs

# Specific service
sudo ./opentr.sh logs celery-worker

# Follow logs in real-time
sudo ./opentr.sh logs -f backend
```

## Troubleshooting

### Services Won't Start

Check Docker status:

```bash
sudo systemctl status docker
sudo systemctl start docker
```

Check service logs:

```bash
cd /opt/opentranscribe
sudo ./opentr.sh logs
```

Check service health:

```bash
sudo ./opentr.sh health
```

### GPU Not Detected

Verify NVIDIA drivers:

```bash
nvidia-smi
```

Verify Container Toolkit:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

Check configuration:

```bash
grep USE_GPU /opt/opentranscribe/.env
```

Manual GPU enable: edit /opt/opentranscribe/.env:

```bash
USE_GPU=true
TORCH_DEVICE=cuda
COMPUTE_TYPE=float16
```

Then restart:

```bash
sudo ./opentr.sh restart
```

### Transcription Issues

Check your HuggingFace token:

```bash
grep HUGGINGFACE_TOKEN /opt/opentranscribe/.env
```

Check worker logs:

```bash
sudo ./opentr.sh logs celery-worker
```

Check the Flower dashboard: http://localhost:5555/flower
### Out of Memory

For systems with limited VRAM, edit /opt/opentranscribe/.env:

```bash
WHISPER_MODEL=medium   # or small, base
BATCH_SIZE=8           # reduce from 16
```

Restart services:

```bash
sudo ./opentr.sh restart-backend
```

### Database Issues

Check database status:

```bash
sudo ./opentr.sh logs postgres
```

Access database shell:

```bash
sudo ./opentr.sh shell postgres
psql -U postgres -d opentranscribe
```

### Port Conflicts

If default ports are in use, edit /opt/opentranscribe/.env:

```bash
FRONTEND_PORT=8080   # Change from 80
BACKEND_PORT=8081    # Change from 8080
# etc.
```

Restart:

```bash
sudo ./opentr.sh restart
```

### Performance

CPU Mode: Transcription in CPU mode is 10-50x slower than GPU mode.
GPU Optimization:
- Use `COMPUTE_TYPE=float16` for NVIDIA GPUs
- Increase `BATCH_SIZE` if you have >16GB VRAM
- Use `large-v3-turbo` (default) for balanced speed/accuracy (requires 6GB+ VRAM)
- Use `large-v3` for maximum accuracy or translation tasks (requires 10GB+ VRAM)

System Resources:
- Monitor with: `docker stats`
- Increase RAM allocation if needed
- Close other GPU-intensive applications
## Maintenance

### Backups

Create backup:

```bash
sudo ./opentr.sh backup
```

Backups are stored in: /opt/opentranscribe/backups/
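Backups accumulate over time; a simple rotation that keeps only the newest five is sketched below (the `.sql` filename pattern is an assumption — adjust it to match the files the backup command actually produces):

```bash
# Hypothetical rotation sketch: keep the newest $KEEP backups
BACKUP_DIR=${BACKUP_DIR:-/opt/opentranscribe/backups}
KEEP=5
# List backups newest-first, skip the newest $KEEP, delete the rest.
# xargs -r makes this a no-op when there is nothing to delete.
ls -1t "$BACKUP_DIR"/*.sql 2>/dev/null | tail -n +$((KEEP + 1)) | xargs -r rm -f
```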
Restore backup:

```bash
cd /opt/opentranscribe
sudo ./opentr.sh stop
docker compose -f docker-compose.yml -f docker-compose.offline.yml run --rm postgres psql -U postgres -d opentranscribe < backups/backup_file.sql
sudo ./opentr.sh start
```

### Updates

For offline systems, updates require a new offline package:
- Build new package on internet-connected system
- Transfer to air-gapped system
- Stop OpenTranscribe: `sudo ./opentr.sh stop`
- Backup data: `sudo ./opentr.sh backup`
- Extract new package and run installer
- Restore data if needed
### Disk Management

View disk usage:

```bash
docker system df
```

Clean old logs:

```bash
sudo ./opentr.sh clean
```

Rotate logs: Docker automatically rotates logs, but you can manually clean:

```bash
docker system prune -a
```

### Model Updates

To update AI models, you need internet access or a new model package:

- Stop services: `sudo ./opentr.sh stop`
- Replace model files in `/opt/opentranscribe/models/`
- Start services: `sudo ./opentr.sh start`
## Uninstallation

### Uninstall Script (Recommended)

Run the uninstall script:

```bash
cd /opt/opentranscribe
sudo ./uninstall.sh
```

The uninstall script will:
- Offer to create a database backup before removal
- Stop all OpenTranscribe services
- Remove Docker volumes (with confirmation)
- Optionally remove Docker images
- Remove the installation directory `/opt/opentranscribe/`
- Optionally clean up unused Docker resources
This is the safest and most complete way to uninstall OpenTranscribe.
### Manual Uninstall

If you prefer to uninstall manually or the script is unavailable:

Stop and remove services:

```bash
cd /opt/opentranscribe
sudo ./opentr.sh stop
sudo docker compose -f docker-compose.yml -f docker-compose.offline.yml down -v
```

Remove installation:

```bash
sudo rm -rf /opt/opentranscribe
```

Remove Docker images (optional):

```bash
docker rmi davidamacey/opentranscribe-backend:latest
docker rmi davidamacey/opentranscribe-frontend:latest
docker rmi postgres:17.5-alpine redis:8.2.2-alpine3.22
docker rmi minio/minio:RELEASE.2025-09-07T16-13-09Z
docker rmi opensearchproject/opensearch:3.4.0
```

Clean Docker system:

```bash
docker system prune -a
docker volume prune
```

File Locations:
- Installation: `/opt/opentranscribe/`
- Configuration: `/opt/opentranscribe/.env`
- Database data: Docker volume `opentranscribe_postgres_data`
- Object storage: Docker volume `opentranscribe_minio_data`
- AI models: `/opt/opentranscribe/models/`
- Temp files: `/opt/opentranscribe/temp/`
- Backups: `/opt/opentranscribe/backups/`
Architecture overview:

```
Frontend (NGINX + Svelte)
          ↓
   Backend (FastAPI)
          ↓
   ├── PostgreSQL (Database)
   ├── MinIO (Object Storage)
   ├── Redis (Message Queue)
   ├── OpenSearch (Search Engine)
   └── Celery Worker (AI Processing)
          ↓
   AI Models (WhisperX, PyAnnote)
```
For issues and questions:
- GitHub Issues: https://github.com/davidamacey/opentranscribe/issues
- Documentation: https://github.com/davidamacey/opentranscribe
OpenTranscribe is open source software. See LICENSE file for details.
Last Updated: October 2024
Version: 2.0 Offline