This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
When developing and testing, you MUST use locally built Docker images, NOT Docker Hub images. The Docker Hub images contain older compiled code and will NOT include your local changes. This applies to BOTH frontend AND backend containers.
Before testing ANY code changes:
# Rebuild frontend from local code
docker build -t davidamacey/opentranscribe-frontend:latest -f frontend/Dockerfile.prod frontend/
# Rebuild backend from local code
docker build -t davidamacey/opentranscribe-backend:latest -f backend/Dockerfile.prod backend/
# Restart changed services
docker stop opentranscribe-frontend && docker rm opentranscribe-frontend
docker stop opentranscribe-backend && docker rm opentranscribe-backend
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.nginx.yml -f docker-compose.local.yml -f docker-compose.pki.yml up -d --no-deps --force-recreate frontend backend
Why this matters:
- `docker-compose.prod.yml` uses the Docker Hub image tag (`davidamacey/opentranscribe-*:latest`)
- `docker-compose.local.yml` sets `pull_policy: never` but does NOT mount local source code
- The `docker-compose.override.yml` (dev mode with Vite/hot-reload) is NOT loaded when using explicit `-f` flags
- If you see 404 errors on API endpoints or missing UI features, the container is running old Docker Hub code
OpenTranscribe is a containerized AI-powered transcription application with these core services:
- Frontend: Svelte/TypeScript SPA with Progressive Web App capabilities
- Backend: FastAPI with async support and OpenAPI documentation
- Database: PostgreSQL with SQLAlchemy ORM and Alembic migrations
- Storage: MinIO S3-compatible object storage
- Search: OpenSearch 3.4.0 (Apache Lucene 10) for full-text and vector search
- Queue: Celery with Redis for background AI processing
- Monitoring: Flower for task monitoring
- AI Models: WhisperX for transcription (100+ languages), PyAnnote for speaker diarization
- Frontend: Svelte, TypeScript, Vite, Plyr for media playback, i18n (7 UI languages)
- Backend: FastAPI, SQLAlchemy 2.0, Alembic, Celery
- Infrastructure: Docker Compose, NGINX for production
Use ./opentr.sh for all development operations:
# Start development environment
./opentr.sh start dev
# Stop all services
./opentr.sh stop
# View logs (all or specific service)
./opentr.sh logs [backend|frontend|postgres|celery-worker]
# Reset environment (WARNING: deletes all data)
./opentr.sh reset dev
# Check service status
./opentr.sh status
# Access container shell
./opentr.sh shell [backend|frontend|postgres]
# Database backup/restore
./opentr.sh backup
./opentr.sh restore backups/backup_file.sql
# Multi-GPU scaling (optional - for high-throughput systems)
./opentr.sh start dev --gpu-scale
./opentr.sh reset dev --gpu-scale
For systems with multiple GPUs, you can enable parallel GPU workers to significantly increase transcription throughput.
Use Case: You have multiple GPUs and want to maximize processing speed by running multiple transcription workers in parallel.
Example Hardware Setup:
- GPU 0: NVIDIA RTX A6000 (49GB) - Running LLM model
- GPU 1: RTX 3080 Ti (12GB) - Default single worker (disabled when scaling)
- GPU 2: NVIDIA RTX A6000 (49GB) - Scaled workers (4 parallel)
Configuration (in .env):
GPU_SCALE_ENABLED=true # Enable multi-GPU scaling
GPU_SCALE_DEVICE_ID=2 # Which GPU to use (default: 2)
GPU_SCALE_WORKERS=4 # Number of parallel workers (default: 4)
Usage:
# Start with GPU scaling enabled
./opentr.sh start dev --gpu-scale
# Reset with GPU scaling enabled
./opentr.sh reset dev --gpu-scale
# View scaled worker logs
docker compose logs -f celery-worker-gpu-scaled
How It Works:
- When the `--gpu-scale` flag is used, the system loads the `docker-compose.gpu-scale.yml` overlay
- Default single GPU worker is disabled (`scale: 0`)
- A new single container is created with `concurrency=4` (configurable via `GPU_SCALE_WORKERS`)
- The container runs 4 parallel Celery workers within a single process
- All workers target the specified GPU device and process from the `gpu` queue
- Celery automatically distributes tasks across the worker pool
Performance: With 4 parallel workers on a high-end GPU like the A6000, you can process 4 videos simultaneously, significantly reducing total processing time for batches of media files.
Scaling: Simply change GPU_SCALE_WORKERS in your .env file to adjust the number of concurrent workers (e.g., 2, 4, 6, 8) based on your GPU's memory and processing capacity.
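As a rough illustration of how the three settings above fit together, the Python sketch below reads them with the documented defaults. `GpuScaleSettings` and `load_gpu_scale_settings` are hypothetical names, not code from this repository, and treating an unset `GPU_SCALE_ENABLED` as disabled is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuScaleSettings:
    enabled: bool
    device_id: int
    workers: int


def load_gpu_scale_settings(env=None):
    """Read the GPU_SCALE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: scaling is off unless the variable is explicitly "true".
    enabled = env.get("GPU_SCALE_ENABLED", "false").strip().lower() == "true"
    device_id = int(env.get("GPU_SCALE_DEVICE_ID", "2"))  # documented default: 2
    workers = int(env.get("GPU_SCALE_WORKERS", "4"))      # documented default: 4
    if workers < 1:
        raise ValueError("GPU_SCALE_WORKERS must be at least 1")
    return GpuScaleSettings(enabled, device_id, workers)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.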
OpenTranscribe supports multiple authentication methods. Use these commands to deploy with specific auth configurations:
PKI Certificate Authentication (Production Only):
# Production with PKI (test before push)
./opentr.sh start prod --build --with-pki
# Production with PKI (Docker Hub images)
./opentr.sh start prod --with-pki
# Access: https://localhost:5182
# Requires client certificate (.p12 files in scripts/pki/test-certs/clients/)
# Note: PKI requires nginx with mTLS and only works in production mode
# Dev mode uses Vite dev server which cannot handle client certificate verification
LDAP/Active Directory Testing:
# Development with LDAP test container
./opentr.sh start dev --with-ldap-test
# Production with LDAP test container (test before push)
./opentr.sh start prod --build --with-ldap-test
# LDAP server: localhost:3890
# Web UI: http://localhost:17170
# Admin: admin / admin_password
Keycloak/OIDC Testing:
# Development with Keycloak test container
./opentr.sh start dev --with-keycloak-test
# Production with Keycloak test container (test before push)
./opentr.sh start prod --build --with-keycloak-test
# Keycloak URL: http://localhost:8180
# Admin credentials: admin / admin
Combined Authentication Methods:
# LDAP + Keycloak testing (dev mode)
./opentr.sh start dev --with-ldap-test --with-keycloak-test
# All auth methods including PKI (production only)
./opentr.sh start prod --build --with-pki --with-ldap-test --with-keycloak-test
Configuration Notes:
- Configure auth methods via Admin UI: Settings → Authentication
- Database configuration takes precedence over `.env` variables
- PKI requires nginx with mTLS (automatically enabled with `--with-pki`)
- LDAP/Keycloak test containers are for development only
- Production deployments should connect to organization's AD/Keycloak servers
Documentation:
- PKI Setup: `docs/PKI_SETUP.md`
- LDAP/AD Setup: `docs/LDAP_AUTH.md`
- Keycloak Setup: `docs/KEYCLOAK_SETUP.md`
Build and push production Docker images to Docker Hub:
# Build and push both services (multi-arch: amd64 + arm64 via remote Mac Studio builder)
# ALWAYS use USE_REMOTE_BUILDER=true — without it ARM64 uses QEMU and takes 2-3 hours
USE_REMOTE_BUILDER=true ./scripts/docker-build-push.sh
# Build specific service only
USE_REMOTE_BUILDER=true ./scripts/docker-build-push.sh backend
USE_REMOTE_BUILDER=true ./scripts/docker-build-push.sh frontend
# Quick iteration (skip security scan — use when testing changes back-to-back)
USE_REMOTE_BUILDER=true SKIP_SECURITY_SCAN=true ./scripts/docker-build-push.sh backend
# Auto-detect changes and build only what changed
USE_REMOTE_BUILDER=true ./scripts/docker-build-push.sh auto
# Build for single platform (faster testing, no remote builder needed)
PLATFORMS=linux/amd64 ./scripts/docker-build-push.sh backend
See scripts/README.md for detailed documentation.
CRITICAL: Production/nginx testing always requires locally built images. The Docker Hub images contain older compiled code. When running with prod overlays (docker-compose.prod.yml, docker-compose.nginx.yml, docker-compose.pki.yml), the frontend serves pre-compiled static assets from the Docker image — NOT the local source code. You must rebuild locally after any frontend or backend code changes.
# Build backend image locally
docker build -t davidamacey/opentranscribe-backend:latest -f backend/Dockerfile.prod backend/
# Build frontend image locally
docker build -t davidamacey/opentranscribe-frontend:latest -f frontend/Dockerfile.prod frontend/
# Restart with local images (use docker-compose.local.yml to prevent pulling)
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.local.yml up -d --force-recreate
When using PKI/nginx overlays, rebuild and recreate only the frontend:
docker build -t davidamacey/opentranscribe-frontend:latest -f frontend/Dockerfile.prod frontend/
docker stop opentranscribe-frontend && docker rm opentranscribe-frontend
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.nginx.yml -f docker-compose.local.yml -f docker-compose.pki.yml up -d --no-deps --force-recreate frontend
Important - Docker Build Caching:
- Docker uses content hashing for COPY layers - if file content hasn't changed, the layer is cached
- If code changes aren't being picked up, touch a file or add a comment to invalidate the cache:
echo "<!-- Build: $(date +%s) -->" >> frontend/src/components/SomeFile.svelte
docker build -t davidamacey/opentranscribe-frontend:latest -f frontend/Dockerfile.prod frontend/
- Avoid `--no-cache` for routine builds: it rebuilds everything, including `npm ci`, which is slow
- Only use `--no-cache` when dependencies change or you suspect deep caching issues
Important - Local vs Docker Hub Images:
- `docker-compose.local.yml` sets `pull_policy: never` to prevent overwriting local images with Docker Hub versions
- `--force-recreate` ensures containers use the new image
- Never use `./opentranscribe.sh update` with local builds (it pulls from Docker Hub)
- When testing local code changes, always use the local compose overlay
- The `docker-compose.override.yml` (dev Vite server) is NOT loaded when using explicit `-f` flags; only the specified files are used
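The overlay rules above can be sketched as a small helper that assembles the `docker compose` invocation. This is only an illustration: `compose_up_command` is a hypothetical helper, not a script in this repository, and rejecting a prod overlay without `docker-compose.local.yml` is just one possible way to encode the guidance.

```python
BASE_FILE = "docker-compose.yml"
PROD_FILE = "docker-compose.prod.yml"
LOCAL_FILE = "docker-compose.local.yml"


def compose_up_command(overlays, services=()):
    """Build a `docker compose up` argument list for the given overlay files.

    Every file is passed with an explicit -f flag, so Compose will not
    auto-load docker-compose.override.yml.
    """
    if PROD_FILE in overlays and LOCAL_FILE not in overlays:
        raise ValueError(f"add {LOCAL_FILE} so locally built images are used")
    cmd = ["docker", "compose"]
    for name in [BASE_FILE, *overlays]:
        cmd += ["-f", name]
    cmd += ["up", "-d", "--force-recreate"]
    if services:
        # --no-deps recreates only the named services, not their dependencies.
        cmd += ["--no-deps", *services]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` or joined with spaces to inspect the command before running it.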
# From frontend/ directory
npm run dev # Start dev server
npm run build # Production build
npm run check # Type checking
System-wide Playwright browser automation for testing and debugging. Claude Code can use this to interact with the frontend, capture screenshots, see console errors, and automate UI testing.
Location: ~/bin/browser-tools/ (see ~/bin/browser-tools/README.md for full docs)
Setup (one-time):
sudo npm install -g playwright
npx playwright install chromium
cd ~/bin/browser-tools && npm install
Usage:
# Basic - open URL, screenshot, check for errors
node ~/bin/browser-tools/browse.js http://localhost:5173
# With visible browser on XRDP (display :13)
node ~/bin/browser-tools/browse.js http://localhost:5173 --display=:13
# Login flow example
node ~/bin/browser-tools/browse.js http://localhost:5173 --display=:13 \
'fill:#email:admin@example.com' \
'fill:#password:password' \
'click:button[type=submit]' \
'wait:2000' \
'screenshot:after-login'
Actions:
| Action | Description |
|---|---|
| `fill:<selector>:<value>` | Fill input field |
| `click:<selector>` | Click element |
| `screenshot:<name>` | Save screenshot |
| `wait:<ms>` | Wait milliseconds |
| `waitfor:<selector>` | Wait for element |
| `eval:<javascript>` | Execute JS, print result |
| `title` | Print page title |
| `text` | Print page text |
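To make the action grammar in the table concrete, here is a hedged Python sketch of how such strings could be split into a name and arguments. It is not the parser used by `browse.js`, whose internals are not shown here; `parse_action` is a hypothetical function, and splitting `fill` on the first colon after the selector assumes the selector itself contains no colon.

```python
def parse_action(action):
    """Split an action string such as 'fill:#email:admin@example.com'."""
    name, sep, rest = action.partition(":")
    if name in ("title", "text"):
        # These actions take no arguments.
        return name, []
    if not sep or not rest:
        raise ValueError(f"action {action!r} needs an argument")
    if name == "fill":
        # Assumption: the selector has no colon, so the first colon ends it.
        selector, sep2, value = rest.partition(":")
        if not sep2:
            raise ValueError("fill needs both a selector and a value")
        return name, [selector, value]
    if name == "wait":
        return name, [int(rest)]  # milliseconds
    if name in ("click", "screenshot", "waitfor", "eval"):
        # Keep the remainder intact; JavaScript for eval may contain colons.
        return name, [rest]
    raise ValueError(f"unknown action {name!r}")
```

Under these assumptions, the login flow shown earlier parses into `fill`, `fill`, `click`, `wait`, and `screenshot` steps in order.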
Options:
- `--display=:13` - Show browser on XRDP display (user can watch)
- `--keep` - Keep browser open after actions
- `--timeout=30000` - Navigation timeout
Screenshots saved to: ~/bin/browser-tools/screenshots/
Test credentials: admin@example.com / password
End-to-end tests that verify frontend and backend work together through real browser automation.
Location: backend/tests/e2e/
Test Files:
- `conftest.py` - Fixtures (login_page, authenticated_page, auth_helper, api_helper)
- `test_login.py` - Comprehensive login tests (~50 tests)
- `test_registration.py` - Comprehensive registration tests (~35 tests)
- `test_auth_flow.py` - Combined auth flow tests
Running E2E Tests:
# Activate venv first
source backend/venv/bin/activate
# Run all E2E tests (headless - fast)
pytest backend/tests/e2e/ -v
# Run with visible browser on XRDP (watch the tests)
DISPLAY=:11 pytest backend/tests/e2e/ -v --headed
# Run specific test file
pytest backend/tests/e2e/test_login.py -v
# Run specific test class
pytest backend/tests/e2e/test_login.py::TestLoginSuccess -v --headed
# Run with screenshots on failure
pytest backend/tests/e2e/ -v --screenshot only-on-failure
Test Categories:
| Category | Tests | Coverage |
|---|---|---|
| Login | ~50 | Form validation, success/failure, security, session, UI |
| Registration | ~35 | All fields, username/email/password validation, duplicates |
| Auth Flow | ~15 | Login → use app → logout, session persistence |
Requirements:
- Dev environment running (`./opentr.sh start dev`)
- Frontend at `localhost:5173`
- Backend at `localhost:5174`
- pytest-playwright installed (`pip install pytest-playwright`)
Fixtures Available:
- `login_page` - Page navigated to login, ready for input
- `authenticated_page` - Already logged in as admin
- `auth_helper` - Helper for login/logout/register operations
- `api_helper` - Helper for backend API calls during tests
# From backend/ directory (or via container)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8080
pytest tests/ # Run tests
alembic upgrade head # Apply migrations (production only)
IMPORTANT: Always use the virtual environment at backend/venv for all Python operations outside of Docker containers.
# Activate the virtual environment
source backend/venv/bin/activate
# Install dependencies (if needed)
pip install -r backend/requirements.txt
# Run pre-commit hooks
pre-commit run --all-files
# Run linting tools
ruff check backend/
ruff format backend/
# Run type checking
mypy backend/app
# Run tests
pytest backend/tests/
# Deactivate when done
deactivate
When to use the venv:
- Running pre-commit hooks locally
- Running mypy, ruff, bandit, or other linting tools
- Installing Python packages for development
- Running pytest outside of Docker
- Any Python CLI tools (black, isort, etc.)
Note: The virtual environment should already be set up. If not, create it with:
cd backend
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pre-commit mypy ruff bandit
OpenTranscribe uses Alembic migrations for all database schema changes. Migrations run automatically on backend startup, ensuring existing user data is preserved during upgrades.
When you need to add columns, indexes, or constraints:
- Create an Alembic migration in `backend/alembic/versions/`:

      # Example: backend/alembic/versions/v070_add_pki_security_enhancements.py
      from alembic import op

      revision = "v070_add_pki_security_enhancements"
      down_revision = "v060_add_transcript_overlap"  # Previous migration

      def upgrade():
          # Use idempotent SQL (IF NOT EXISTS) to be safe
          op.execute("""
              DO $$
              BEGIN
                  IF NOT EXISTS (
                      SELECT 1 FROM information_schema.columns
                      WHERE table_name = 'user' AND column_name = 'new_column'
                  ) THEN
                      ALTER TABLE "user" ADD COLUMN new_column VARCHAR(256);
                  END IF;
              END $$;
          """)

      def downgrade():
          op.execute('ALTER TABLE "user" DROP COLUMN IF EXISTS new_column')

- Update SQLAlchemy models in `backend/app/models/` to match
- Update Pydantic schemas in `backend/app/schemas/` if needed
- Update migration detection in `backend/app/db/migrations.py`:
  - Add detection logic for the new schema version
  - Update the latest version check
| File | Purpose |
|---|---|
| `backend/alembic/versions/*.py` | Alembic migrations (versioned, the schema authority) |
| `backend/app/db/migrations.py` | Auto-detection and startup runner |
| `database/init_db.sql` | Legacy reference only (no longer used for schema) |
On backend startup, migrations.py automatically:
- Detects current database schema version
- Stamps untracked databases with the appropriate version
- Runs any pending Alembic migrations to bring DB to current version
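The "run any pending migrations" step can be pictured as a walk along the `down_revision` links from the newest revision back to the one the database is stamped with. The sketch below is a simplified model of that idea, not the logic in `backend/app/db/migrations.py` or Alembic's own resolver; `pending_revisions` is a hypothetical function and `v050_hypothetical_base` is an invented placeholder revision.

```python
def pending_revisions(down_revisions, current, head):
    """Return the revisions to apply, oldest first, to move from current to head.

    down_revisions maps each revision id to its down_revision (None for the
    first migration). current is None for a database with no migrations applied.
    """
    pending = []
    rev = head
    while rev != current:
        if rev is None:
            # Walked past the first migration without meeting `current`.
            raise ValueError(f"{current!r} is not an ancestor of {head!r}")
        pending.append(rev)
        rev = down_revisions[rev]
    return pending[::-1]


# Illustrative chain: the two real names come from the example migration,
# the base revision is a made-up placeholder.
CHAIN = {
    "v050_hypothetical_base": None,
    "v060_add_transcript_overlap": "v050_hypothetical_base",
    "v070_add_pki_security_enhancements": "v060_add_transcript_overlap",
}
```

A database stamped at `v060_add_transcript_overlap` would need only `v070_add_pki_security_enhancements`, while a fresh database (`current=None`) would run the whole chain in order.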
- Testing schema changes: Use
./opentr.sh reset dev(deletes data, runs full migration chain) - Testing migrations: Rebuild backend and restart (migrations run on startup)
- Production upgrades: Just restart the backend - migrations apply automatically
- `app/api/endpoints/` - REST API routes organized by resource
- `app/models/` - SQLAlchemy ORM models
- `app/schemas/` - Pydantic validation schemas
- `app/services/` - Business logic and external integrations
- `app/tasks/` - Celery background tasks
- `app/core/` - Configuration and security

- `src/components/` - Reusable Svelte components
- `src/components/settings/` - Settings-related components
- `src/routes/` - Page components
- `src/stores/` - Svelte stores for state management
- `src/lib/` - Utilities and services
- `src/lib/i18n/` - Internationalization system (7 languages)
- `src/lib/i18n/locales/` - Translation JSON files (en, es, fr, de, pt, zh, ja)
- Authentication: Hybrid multi-method with JWT tokens and refresh token rotation
- File Processing: Upload to MinIO → Celery task → AI processing → Database storage
- Real-time Updates: WebSockets for task progress notifications
- Error Handling: Structured error responses with proper HTTP status codes
OpenTranscribe supports multiple authentication methods that can be enabled simultaneously (hybrid authentication).
Available Authentication Methods:
- Local (Direct): Username/password with bcrypt hashing
- LDAP/Active Directory: Enterprise directory integration
- OIDC/Keycloak: OpenID Connect with external identity providers
- PKI/X.509: Certificate-based authentication for high-security environments
Authentication Module Structure:
backend/app/auth/
├── direct_auth.py # Local password authentication
├── ldap_auth.py # LDAP/Active Directory integration
├── keycloak_auth.py # OIDC/Keycloak integration
├── pki_auth.py # PKI/X.509 certificate authentication
├── mfa.py # TOTP multi-factor authentication
├── password_policy.py # Password strength enforcement
├── password_history.py # Password reuse prevention
├── rate_limit.py # Authentication rate limiting
├── lockout.py # Account lockout management
├── session.py # Session/token management
├── token_service.py # JWT token operations
├── audit.py # Authentication audit logging
└── constants.py # Auth-related constants
Authentication Database Models:
- `UserMFA` - MFA/TOTP settings per user (`backend/app/models/user_mfa.py`)
- `PasswordHistory` - Password history tracking (`backend/app/models/password_history.py`)
- `RefreshToken` - Refresh token management (`backend/app/models/refresh_token.py`)
Key Authentication Patterns:
- Hybrid Auth: Multiple methods can be enabled simultaneously via `AUTH_TYPE` config
- Token Flow: JWT access tokens (short-lived) + refresh token rotation (long-lived)
- External IdP Users: PKI/Keycloak users bypass local MFA (handled by IdP)
- Rate Limiting: Per-IP and per-user rate limits to prevent brute force
- Account Lockout: Configurable lockout after failed attempts
- Audit Logging: All auth events logged for security compliance
Configuration:
- Set `AUTH_TYPE` in `.env` (local, ldap, keycloak, pki, or comma-separated for hybrid)
- Provider-specific settings documented in `docs/KEYCLOAK_SETUP.md` and `docs/PKI_SETUP.md`
- MFA can be enforced globally or per-user
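As an illustration of the comma-separated `AUTH_TYPE` format, the sketch below normalizes a value into a list of enabled methods. `parse_auth_types` is a hypothetical helper rather than the backend's actual parsing code, and rejecting an empty value (instead of falling back to a default method) is an assumption.

```python
SUPPORTED_AUTH_TYPES = ("local", "ldap", "keycloak", "pki")


def parse_auth_types(raw):
    """Turn an AUTH_TYPE value such as 'local,ldap' into an ordered list."""
    methods = []
    for part in raw.split(","):
        name = part.strip().lower()
        if not name:
            continue  # tolerate stray commas and whitespace
        if name not in SUPPORTED_AUTH_TYPES:
            raise ValueError(f"unsupported auth type: {name!r}")
        if name not in methods:
            methods.append(name)  # drop duplicates, keep first-seen order
    if not methods:
        # Assumption: an empty value is treated as a configuration error.
        raise ValueError("AUTH_TYPE must name at least one method")
    return methods
```

For example, `AUTH_TYPE=local, LDAP,keycloak` would enable three methods at once in this model.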
This is a professional production application. All code must meet production-quality standards:
- No cheating or workarounds: Every feature must work correctly end-to-end. No skipping validation, no placeholder implementations, no "good enough" shortcuts
- No mocking in production code: Mocks are only acceptable in test fixtures. Production code paths must use real implementations
- Industry standards compliance: All security features (MFA, authentication, encryption) must follow established RFCs and industry standards (e.g., RFC 6238 for TOTP, RFC 4226 for HOTP)
- Real integration testing: If a test requires external services (Redis, PostgreSQL, OpenSearch), run the test against real services or clearly document the dependency. Do not silently skip critical test paths
- Authenticator app compatibility: MFA/TOTP must be compatible with standard authenticator apps (Google Authenticator, Microsoft Authenticator, Authy, etc.)
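For reference, the RFC 6238 algorithm that standard authenticator apps implement fits in a few lines of standard-library Python. This is a sketch for understanding and cross-checking, not the contents of `backend/app/auth/mfa.py`; the function names are illustrative, and the defaults (HMAC-SHA-1, 30-second step, 6 digits) are the values authenticator apps commonly use.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA-1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with the counter derived from Unix time."""
    now = time.time() if for_time is None else for_time
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, candidate: str, for_time=None, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if for_time is None else for_time
    counter = int(now // 30)
    return any(
        hmac.compare_digest(hotp(secret, counter + delta), candidate)
        for delta in range(-window, window + 1)
        if counter + delta >= 0
    )
```

With the RFC test secret `12345678901234567890`, `hotp(secret, 1)` yields `287082` (RFC 4226, Appendix D) and `totp(secret, for_time=59, digits=8)` yields `94287082` (RFC 6238, Appendix B). Authenticator apps exchange the secret in Base32, so a real implementation would decode it before calling these functions.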
- Keep files under 200-300 lines
- Use Google-style docstrings for Python code
- Follow existing patterns before creating new ones
- Always check for TypeScript errors
- Ensure light/dark mode compliance for frontend changes
A pre-commit hook automatically runs svelte-check and vite build when frontend source files are modified.
Behavior:
- Only triggers when `.svelte`, `.ts`, `.js`, `.css`, or `.html` files under `frontend/src/` are staged
- Runs `svelte-check --threshold warning` (fails on both errors and warnings)
- Runs `vite build` to verify the production build succeeds
- If Claude CLI is available, attempts automatic fixes on failure
- Falls back to showing error output if Claude is unavailable or cannot fix
Manual usage:
# Run the frontend check script directly
./scripts/frontend-check.sh
# Run without Claude auto-fix
./scripts/frontend-check.sh --no-claude
# Run only svelte-check (skip build, faster)
./scripts/frontend-check.sh --check-only
# Use Claude Code skill to fix frontend errors (inside a Claude Code session)
/fix-frontend
- Use `docker compose` (not `docker-compose`)
- Docker Compose Structure (base + override pattern):
  - `docker-compose.yml` - Base configuration (all environments)
  - `docker-compose.override.yml` - Development overrides (auto-loaded)
  - `docker-compose.prod.yml` - Production overrides
  - `docker-compose.offline.yml` - Offline/airgapped overrides
- Development: Just run `docker compose up` (auto-loads override)
- Production: Use `-f docker-compose.yml -f docker-compose.prod.yml`
- Offline: Use `-f docker-compose.yml -f docker-compose.offline.yml`
- Always check container logs after starting services
- Kill existing servers before testing changes
- Layer Docker files for optimal caching
- Write thorough tests for major functionality
- No mocking data for dev/prod (tests only)
- Always restart/reset services after making changes
- Use appropriate opentr.sh commands for testing changes
- Frontend: http://localhost:5173
- Backend API: http://localhost:5174/api
- API Docs: http://localhost:5174/docs
- MinIO Console: http://localhost:5179
- Flower Dashboard: http://localhost:5175/flower
- OpenSearch: http://localhost:5180 (v3.4.0 with Lucene 10)
- Environment config: `.env` (never overwrite without confirmation)
- Environment template: `.env.example` (freely editable - keep in sync when adding new env vars)
- Database migrations: `backend/alembic/versions/` (schema authority)
- Docker base config: `docker-compose.yml` (common to all environments)
- Docker dev config: `docker-compose.override.yml` (auto-loaded in dev)
- Docker prod config: `docker-compose.prod.yml` (production overrides)
- Docker offline config: `docker-compose.offline.yml` (airgapped overrides)
- Frontend build: `frontend/vite.config.ts`
- File upload to MinIO storage
- Metadata extraction and database record creation
- Celery task dispatch to worker with GPU support
- WhisperX transcription with native word-level timestamps (100+ languages supported)
- Configurable source language (auto-detect or specify)
- Optional translation to English
- All 100+ languages support word-level timestamps natively via faster-whisper cross-attention DTW
- PyAnnote speaker diarization and voice fingerprinting
- LLM speaker identification suggestions (optional - manual verification required)
- LLM-powered summarization with BLUF format (optional - user-triggered)
- Automatic context-aware processing (single or multi-section based on transcript length)
- Intelligent chunking at speaker/topic boundaries for long content
- Section-by-section analysis with final summary stitching
- Configurable output language (12 languages supported)
- Database storage and OpenSearch indexing
- WebSocket notification to frontend
OpenTranscribe defaults to large-v3-turbo for optimal speed. Users can override via WHISPER_MODEL environment variable.
Model Comparison:
| Model | Speed | VRAM | English | Multilingual | Translation | Best For |
|---|---|---|---|---|---|---|
| `large-v3-turbo` | 6x faster | ~6GB | Excellent | Good* | NO | English, speed-critical |
| `large-v3` | Slow | ~10GB | Excellent | Best | Yes | Non-English, translation |
| `large-v2` | Slow | ~10GB | Excellent | Good | Yes | Legacy, translation |
*Turbo shows slightly reduced accuracy for Thai and Cantonese compared to large-v2
Language-Specific Recommendations:
| Use Case | Recommended Model | Why |
|---|---|---|
| English podcasts/meetings | `large-v3-turbo` | 6x faster, excellent English accuracy |
| Spanish, French, German, Japanese | `large-v3-turbo` | Good accuracy, much faster |
| Thai, Cantonese, Vietnamese | `large-v3` | Turbo has reduced accuracy for these |
| Low-resource languages | `large-v3` | 10-20% better than other models |
| Translation to English | `large-v3` | Turbo cannot translate |
| Maximum accuracy (any language) | `large-v3` | Best overall accuracy |
Critical Translation Warning: large-v3-turbo was NOT trained for translation tasks. If users enable "Translate to English" in settings, they should use large-v3 or large-v2.
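The recommendations above reduce to a short decision rule, sketched here in Python. This is an illustration only: `recommend_whisper_model` is a hypothetical helper, the application itself reads `WHISPER_MODEL` rather than choosing automatically, and the language codes (`th`, `yue`, `vi`) are assumed identifiers for Thai, Cantonese, and Vietnamese.

```python
# Languages for which the guidance above recommends large-v3 over turbo.
# The codes are assumptions made for this sketch.
REDUCED_TURBO_ACCURACY = {"th", "yue", "vi"}


def recommend_whisper_model(language=None, translate_to_english=False):
    """Pick a model name following the recommendation table.

    language is a language code, or None for auto-detect.
    """
    if translate_to_english:
        # large-v3-turbo was not trained for translation.
        return "large-v3"
    if language in REDUCED_TURBO_ACCURACY:
        return "large-v3"
    # Default: fastest model with strong accuracy for well-resourced languages.
    return "large-v3-turbo"
```

The rule checks translation first because it is the one case where the turbo model is unsuitable regardless of language.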
Configuration:
# In .env file
WHISPER_MODEL=large-v3-turbo # Default - 6x faster
WHISPER_MODEL=large-v3 # For translation or maximum accuracy
WHISPER_MODEL=large-v2 # Legacy fallback
The application now includes optional AI-powered features using Large Language Models:
AI Summarization:
- BLUF (Bottom Line Up Front) format summaries
- Speaker analysis with talk time and key contributions
- Action items extraction with priorities and assignments
- Key decisions and follow-up items identification
- Support for multiple LLM providers (vLLM, OpenAI, Ollama, Claude)
- Intelligent section-by-section processing for transcripts of any length
- Automatic context-aware chunking and summary stitching
- Multilingual output: Generate summaries in 12 languages (en, es, fr, de, pt, zh, ja, ko, it, ru, ar, hi)
Speaker Identification:
- LLM-powered speaker name suggestions based on conversation context
- Confidence scoring for identification accuracy
- Manual verification workflow (suggestions are not auto-applied)
- Cross-video speaker matching with embedding analysis
- Speaker merge UI for combining duplicate speakers
Model Discovery:
- Automatic discovery of available models for vLLM, Ollama, and Anthropic
- Works with OpenAI-compatible API endpoints
- Edit mode supports stored API keys (no need to re-enter)
Configuration:
- Set `LLM_PROVIDER` in .env file (vllm, openai, ollama, anthropic, openrouter)
- Configure provider-specific settings (API keys, endpoints, models)
- Features work independently - transcription works without LLM configuration
Deployment Options:
- Cloud Providers: Use `.env` configuration with external providers (OpenAI, Claude, OpenRouter, etc.)
- Self-Hosted LLM: Configure vLLM or Ollama endpoints in `.env` (deployed separately)
- No LLM: Leave LLM_PROVIDER empty for transcription-only mode
OpenTranscribe now supports transcription in 100+ languages with configurable settings:
User-Configurable Language Settings:
- Source Language: Auto-detect or specify language for improved accuracy
- Translate to English: Toggle to translate non-English audio to English
- LLM Output Language: Generate AI summaries in preferred language (12 supported)
Language Features:
- All 100+ languages support word-level timestamps natively via faster-whisper cross-attention DTW
- Settings stored per-user in the database
Configuration:
- User settings available in Settings → Transcription → Language Settings
- Per-file language override available at upload/reprocess time
Users can configure their own transcription preferences:
Transcription Settings:
- Speaker Behavior: Always prompt, use defaults, or use saved custom values
- Min/Max Speakers: Configure default speaker detection range (1-50+)
- Garbage Cleanup: Enable/disable automatic cleanup of erroneous segments
Recording Settings:
- Audio recording quality and duration preferences
- Microphone device selection
OpenTranscribe supports downloading and processing videos from 1800+ platforms via yt-dlp integration:
Supported Platforms:
- Best Support (Recommended): YouTube, Dailymotion, TikTok - most reliable for public video downloads
- Limited Support: Vimeo, Twitter/X, Instagram, Facebook - may require authentication for many videos
- Other Platforms: Twitch, Reddit, SoundCloud, and 1800+ more sites supported by yt-dlp
Platform Limitations:
- Some platforms (Vimeo, Instagram, Facebook) restrict most videos to logged-in users
- Authentication-required videos cannot be downloaded without browser cookies
- Age-restricted, geo-restricted, or private videos are not accessible
- Premium/subscriber-only content requires active subscriptions
User-Friendly Error Messages:
- The system detects authentication-related errors and provides helpful guidance
- When a video fails, users receive platform-specific suggestions
- Recommends alternative platforms when authentication issues are detected
How It Works:
- User enters a video URL from any supported platform
- System validates the URL and extracts video metadata via yt-dlp
- Video is downloaded in web-compatible format (H.264/MP4 preferred)
- Downloaded video is uploaded to MinIO storage
- Standard transcription pipeline processes the video
- WebSocket notification updates the frontend
Configuration:
- No additional configuration required - yt-dlp is included in the backend container
- Anti-blocking measures included for YouTube (client rotation, proper headers)
- Maximum video duration: 4 hours
- Maximum file size: 15GB (same as direct upload limit)
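The two limits above could be enforced with a pre-download check along these lines. This is a sketch under stated assumptions: `check_download_limits` is a hypothetical function, the metadata values are assumed to be known before downloading, and "15GB" is interpreted as 15 GiB.

```python
MAX_DURATION_SECONDS = 4 * 60 * 60       # 4 hours
MAX_FILE_SIZE_BYTES = 15 * 1024 ** 3     # assumption: 15GB read as 15 GiB


def check_download_limits(duration_seconds, size_bytes=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("video is longer than the 4 hour limit")
    # Size may be unknown before download; skip the check in that case.
    if size_bytes is not None and size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than the 15GB limit")
    return problems
```

Returning a list rather than raising on the first failure lets the caller report every violated limit in one user-facing message.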
OpenTranscribe uses a simple volume-based model caching system that automatically persists AI models between container restarts.
- Set `MODEL_CACHE_DIR` in `.env` to specify cache location (default: `./models`)
- Models are automatically downloaded on first use
- All models persist across container restarts and rebuilds
${MODEL_CACHE_DIR}/
├── huggingface/ # HuggingFace models cache
│ ├── hub/ # WhisperX models (~1.5GB)
│ └── transformers/ # PyAnnote transformer cache
├── torch/ # PyTorch models cache
│ └── pyannote/ # PyAnnote speaker models (~500MB)
├── nltk_data/ # NLTK data files
│ ├── tokenizers/ # punkt_tab tokenizer (~13MB)
│ └── taggers/ # POS taggers
├── sentence-transformers/ # Sentence transformers models
│ └── sentence-transformers_all-MiniLM-L6-v2/ # Semantic search model (~80MB)
└── opensearch-ml/ # OpenSearch neural search models
├── all-MiniLM-L6-v2/ # Default model for neural search (~80MB)
│ └── sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip
└── model_manifest.json # Download metadata
MIN_SPEAKERS / MAX_SPEAKERS Parameters:
PyAnnote's speaker diarization uses sklearn's AgglomerativeClustering, which has NO hard maximum limit on the number of speakers:
- Default: `MIN_SPEAKERS=1`, `MAX_SPEAKERS=20`
- Can be increased to 50+ for large conferences/events with many speakers
- No hard upper limit - only constrained by the number of audio samples
- Performance threshold at `max(100, 0.02 * n_samples)` where algorithm behavior changes for efficiency
Use Cases:
- Small meetings: 2-5 speakers (default works fine)
- Medium meetings: 5-15 speakers (default works fine)
- Large conferences: 15-50 speakers (increase MAX_SPEAKERS to 30-50)
- Very large events: 50+ speakers (increase MAX_SPEAKERS accordingly)
Note: Higher values may impact processing time but will not cause errors.
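To make the numbers above concrete, the sketch below validates a speaker range and evaluates the threshold formula quoted in the list. Both functions are hypothetical helpers written for illustration; they do not call PyAnnote or scikit-learn.

```python
def validate_speaker_range(min_speakers=1, max_speakers=20):
    """Check a MIN_SPEAKERS / MAX_SPEAKERS pair; defaults match the documented ones."""
    if min_speakers < 1:
        raise ValueError("MIN_SPEAKERS must be at least 1")
    if max_speakers < min_speakers:
        raise ValueError("MAX_SPEAKERS must be >= MIN_SPEAKERS")
    return min_speakers, max_speakers


def efficiency_threshold(n_samples):
    """Evaluate max(100, 0.02 * n_samples), the threshold quoted above."""
    return max(100, 0.02 * n_samples)
```

For 1,000 samples the threshold stays at 100, and it only starts to grow once there are more than 5,000 samples, so a `MAX_SPEAKERS` of 50 sits below it in either case.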
The system uses simple volume mappings to cache models to their natural locations:
volumes:
- ${MODEL_CACHE_DIR}/huggingface:/home/appuser/.cache/huggingface
- ${MODEL_CACHE_DIR}/torch:/home/appuser/.cache/torch
- ${MODEL_CACHE_DIR}/nltk_data:/home/appuser/.cache/nltk_data
- ${MODEL_CACHE_DIR}/sentence-transformers:/home/appuser/.cache/sentence-transformers
- ${MODEL_CACHE_DIR}/opensearch-ml:/home/appuser/.cache/opensearch-ml # OpenSearch neural models
# OpenSearch container volume mapping (read-only)
- ${MODEL_CACHE_DIR}/opensearch-ml:/ml-models:ro # Mounted in OpenSearch container

- No code complexity: Models use their natural cache locations
- Persistent storage: Models saved between container restarts
- User configurable: Simple `.env` variable controls cache location
- No re-downloads: Models cached after first download (~2.5GB total)
- Automatic setup: OpenSearch neural models download automatically on first start
- Offline capability: Models persist for air-gapped deployments
OpenTranscribe uses OpenSearch's native neural search for semantic/vector search capabilities:
Model Management:
- Default Model: `all-MiniLM-L6-v2` (384-dim, 80MB, English)
- Auto-Download: Default model downloads automatically on first start if missing
- Offline Support: Models persist in `${MODEL_CACHE_DIR}/opensearch-ml/` for air-gapped deployments
- Fallback: If model download fails, OpenSearch downloads from internet on-demand (temporary, not persisted)
Download Methods:
- Automatic (Recommended): Run `./opentr.sh start dev` - models download if missing
- Manual Pre-Download: Run `bash scripts/download-models.sh models` before starting
- Offline Preparation:

      # On internet-connected machine:
      DOWNLOAD_ALL_OPENSEARCH_MODELS=true bash scripts/download-models.sh models
      # Copy models/ directory to offline machine
      rsync -av models/ user@offline-machine:/opt/opentranscribe/models/

Available Models (see backend/app/core/constants.py:OPENSEARCH_EMBEDDING_MODELS):
- Fast tier (384d): `all-MiniLM-L6-v2` (default), `paraphrase-multilingual-MiniLM-L12-v2`
- Balanced tier (768d): `all-mpnet-base-v2`, `paraphrase-multilingual-mpnet-base-v2`
- Best tier: `all-distilroberta-v1` (768d), `distiluse-base-multilingual-cased-v1` (512d)
Backend Startup Flow:
- Backend checks for local models in `/ml-models/` (OpenSearch container mount)
- If default model missing, downloads to `~/.cache/opensearch-ml/` (backend container)
- Volume mapping persists models to host `${MODEL_CACHE_DIR}/opensearch-ml/`
- OpenSearch ML Commons registers model from local file (`file://`) for offline use
- If local registration fails, falls back to remote HuggingFace registration (requires internet)
Configuration (.env):
OPENSEARCH_NEURAL_SEARCH_ENABLED=true # Enable/disable neural search (default: true)
OPENSEARCH_NEURAL_MODEL=huggingface/sentence-transformers/all-MiniLM-L6-v2 # Model to use
OpenTranscribe backend containers run as a non-root user (appuser, UID 1000) following Docker security best practices.
Benefits:
- Follows principle of least privilege
- Reduces security risk from container escape vulnerabilities
- Compliant with security scanning tools (Trivy, Snyk, etc.)
- Prevents host root compromise in case of container breach
Automatic Permission Management:
The startup scripts (./opentr.sh and ./opentranscribe.sh) automatically check and fix model cache permissions before starting containers. This ensures the non-root container user (UID 1000) can access the model cache without permission errors.
If you encounter permission issues, you can manually fix them:
# Fix permissions on existing model cache
./scripts/fix-model-permissions.sh
This script will change ownership of your model cache to UID:GID 1000:1000, making it accessible to the non-root container user.
Technical Details:
- Container user: `appuser` (UID 1000, GID 1000)
- User groups: `appuser`, `video` (for GPU access)
- Cache directories: `/home/appuser/.cache/huggingface`, `/home/appuser/.cache/torch`
- Multi-stage build for minimal attack surface
- Health checks for container orchestration
- Create endpoint in `backend/app/api/endpoints/`
- Add to router in `backend/app/api/router.py`
- Create/update schemas in `backend/app/schemas/`
- Update database models if needed
- Test with `./opentr.sh restart-backend`

- Create component in `src/components/`
- Ensure light/dark mode support
- Test responsive design
- Update relevant routes/stores if needed
- Test with `./opentr.sh restart-frontend`

- Create an Alembic migration in `backend/alembic/versions/`
- Update SQLAlchemy models in `backend/app/models/`
- Update Pydantic schemas in `backend/app/schemas/` if needed
- Update `backend/app/db/migrations.py` detection logic for the new version
- Test with `./opentr.sh reset dev` (runs full migration chain from scratch)