Releases: attevon-llc/OpenTranscribe
v0.4.1 - LDAP DN Fix & Keycloak PKI Compliance
This patch release fixes a critical LDAP group filtering bug reported in #188 and adds government/FedRAMP Keycloak-as-PKI-broker support.
What Was Broken
Active Directory Distinguished Names use commas as internal syntax (e.g. CN=Whisper_Users,CN=Users,DC=domain,DC=local). The previous code split group lists on commas, which shredded full DNs into fragments that could never match what AD returned. Group filtering was silently broken for any installation using full DNs.
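The failure mode is easy to demonstrate. A minimal sketch (the function names are illustrative, not the project's actual code):

```python
# Illustration of the bug: splitting a group list on commas shreds a
# full AD DN, while a semicolon separator leaves it intact.

def split_groups_broken(value: str) -> list[str]:
    # Old behaviour: commas inside a DN are treated as list separators.
    return [g.strip() for g in value.split(",") if g.strip()]

def split_groups_fixed(value: str) -> list[str]:
    # New behaviour: semicolons separate groups; commas stay inside DNs.
    return [g.strip() for g in value.split(";") if g.strip()]

dn = "CN=Whisper_Users,CN=Users,DC=domain,DC=local"

print(split_groups_broken(dn))  # four fragments that can never match AD
print(split_groups_fixed(dn))   # the DN, whole
```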
Highlights
- LDAP group DN parsing fixed — group lists now use semicolons as the multi-group separator; full AD DNs work correctly
- PKI_ADMIN_DNS parsing fixed — same semicolon delimiter fix for certificate admin lists
- Keycloak X.509 PKI broker — cert claims injected by Keycloak (both `cert_*` and `x509_cert_*` forms) are extracted and stored on the user record
- PKI admin promotion via Keycloak — a cert DN listed in `PKI_ADMIN_DNS` grants admin access for Keycloak users, matching standalone PKI auth behaviour
- Government cert CN format — `CN=LastName FirstName emailusername` (space-separated 3-token) parsed and displayed as `First Last`
- 116 new unit tests across the `ldap_auth` and `keycloak_auth` modules
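The government CN parsing can be sketched in a few lines (a hypothetical helper for illustration, not the project's actual parser):

```python
# Sketch: turn the 3-token government CN format
# ("LastName FirstName emailusername") into a "First Last" display name.

def display_name_from_cn(cn: str) -> str:
    value = cn.removeprefix("CN=")
    tokens = value.split()
    if len(tokens) == 3:
        last, first, _username = tokens
        return f"{first} {last}"
    # Fall back to the raw value for unexpected formats.
    return value

print(display_name_from_cn("CN=Doe John jdoe"))  # John Doe
```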
Upgrade Notes
LDAP group list format change — update your LDAP_REQUIRED_USER_GROUPS and LDAP_ADMIN_GROUPS environment variables to use semicolons:
# Before (broken): this single DN was split on its internal commas into four fragments
# After (correct): the same value now parses as one DN — no config change needed for single groups
LDAP_REQUIRED_USER_GROUPS=CN=Whisper_Users,CN=Users,DC=domain,DC=local
# Multiple groups — use semicolons
LDAP_REQUIRED_USER_GROUPS=CN=Group1,DC=domain,DC=local;CN=Group2,DC=domain,DC=local

PKI_ADMIN_DNS — if you have multiple admin DNs, use semicolons:
PKI_ADMIN_DNS=CN=Doe John jdoe,OU=Agency,O=U.S. Government,C=US;CN=Smith Jane jsmith,OU=Agency,O=U.S. Government,C=US

No database migrations required.
How to Update
Docker Compose:
docker compose pull
docker compose up -d

Full Changelog
See CHANGELOG.md
v0.4.0 — Enterprise Auth, Native Pipeline, Neural Search & Security Hardening
A major release combining enterprise-grade authentication, a native transcription pipeline, neural search, GPU optimizations, cloud ASR providers, comprehensive speaker intelligence, a Progressive Web App, user groups & sharing, and a final frontend hardening sprint — all built from processing 1,400+ real-world recordings over two months of development. 281 commits since v0.3.3.
🔐 Enterprise Authentication
Four authentication methods that can run simultaneously, configured through the admin UI without restarts:
- Local — Username/password with bcrypt, TOTP MFA (RFC 6238 — Google Authenticator, Authy, Microsoft Authenticator), FedRAMP IA-5 password policies (complexity, history, expiration), NIST AC-7 account lockout with progressive thresholds
- LDAP/Active Directory — Enterprise directory integration with auto-provisioning and username-attribute mapping
- OIDC/Keycloak — OpenID Connect with federated identity, social login, and federated logout propagation
- PKI/X.509 — Certificate-based mTLS authentication with OCSP/CRL revocation checking and super-admin local password fallback
Plus: per-IP and per-user rate limiting, audit logging in structured JSON/CEF format with OpenSearch integration, JWT refresh token rotation with concurrent session limits, and database-driven configuration with AES-256-GCM encryption at rest — all manageable from a Super Admin UI without restarts.
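For reference, the TOTP codes those authenticator apps generate follow RFC 6238 and can be computed with the standard library alone. A sketch for illustration only — the backend presumably uses a vetted library rather than hand-rolled crypto:

```python
# RFC 4226 HOTP + RFC 6238 TOTP, pure stdlib (illustrative sketch).
import hashlib
import hmac
import struct

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(key: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    return hotp(key, unix_time // step, digits)

# RFC 6238 Appendix B test vector: SHA-1, secret "12345678901234567890"
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```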
⚡ Native Transcription Pipeline (2× Faster)
Replaced the legacy WhisperX pipeline with a native engine built on faster-whisper's BatchedInferencePipeline + PyAnnote v4. Cross-attention DTW provides word timestamps during transcription — no separate alignment pass, no wav2vec2 dependency, and native word timestamps for all 100+ languages (previously only ~42 via wav2vec2).
Benchmark (3.3-hour podcast, RTX A6000): 706s → 332s — 2.1× faster
- Unified pipeline replaces the previous `parallel_pipeline`/`whisperx_service` split
- User-configurable VAD — Voice Activity Detection threshold and silence duration exposed as tunable settings
- Word timestamp validation — post-processing ensures monotonicity and prevents drift
- GPU pipeline benchmarks — 40.3× single-file realtime, 54.6× peak at concurrency=8, perfect linear scaling 1–12 workers
- TF32 acceleration enabled at worker startup and after diarization (Ampere+ GPUs)
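The timestamp validation step above can be sketched as a single pass that clamps each word's start to the previous word's end (a hypothetical helper for illustration, not the project's actual code):

```python
# Enforce monotonically non-decreasing word timestamps so a later word
# never starts before an earlier one ends (drift correction sketch).

def enforce_monotonic(words: list[dict]) -> list[dict]:
    fixed = []
    last_end = 0.0
    for w in words:
        start = max(w["start"], last_end)
        end = max(w["end"], start)
        fixed.append({**w, "start": start, "end": end})
        last_end = end
    return fixed

words = [
    {"word": "hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.4, "end": 0.3},  # drifted backwards
]
print(enforce_monotonic(words))
```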
🎙️ PyAnnote v4 Migration & Speaker Intelligence
- Automatic migration system — Admin UI with real-time progress bar migrates speaker embeddings from v3 (512-dim) to v4 (256-dim) via atomic alias swap, zero downtime
- Speaker overlap detection — Identifies overlapping speakers with confidence scoring
- Speaker pre-clustering — GPU-accelerated cross-file speaker grouping (#144)
- Global Speaker Management page — Dedicated page for cross-file speaker profile management with avatars
- Gender classification — Apache 2.0 licensed neural network predicts gender from voice; stored on profiles for cross-video consistency
- Gender-informed cluster validation — Cross-gender cluster assignments require higher similarity thresholds; minority members flagged for review
- Speaker metadata parsing — Cross-reference pipeline with metadata hints display for LLM-assisted speaker identification (#141)
- Jump-to-timestamp links in the speaker editor (#147)
- Unassign & blacklist — Remove speaker assignments and blacklist erroneous profiles
- Outlier analysis — Detect and flag outlier embeddings in speaker clusters
- Inline audio playback — Play/pause toggle in speaker cluster views
- OpenSearch cosine score fix — all 8 kNN score read locations now correctly convert the normalized `(1+cos)/2` value back to a raw cosine
- Warm model caching eliminates 40-60s cold-start delays by pre-loading models on startup
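The score fix above is a one-line inversion: kNN cosine scores arrive normalized into [0, 1] as (1 + cos) / 2, so recovering the raw cosine is just the inverse mapping (illustrative helper, not the project's code):

```python
# Invert OpenSearch's (1 + cos) / 2 normalization back to a raw cosine.

def opensearch_score_to_cosine(score: float) -> float:
    return 2.0 * score - 1.0

print(opensearch_score_to_cosine(1.0))  # identical vectors  -> 1.0
print(opensearch_score_to_cosine(0.5))  # orthogonal vectors -> 0.0
print(opensearch_score_to_cosine(0.0))  # opposite vectors   -> -1.0
```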
🔍 Hybrid Neural Search
Full-text BM25 combined with semantic vector search via OpenSearch ML Commons. Search for "budget discussion" and find segments about "financial planning" even when those exact words never appear.
- ML Commons integration — Native OpenSearch neural search, server-side embeddings
- RRF hybrid merging — BM25 + vector scores combined via Reciprocal Rank Fusion
- 6 embedding model tiers — from 384-dim MiniLM (fast) to 768-dim mpnet (best quality)
- Hybrid search crash fix — previously a silent fallback to BM25-only on OpenSearch 3.4 due to an `ArrayIndexOutOfBoundsException` when combining `aggs` + `hybrid` + `collapse` + RRF
- Soft demotion instead of hard suppression — semantic results are no longer dropped
- Dynamic over-fetch — cap raised from 200 to 1,000 via `SEARCH_MAX_OVERFETCH` for large indexes
- BM25 tuning — fuzziness AUTO, cross-fields, phrase slop, rank constant tuned 40→30
- Stop/cancel reindex — Admin UI can cancel in-flight reindex operations
- Offline/airgapped model downloading for air-gapped deployments
- Dynamic model management via admin UI
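Reciprocal Rank Fusion, mentioned above, merges the BM25 and vector rankings by summing 1 / (k + rank) per document across both lists. A minimal sketch — k = 60 is the constant from the original RRF paper, while the release notes say the actual rank constant was tuned separately:

```python
# Minimal Reciprocal Rank Fusion: merge two ranked result lists by
# summing 1 / (k + rank) for each document, then re-sorting.

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc_a", "doc_b", "doc_c"]    # keyword ranking
vector = ["doc_c", "doc_a", "doc_d"]  # semantic ranking
print(rrf_merge([bm25, vector]))      # doc_a first: ranked high in both
```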
☁️ Cloud ASR Providers
For deployments without a GPU — 8 cloud speech providers plus cloud diarization (#150):
- Providers: Deepgram, AssemblyAI, OpenAI Whisper API, Google, AWS Transcribe, Azure Speech, Speechmatics, Gladia
- pyannote.ai cloud diarization integration
- Independent diarization provider architecture — `diarization_source` selector: ASR built-in, local PyAnnote GPU, pyannote.ai cloud, or off — independent of transcription provider choice
- API-lite deployment mode — 2 GB CPU-only image vs. 8.9 GB for the full GPU image. Cloud-transcribed files still get local speaker embedding extraction for cross-file matching
- Custom vocabulary — Domain-specific hotwords (medical, legal, corporate, government) used as faster-whisper hotwords and cloud provider keyword boosting
- Admin-pinned ASR model — Admins control local Whisper model selection; model loaded once at startup, shared across all workers
- Per-transcription model override — Users can override the admin-pinned model per upload (#153)
🤝 User Groups, Collection Sharing & Collaboration
- User Groups & Collection Sharing (#148) — Create user groups and share collections with groups or individual users; granular viewer/editor permissions
- Speaker profile sharing via the collection sharing infrastructure
- Config/prompt sharing — Share LLM configs, prompts, media sources, and organization contexts between users
- Per-collection AI prompts (#146) — Different summarization styles for different collection types
- Bidirectional prompt-collection links — Prompts show which collections use them
- Organization context (#142) — Inject domain knowledge into all LLM prompts for context-aware summaries
📤 Upload & Media
- TUS 1.0.0 resumable uploads (#10) — Chunked uploads with MinIO multipart storage that survive network interruptions
- Collection & tag selection at upload (#145) — Organize files during upload, not after
- URL download quality settings (#122) — Configure video resolution, audio-only mode, and bitrate for yt-dlp downloads
- File retention & auto-deletion (#134) — Admin-configurable file retention with automatic deletion and GDPR-compliant audit logging
- Auto-labeling (#140) — AI suggests tags and collections from transcript content with fuzzy deduplication
- Disable AI summary per upload (#152)
- Disable speaker diarization per upload (#151)
- Selective reprocessing (#143) — Stepper UI to re-run specific pipeline stages on existing files
- YouTube bot-bypass — 2026 yt-dlp best practices (Deno JS runtime, client rotation, proper headers) for 1,800+ supported platforms
🛡️ Frontend Hardening Sprint
A dedicated audit sprint shipped in this release. Full details below under "Security", but the highlights:
- Flash of Authenticated Content (FOAC) fix — Layout now gates protected content during async auth verification
- Centralized user state cleanup (`lib/session/clearUserState.ts`) — 17+ stores, caches, and localStorage keys cleared on every login/logout
- Session-scoped `AbortController` cancels in-flight requests on logout
- bfcache invalidation — Back button after logout forces reload to discard restored snapshots
- DOMPurify sanitization across 8 `{@html}` render sites; replaces a bypassable regex sanitizer
- Production source maps disabled
- Keycloak redirect URL validation
🎨 UX & Frontend Polish
- Upload modal redesign — Replaced the 4,603-line monolith with a 6-step linear stepper (Media → Tags → Collections → Speakers → Options → Submit) plus a conditional Extract step for large videos. All three upload sources (file/URL/recording) now share steps 2-6. "Remember previous values" and "Review with defaults" shortcuts for power users
- Skeleton loaders — Replace generic spinners on home gallery, search results, file detail, and speaker clusters/profiles/inbox (~20% faster perceived load per Nielsen Norman research)
- Gallery click feedback — Instant press state + mousedown prefetch (~50-100ms head start)
- Gallery redesign — Compact Apple-like grid cards, list view, sorting, multi-select bulk actions
- Gallery state persistence — Filters persist across file detail navigation; scroll position restored on back
- Collection & Share modal polish — Intro text, permission reference cards, empty states, backdrop-click data-loss protection
- Manage Collections visual fix — Eliminated the "card in a card" glitch
- Settings redesign — Tabbed navigation, per-user preferences, speaker behavior defaults
- Queue Dashboard — Unified tasks view (formerly File Status) with quick filters, DatePicker, and pagination
- Stepper reprocess UI — Step-by-step reprocessing with stage picker
- Gallery action consolidation — Action buttons moved to header with dropdown groups (#139)
- **Multi-select with auto-filter a...
v0.3.3 - Community Contributions & Protected Media Support
Community-driven release featuring contributions from @vfilon, who submitted all four PRs in this version!
Highlights
- 🇷🇺 Russian Language Support - 8th supported UI language with 1,600+ translated strings
- 🔐 Protected Media Authentication - New plugin system for downloading from password-protected corporate video portals (MediaCMS support built-in)
- 🛠️ Bug Fixes - VRAM monitoring fix for non-CUDA devices, loading screen translation fix
- 🔧 URL Utilities - Centralized URL construction for consistent dev/production behavior
How to Update
Docker Compose:
docker compose pull
docker compose up -d

Protected Media Setup (Optional)
To enable authenticated downloads from MediaCMS installations:
# Add to .env
MEDIACMS_ALLOWED_HOSTS=media.example.com,mediacms.internal

Full Changelog
See CHANGELOG.md for complete details.
Thank You
Special thanks to @vfilon for contributing all four PRs in this release!
v0.3.2 - Setup Script Bug Fixes
Patch release fixing critical bugs in the one-liner installation script that prevented successful setup on fresh installations.
Note: This is a scripts-only release. No Docker container rebuild required.
Fixed
Setup Script Fixes
- Scripts Directory Creation - Fixed curl error 23 ("Failure writing output to destination") when downloading SSL and permission scripts by creating the `scripts/` directory before download attempts
- PyTorch 2.6+ Compatibility - Applied a `torch.load` patch to `download-models.py` for PyTorch 2.6+ compatibility, mirroring the fix already present in the backend (from Wes Brown's commit 8929cd6)
  - PyTorch 2.6 changed the `weights_only` default to `True`, causing omegaconf deserialization errors during model downloads
  - The patch sets `weights_only=False` for trusted HuggingFace models
Upgrade Notes
For existing installations: No action required - Docker containers already have the PyTorch fix.
For new installations: The one-liner setup script now works correctly:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog: https://github.com/davidamacey/OpenTranscribe/blob/master/CHANGELOG.md
v0.3.1 - Script Enhancements & Documentation Updates
Patch release with enhanced setup scripts for HTTPS/SSL configuration and comprehensive documentation updates covering v0.2.0 and v0.3.0 features.
Highlights
New Management Commands
- `./opentranscribe.sh setup-ssl` - Interactive HTTPS/SSL configuration
- `./opentranscribe.sh version` - Check current version and available updates
- `./opentranscribe.sh update` - Update containers only (quick)
- `./opentranscribe.sh update-full` - Update containers + config files (recommended)
NGINX Improvements
- Automatic NGINX overlay loading when `NGINX_SERVER_NAME` is configured
- NGINX health check added to `./opentr.sh health`
Documentation Updates
- New comprehensive NGINX/SSL setup guide
- Updated docs for Universal Media URL support (1800+ platforms)
- Added garbage cleanup feature documentation
- FAQ entries for system statistics and transcript pagination
- All Docusaurus and README docs updated for v0.2.0/v0.3.0 features
How to Update
Existing installations:
./opentranscribe.sh update-full

New installations:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog
See CHANGELOG.md
OpenTranscribe v0.3.0 - Universal Media URL & NGINX Support
What's New in v0.3.0
This release integrates valuable contributions from the community fork by @vfilon, bringing major new features including support for 1800+ video platforms and production-ready NGINX reverse proxy with SSL/TLS.
🎬 Universal Media URL Support (1800+ Platforms)
The headline feature expands OpenTranscribe far beyond YouTube:
Supported Platforms:
- Primary (Best Support): YouTube, Dailymotion, Twitter/X
- Secondary: Vimeo (public only), TikTok (variable), and 1800+ more via yt-dlp
Features:
- Dynamic source platform detection from yt-dlp metadata
- User-friendly error messages for authentication-required platforms
- Platform guidance for common issues (Vimeo login, Instagram restrictions, etc.)
- Updated UI with "Supported Platforms" section and limitations warning
Note: Authentication is not currently supported. Videos requiring login will fail with helpful error messages guiding users to publicly accessible alternatives.
🔐 NGINX Reverse Proxy with SSL/TLS
This release closes #72, enabling browser microphone recording on remote network access:
- `docker-compose.nginx.yml` overlay for production deployments
- Full SSL/TLS configuration with HTTP → HTTPS redirect
- WebSocket proxy support for real-time updates
- 2GB file upload support for large media files
- Flower dashboard and MinIO console accessible through NGINX
- Self-signed certificate generation script
🔧 Critical Bug Fixes: UUID/ID Standardization
Comprehensive fix for UUID/ID handling across 60+ files:
Issues Fixed:
- Speaker recommendations not showing for new videos
- Profile embedding service returning wrong ID type
- Inconsistent ID handling between backend and frontend
- Comment system UUID issues
- Password reset flow problems
🏗️ Infrastructure Improvements
GPU Configuration:
- Separated into optional `docker-compose.gpu.yml` overlay
- Better cross-platform support (macOS, CPU-only systems)
- Auto-detection in the `opentr.sh` script
Task Management:
- Task status reconciliation before marking files as stuck
- Multiple timestamp fallbacks for better reliability
- Auto-refresh analytics when segment speaker changes
LLM Service:
- Ollama context window configuration (`num_ctx` parameter)
- Model-aware temperature handling
- Better logging with resolved endpoint info
- Better logging with resolved endpoint info
🌐 i18n Updates
All 7 supported languages have been updated:
- Notification text changed from "YouTube Processing" to "Video Processing"
- New media URL description and platform limitation strings
- Updated recommended platforms list
🙏 Acknowledgments
Special thanks to @vfilon for the fork contributions that made this release possible:
- Universal Media URL support concept
- NGINX reverse proxy configuration
- Task status reconciliation improvements
- GPU overlay separation
How to Update
Docker Compose (Recommended)
# Pull the latest images
docker compose pull
# Restart with new images
docker compose up -d

For NGINX/SSL Setup
# Set NGINX_SERVER_NAME in .env
./scripts/generate-ssl-cert.sh
./opentr.sh start prod

See docs/NGINX_SETUP.md for complete setup instructions.
Full Changelog
See the CHANGELOG for complete details.
Full Changelog: v0.2.1...v0.3.0
v0.2.1 - Security Patch Release
This release addresses critical container vulnerabilities identified in security scans. All users are encouraged to update.
Resolved Critical CVEs (4 → 0)
| CVE | Package | Severity | Status |
|---|---|---|---|
| CVE-2025-47917 | libmbedcrypto | CRITICAL | ✅ Fixed |
| CVE-2023-6879 | libaom3 | CRITICAL | ✅ Fixed |
| CVE-2025-7458 | libsqlite3 | CRITICAL | ✅ Fixed |
| CVE-2023-45853 | zlib | CRITICAL | ✅ Fixed |
Container Updates
Frontend:
- `nginx:1.29.3-alpine3.22` → `nginx:1.29.4-alpine3.23`
- Fixed 6 vulnerabilities (3 HIGH, 3 MEDIUM) in libpng and busybox
- Added HEALTHCHECK instruction
Backend:
- `python:3.12-slim-bookworm` → `python:3.13-slim-trixie`
- Debian 12 → Debian 13 "trixie"
- Python 3.12 → Python 3.13
- Added HEALTHCHECK instruction
How to Update
Docker Compose:
docker compose pull
docker compose up -d

Manual:
docker pull davidamacey/opentranscribe-frontend:v0.2.1
docker pull davidamacey/opentranscribe-backend:v0.2.1

Full Changelog
See CHANGELOG.md for complete details.
🔒 Your security is our priority. Thank you for using OpenTranscribe.
v0.2.0 - Community-Driven Multilingual Release
We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.
Growing Community
In just over a month since our v0.1.0 release, OpenTranscribe has seen exciting growth:
- 8 GitHub Stars - Thank you for the support!
- 7 Pull Requests from community contributor @SQLServerIO (Wes Brown)
- Critical feature request from @LaboratorioInternacionalWeb that shaped this release
Community Contributions
Wes Brown's Seven Pull Requests
A massive thank you to Wes Brown (@SQLServerIO) who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:
- PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
- PR #107: Auto-cleanup garbage transcription segments
- PR #106: User admin endpoints now use UUID instead of integer ID
- PR #105: Speaker merge UI and segment speaker reassignment
- PR #104: LLM model discovery for OpenAI-compatible providers
- PR #103: Per-file speaker count settings in upload and reprocess UI
- PR #102: PyTorch 2.6+ compatibility and speaker diarization settings
The Multilingual Feature Request
Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were being transcribed to English because WhisperX was hardcoded with language="en" and task="translate".
What's New in v0.2.0
🌍 Multilingual Transcription Support (100+ Languages)
- Source Language: Auto-detect or specify the audio language (100+ languages supported)
- Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
- LLM Output Language: Generate AI summaries in 12 different languages
- ~42 languages have word-level timestamp support via wav2vec2 alignment
- Settings are stored per-user in the database
🌐 UI Internationalization (7 Languages)
The UI is now available in:
- English (default)
- Spanish (Español)
- French (Français)
- German (Deutsch)
- Portuguese (Português)
- Chinese (中文)
- Japanese (日本語)
🎙️ Speaker Management Enhancements
- Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
- Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
- User-Level Preferences: Save default speaker detection settings
🤖 LLM Integration Improvements
- Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
- Anthropic Support Enhanced: Native model discovery via /v1/models API
- Multilingual Output: Generate AI summaries in 12 different languages
- Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
- Updated Default Models: Anthropic uses `claude-opus-4-5-20251101`, Ollama uses `llama3.2:latest`
⚡ Performance & Stability
- Pagination for Large Transcripts: No more browser hanging with thousands of segments
- Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
- PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
- Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files
👤 Admin & User Experience
- System Statistics: CPU, memory, disk, and GPU usage now visible to all users
- Admin Password Reset: Secure password reset functionality with validation
- UUID Consistency: Fixed admin endpoints to use UUID instead of integer IDs
Upgrading to v0.2.0
# If using the production installer
cd opentranscribe
./opentranscribe.sh update
# Or pull the latest Docker images
docker compose pull
docker compose up -d

Database migrations run automatically on startup - no manual intervention required.
Resources
- Documentation: docs.opentranscribe.app
- GitHub: github.com/davidamacey/OpenTranscribe
- Docker Hub: Backend | Frontend
- Blog Post: Full Release Notes
Full Changelog: v0.1.0...v0.2.0
Happy transcribing! 🎉
— The OpenTranscribe Team
OpenTranscribe v0.1.0 - First Official Release
Release Date: November 5, 2025
License: GNU Affero General Public License v3.0 (AGPL-3.0)
Overview
We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.
OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.
Why AGPL-3.0?
We've chosen the GNU Affero General Public License v3.0 to:
- Protect open source - Ensure the code remains open and accessible to everyone
- Prevent proprietary forks - Require that modifications, especially network services, remain open
- Ensure transparency - Network users have the right to access the source code
- Build community - Foster collaboration and shared improvements
Key Highlights
🎧 Professional-Grade Transcription
- 70x realtime speed on GPU with large-v2 model
- Word-level timestamps using WAV2VEC2 alignment
- 50+ languages supported with automatic translation
- Universal format support - Audio and video files up to 4GB
👥 Advanced Speaker Intelligence
- Automatic speaker diarization using PyAnnote.audio
- Cross-video speaker recognition with voice fingerprinting
- AI-powered speaker suggestions using LLM context analysis
- Global speaker profiles that persist across all recordings
- Speaker analytics with talk time, pace, and interaction patterns
🤖 AI-Powered Insights
- LLM integration - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
- BLUF format summaries - Bottom Line Up Front structured analysis
- Custom AI prompts - Unlimited prompts with flexible JSON schemas
- Intelligent sectioning - Handles transcripts of any length automatically
- Local or cloud processing - Privacy-first local models or powerful cloud AI
🔍 Powerful Search & Discovery
- Hybrid search - Keyword + semantic search with OpenSearch 3.3.1
- 9.5x faster vector search - Significantly improved performance
- 25% faster queries with 75% lower p90 latency
- Advanced filtering - Search by speaker, tags, collections, date, duration
- Interactive navigation - Click-to-seek on transcripts and waveforms
⚡ Enterprise Performance
- Multi-GPU scaling - Optional parallel processing (4+ workers per GPU)
- Specialized work queues - GPU, CPU, Download, NLP, and Utility queues
- Non-blocking architecture - Parallel processing saves 45-75s per 3-hour file
- Model caching - Efficient ~2.6GB cache with automatic persistence
- Complete offline support - Full airgapped deployment capability
Installation
Quick Install (Recommended)
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start

Access at: http://localhost:5173
Docker Hub Images
Pre-built multi-platform images (AMD64, ARM64):
- `davidamacey/opentranscribe-backend:v0.1.0`
- `davidamacey/opentranscribe-frontend:v0.1.0`
From Source
git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev

What's Included
Core Features
✅ Transcription - WhisperX with faster-whisper backend
✅ Speaker Diarization - PyAnnote.audio integration with auto-labeling and profile generation
✅ Media File Upload - Direct upload of audio/video files up to 4GB with drag-and-drop
✅ Video File Size Detection - Client-side audio extraction option for large video files
✅ YouTube Support - Direct URL and playlist processing for batch transcription
✅ Browser Microphone Recording - Built-in recording (localhost or HTTPS) with background operation
✅ AI-Powered Summaries - Multi-provider LLM integration with customizable formats
✅ AI Topic Generation - Automatic tag and collection suggestions from transcript content
✅ Timestamp Comments - User annotations anchored to specific video moments
✅ Search Engine - OpenSearch 3.3.1 with hybrid keyword and vector search
✅ Collections - Organize media into themed groups with AI suggestions
✅ Analytics - Speaker metrics and interaction analysis
✅ Waveform Visualization - Interactive audio timeline
✅ PWA Support - Installable progressive web app
✅ Dark/Light Mode - Full theme support
Infrastructure
✅ Docker Compose - Multi-environment orchestration
✅ PostgreSQL - Relational database with JSONB
✅ MinIO - S3-compatible object storage
✅ Redis - Message broker and caching
✅ Celery - Distributed task processing
✅ NGINX - Production web server
✅ Flower - Task monitoring dashboard
Security
✅ Non-root containers - Principle of least privilege
✅ RBAC - Role-based access control
✅ Encrypted secrets - Secure API key storage
✅ Security scanning - Trivy and Grype integration
✅ Session management - JWT-based authentication
System Requirements
Minimum
- CPU: 4 cores
- RAM: 8GB
- Storage: 50GB (including ~3GB for AI models)
- GPU: Optional (CPU-only mode available)
Recommended
- CPU: 8+ cores
- RAM: 16GB+
- Storage: 100GB+ SSD
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
Supported Platforms
- OS: Linux, macOS (including Apple Silicon), Windows (via WSL2)
- Architectures: AMD64, ARM64
- GPUs: NVIDIA CUDA, Apple MPS (Metal)
Performance Benchmarks
| Metric | Performance |
|---|---|
| Transcription Speed (GPU) | 70x realtime |
| Vector Search Improvement | 9.5x faster |
| Query Performance | 25% faster, 75% lower p90 latency |
| Multi-GPU Throughput | 4 videos simultaneously (4 workers) |
| Model Cache Size | ~2.6GB total |
Documentation
📚 Complete Documentation: https://docs.opentranscribe.app
Key resources:
- Quick Start Guide
- Installation Guide
- User Guide
- Configuration Reference
- Screenshots & Visual Guide
- FAQ
- Troubleshooting
Roadmap to v1.0.0
We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.
Planned features for future releases:
- Real-time transcription for live streaming
- Enhanced speaker analytics and visualization
- Better speaker diarization models
- Google-style text search
- LLM powered RAG Chat with transcript text
- Other refinements along the way!
Known Issues
No critical issues at release time. See GitHub Issues for community-reported items.
Contributing
We welcome contributions from the community! See our Contributing Guide for details.
Ways to contribute:
- 🐛 Report bugs and issues
- 💡 Suggest new features
- 🔧 Submit pull requests
- 📚 Improve documentation
- 🌍 Translate the interface
- ⭐ Star the repository
Support & Community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Contact via GitHub
Acknowledgments
OpenTranscribe builds upon amazing open-source projects:
- OpenAI Whisper - Foundation speech recognition model
- WhisperX - Enhanced alignment and diarization
- PyAnnote.audio - Speaker diarization toolkit
- FastAPI - Modern Python web framework
- Svelte - Reactive frontend framework
- PostgreSQL - Reliable database system
- OpenSearch - Search and analytics engine
- Docker - Containerization platform
Special thanks to the AI community and all contributors who helped make this release possible!
License
OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
See LICENSE for full details.
Built with ❤️ by the OpenTranscribe community
OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.
Download: v0.1.0 Release
Docker: Backend | Frontend
Docs: docs.opentranscribe.app