Releases: davidamacey/OpenTranscribe
v0.3.3 - Community Contributions & Protected Media Support
Community-driven release featuring contributions from @vfilon, who submitted all four PRs in this version!
Highlights
- 🇷🇺 Russian Language Support - 8th supported UI language with 1,600+ translated strings
- 🔐 Protected Media Authentication - New plugin system for downloading from password-protected corporate video portals (MediaCMS support built-in)
- 🛠️ Bug Fixes - VRAM monitoring fix for non-CUDA devices, loading screen translation fix
- 🔧 URL Utilities - Centralized URL construction for consistent dev/production behavior
How to Update
Docker Compose:
docker compose pull
docker compose up -d
Protected Media Setup (Optional)
To enable authenticated downloads from MediaCMS installations:
# Add to .env
MEDIACMS_ALLOWED_HOSTS=media.example.com,mediacms.internal
Full Changelog
See CHANGELOG.md for complete details.
Thank You
Special thanks to @vfilon for contributing all four PRs in this release!
v0.3.2 - Setup Script Bug Fixes
Patch release fixing critical bugs in the one-liner installation script that prevented successful setup on fresh installations.
Note: This is a scripts-only release. No Docker container rebuild required.
Fixed
Setup Script Fixes
- Scripts Directory Creation - Fixed curl error 23 ("Failure writing output to destination") when downloading SSL and permission scripts by creating the scripts/ directory before download attempts
- PyTorch 2.6+ Compatibility - Applied the torch.load patch to download-models.py for PyTorch 2.6+ compatibility, mirroring the fix already present in the backend (from Wes Brown's commit 8929cd6)
  - PyTorch 2.6 changed the weights_only default to True, causing omegaconf deserialization errors during model downloads
  - The patch sets weights_only=False for trusted HuggingFace models
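For illustration, the fix described above amounts to wrapping torch.load so that weights_only defaults back to False. This is a minimal sketch of the technique, not the actual download-models.py code; the function name is illustrative:

```python
import functools

def force_full_deserialization(load_fn):
    """Wrap a torch.load-style function so weights_only defaults to False.

    PyTorch 2.6 changed the default of torch.load(weights_only=...) to True,
    which makes omegaconf objects embedded in older checkpoints fail to
    deserialize. Only apply this to checkpoints from sources you trust.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        # Respect an explicit choice by the caller; otherwise restore the
        # pre-2.6 behavior of full (non-weights-only) deserialization.
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)
    return wrapper

# Applying the patch (assumes torch is installed):
# import torch
# torch.load = force_full_deserialization(torch.load)
```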
Upgrade Notes
For existing installations: No action required - Docker containers already have the PyTorch fix.
For new installations: The one-liner setup script now works correctly:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
Full Changelog: https://github.com/davidamacey/OpenTranscribe/blob/master/CHANGELOG.md
v0.3.1 - Script Enhancements & Documentation Updates
Patch release with enhanced setup scripts for HTTPS/SSL configuration and comprehensive documentation updates covering v0.2.0 and v0.3.0 features.
Highlights
New Management Commands
- ./opentranscribe.sh setup-ssl - Interactive HTTPS/SSL configuration
- ./opentranscribe.sh version - Check current version and available updates
- ./opentranscribe.sh update - Update containers only (quick)
- ./opentranscribe.sh update-full - Update containers + config files (recommended)
NGINX Improvements
- Automatic NGINX overlay loading when NGINX_SERVER_NAME is configured
- NGINX health check added to ./opentr.sh health
Documentation Updates
- New comprehensive NGINX/SSL setup guide
- Updated docs for Universal Media URL support (1800+ platforms)
- Added garbage cleanup feature documentation
- FAQ entries for system statistics and transcript pagination
- All Docusaurus and README docs updated for v0.2.0/v0.3.0 features
How to Update
Existing installations:
./opentranscribe.sh update-full
New installations:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
Full Changelog
See CHANGELOG.md
OpenTranscribe v0.3.0 - Universal Media URL & NGINX Support
What's New in v0.3.0
This release integrates valuable contributions from the community fork by @vfilon, bringing major new features including support for 1800+ video platforms and production-ready NGINX reverse proxy with SSL/TLS.
🎬 Universal Media URL Support (1800+ Platforms)
The headline feature expands OpenTranscribe far beyond YouTube:
Supported Platforms:
- Primary (Best Support): YouTube, Dailymotion, Twitter/X
- Secondary: Vimeo (public only), TikTok (variable), and 1800+ more via yt-dlp
Features:
- Dynamic source platform detection from yt-dlp metadata
- User-friendly error messages for authentication-required platforms
- Platform guidance for common issues (Vimeo login, Instagram restrictions, etc.)
- Updated UI with "Supported Platforms" section and limitations warning
Note: Authentication is not currently supported. Videos requiring login will fail with helpful error messages guiding users to publicly accessible alternatives.
🔐 NGINX Reverse Proxy with SSL/TLS
This release closes #72, enabling browser microphone recording on remote network access:
- docker-compose.nginx.yml overlay for production deployments
- Full SSL/TLS configuration with HTTP → HTTPS redirect
- WebSocket proxy support for real-time updates
- 2GB file upload support for large media files
- Flower dashboard and MinIO console accessible through NGINX
- Self-signed certificate generation script
🔧 Critical Bug Fixes: UUID/ID Standardization
Comprehensive fix for UUID/ID handling across 60+ files:
Issues Fixed:
- Speaker recommendations not showing for new videos
- Profile embedding service returning wrong ID type
- Inconsistent ID handling between backend and frontend
- Comment system UUID issues
- Password reset flow problems
🏗️ Infrastructure Improvements
GPU Configuration:
- Separated into an optional docker-compose.gpu.yml overlay
- Better cross-platform support (macOS, CPU-only systems)
- Auto-detection in the opentr.sh script
Task Management:
- Task status reconciliation before marking files as stuck
- Multiple timestamp fallbacks for better reliability
- Auto-refresh analytics when segment speaker changes
LLM Service:
- Ollama context window configuration (num_ctx parameter)
- Model-aware temperature handling
- Better logging with resolved endpoint info
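As a sketch of how the num_ctx option reaches Ollama: it travels in the options object of an /api/generate request, overriding the model's default context window per call. The payload shape follows Ollama's documented API; the model name and window size below are illustrative values, not OpenTranscribe's actual defaults:

```python
import json

def build_ollama_request(prompt, num_ctx=8192, model="llama3.2:latest"):
    """Build an Ollama /api/generate payload with an explicit context window.

    Raising num_ctx lets long transcript sections fit into a single
    summarization call instead of being truncated at Ollama's default.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Runtime options in the request override the model's defaults.
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload)
```

The resulting JSON string can be POSTed to http://localhost:11434/api/generate on a default Ollama install.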
🌐 i18n Updates
All 7 supported languages have been updated:
- Notification text changed from "YouTube Processing" to "Video Processing"
- New media URL description and platform limitation strings
- Updated recommended platforms list
🙏 Acknowledgments
Special thanks to @vfilon for the fork contributions that made this release possible:
- Universal Media URL support concept
- NGINX reverse proxy configuration
- Task status reconciliation improvements
- GPU overlay separation
How to Update
Docker Compose (Recommended)
# Pull the latest images
docker compose pull
# Restart with new images
docker compose up -d
For NGINX/SSL Setup
# Set NGINX_SERVER_NAME in .env
./scripts/generate-ssl-cert.sh
./opentr.sh start prod
See docs/NGINX_SETUP.md for complete setup instructions.
Full Changelog
See the CHANGELOG for complete details.
Full Changelog: v0.2.1...v0.3.0
v0.2.1 - Security Patch Release
This release addresses critical container vulnerabilities identified in security scans. All users are encouraged to update.
Resolved Critical CVEs (4 → 0)
| CVE | Package | Severity | Status |
|---|---|---|---|
| CVE-2025-47917 | libmbedcrypto | CRITICAL | ✅ Fixed |
| CVE-2023-6879 | libaom3 | CRITICAL | ✅ Fixed |
| CVE-2025-7458 | libsqlite3 | CRITICAL | ✅ Fixed |
| CVE-2023-45853 | zlib | CRITICAL | ✅ Fixed |
Container Updates
Frontend:
- nginx:1.29.3-alpine3.22 → nginx:1.29.4-alpine3.23
- Fixed 6 vulnerabilities (3 HIGH, 3 MEDIUM) in libpng and busybox
- Added HEALTHCHECK instruction
Backend:
- python:3.12-slim-bookworm → python:3.13-slim-trixie
- Debian 12 → Debian 13 "trixie"
- Python 3.12 → Python 3.13
- Added HEALTHCHECK instruction
How to Update
Docker Compose:
docker compose pull
docker compose up -d
Manual:
docker pull davidamacey/opentranscribe-frontend:v0.2.1
docker pull davidamacey/opentranscribe-backend:v0.2.1
Full Changelog
See CHANGELOG.md for complete details.
🔒 Your security is our priority. Thank you for using OpenTranscribe.
v0.2.0 - Community-Driven Multilingual Release
We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.
Growing Community
In just over a month since our v0.1.0 release, OpenTranscribe has seen exciting growth:
- 8 GitHub Stars - Thank you for the support!
- 7 Pull Requests from community contributor @SQLServerIO (Wes Brown)
- Critical feature request from @LaboratorioInternacionalWeb that shaped this release
Community Contributions
Wes Brown's Seven Pull Requests
A massive thank you to Wes Brown (@SQLServerIO) who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:
- PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
- PR #107: Auto-cleanup garbage transcription segments
- PR #106: User admin endpoints now use UUID instead of integer ID
- PR #105: Speaker merge UI and segment speaker reassignment
- PR #104: LLM model discovery for OpenAI-compatible providers
- PR #103: Per-file speaker count settings in upload and reprocess UI
- PR #102: PyTorch 2.6+ compatibility and speaker diarization settings
The Multilingual Feature Request
Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were being transcribed to English because WhisperX was hardcoded with language="en" and task="translate".
What's New in v0.2.0
🌍 Multilingual Transcription Support (100+ Languages)
- Source Language: Auto-detect or specify the audio language (100+ languages supported)
- Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
- LLM Output Language: Generate AI summaries in 12 different languages
- ~42 languages have word-level timestamp support via wav2vec2 alignment
- Settings are stored per-user in the database
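The fix for Issue #99 boils down to mapping these per-user settings onto Whisper's language and task options instead of hardcoding them. A minimal sketch of that mapping (the function name is illustrative; the actual service code differs):

```python
from typing import Optional

def whisper_options(source_language: Optional[str],
                    translate_to_english: bool) -> dict:
    """Map per-user language settings to Whisper transcription options.

    The old pipeline hardcoded language="en" and task="translate", which
    forced Spanish audio into English. With this mapping:
      - source_language=None lets Whisper auto-detect the spoken language
      - task stays "transcribe" (original language) unless the user
        explicitly toggles translation on
    """
    return {
        "language": source_language,  # None => auto-detect
        "task": "translate" if translate_to_english else "transcribe",
    }
```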
🌐 UI Internationalization (7 Languages)
The UI is now available in:
- English (default)
- Spanish (Español)
- French (Français)
- German (Deutsch)
- Portuguese (Português)
- Chinese (中文)
- Japanese (日本語)
🎙️ Speaker Management Enhancements
- Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
- Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
- User-Level Preferences: Save default speaker detection settings
🤖 LLM Integration Improvements
- Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
- Anthropic Support Enhanced: Native model discovery via /v1/models API
- Multilingual Output: Generate AI summaries in 12 different languages
- Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
- Updated Default Models: Anthropic uses claude-opus-4-5-20251101, Ollama uses llama3.2:latest
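Model auto-discovery works against the /v1/models listing that OpenAI-compatible providers expose: a JSON object with a "data" array of entries keyed by "id" (Anthropic's model listing uses the same shape). A small sketch of parsing that response; the function name is illustrative:

```python
def parse_model_ids(models_response: dict) -> list:
    """Extract model IDs from an OpenAI-compatible /v1/models response.

    vLLM, Ollama's OpenAI-compatible endpoint, and Anthropic's model
    listing all return {"data": [{"id": "..."}, ...]}; a missing or
    empty "data" array yields an empty list rather than an error.
    """
    return sorted(entry["id"] for entry in models_response.get("data", []))
```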
⚡ Performance & Stability
- Pagination for Large Transcripts: No more browser hanging with thousands of segments
- Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
- PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
- Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files
👤 Admin & User Experience
- System Statistics: CPU, memory, disk, and GPU usage now visible to all users
- Admin Password Reset: Secure password reset functionality with validation
- UUID Consistency: Fixed admin endpoints to use UUID instead of integer IDs
Upgrading to v0.2.0
# If using the production installer
cd opentranscribe
./opentranscribe.sh update
# Or pull the latest Docker images
docker compose pull
docker compose up -d
Database migrations run automatically on startup - no manual intervention required.
Resources
- Documentation: docs.opentranscribe.app
- GitHub: github.com/davidamacey/OpenTranscribe
- Docker Hub: Backend | Frontend
- Blog Post: Full Release Notes
Full Changelog: v0.1.0...v0.2.0
Happy transcribing! 🎉
— The OpenTranscribe Team
OpenTranscribe v0.1.0 - First Official Release
Release Date: November 5, 2025
License: GNU Affero General Public License v3.0 (AGPL-3.0)
Overview
We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.
OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.
Why AGPL-3.0?
We've chosen the GNU Affero General Public License v3.0 to:
- Protect open source - Ensure the code remains open and accessible to everyone
- Prevent proprietary forks - Require that modifications, especially network services, remain open
- Ensure transparency - Network users have the right to access the source code
- Build community - Foster collaboration and shared improvements
Key Highlights
🎧 Professional-Grade Transcription
- 70x realtime speed on GPU with large-v2 model
- Word-level timestamps using WAV2VEC2 alignment
- 50+ languages supported with automatic translation
- Universal format support - Audio and video files up to 4GB
👥 Advanced Speaker Intelligence
- Automatic speaker diarization using PyAnnote.audio
- Cross-video speaker recognition with voice fingerprinting
- AI-powered speaker suggestions using LLM context analysis
- Global speaker profiles that persist across all recordings
- Speaker analytics with talk time, pace, and interaction patterns
🤖 AI-Powered Insights
- LLM integration - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
- BLUF format summaries - Bottom Line Up Front structured analysis
- Custom AI prompts - Unlimited prompts with flexible JSON schemas
- Intelligent sectioning - Handles transcripts of any length automatically
- Local or cloud processing - Privacy-first local models or powerful cloud AI
🔍 Powerful Search & Discovery
- Hybrid search - Keyword + semantic search with OpenSearch 3.3.1
- 9.5x faster vector search - Significantly improved performance
- 25% faster queries with 75% lower p90 latency
- Advanced filtering - Search by speaker, tags, collections, date, duration
- Interactive navigation - Click-to-seek on transcripts and waveforms
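Hybrid search combines a BM25 keyword clause with an approximate k-NN vector clause in a single OpenSearch query. A minimal sketch of such a query body, assuming illustrative field names ("transcript", "embedding") rather than OpenTranscribe's actual index mapping:

```python
def hybrid_search_body(query_text, query_vector, k=10):
    """Build an OpenSearch query body mixing keyword and vector search.

    The bool/should combination scores documents on both BM25 relevance
    of the transcript text and k-NN similarity of the query embedding,
    so exact phrases and semantically related passages both surface.
    """
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    # Classic BM25 full-text match on the transcript field.
                    {"match": {"transcript": {"query": query_text}}},
                    # Approximate nearest-neighbor match on the embedding.
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }
```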
⚡ Enterprise Performance
- Multi-GPU scaling - Optional parallel processing (4+ workers per GPU)
- Specialized work queues - GPU, CPU, Download, NLP, and Utility queues
- Non-blocking architecture - Parallel processing saves 45-75s per 3-hour file
- Model caching - Efficient ~2.6GB cache with automatic persistence
- Complete offline support - Full airgapped deployment capability
Installation
Quick Install (Recommended)
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start
Access at: http://localhost:5173
Docker Hub Images
Pre-built multi-platform images (AMD64, ARM64):
- davidamacey/opentranscribe-backend:v0.1.0
- davidamacey/opentranscribe-frontend:v0.1.0
From Source
git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev
What's Included
Core Features
✅ Transcription - WhisperX with faster-whisper backend
✅ Speaker Diarization - PyAnnote.audio integration with auto-labeling and profile generation
✅ Media File Upload - Direct upload of audio/video files up to 4GB with drag-and-drop
✅ Video File Size Detection - Client-side audio extraction option for large video files
✅ YouTube Support - Direct URL and playlist processing for batch transcription
✅ Browser Microphone Recording - Built-in recording (localhost or HTTPS) with background operation
✅ AI-Powered Summaries - Multi-provider LLM integration with customizable formats
✅ AI Topic Generation - Automatic tag and collection suggestions from transcript content
✅ Timestamp Comments - User annotations anchored to specific video moments
✅ Search Engine - OpenSearch 3.3.1 with hybrid keyword and vector search
✅ Collections - Organize media into themed groups with AI suggestions
✅ Analytics - Speaker metrics and interaction analysis
✅ Waveform Visualization - Interactive audio timeline
✅ PWA Support - Installable progressive web app
✅ Dark/Light Mode - Full theme support
Infrastructure
✅ Docker Compose - Multi-environment orchestration
✅ PostgreSQL - Relational database with JSONB
✅ MinIO - S3-compatible object storage
✅ Redis - Message broker and caching
✅ Celery - Distributed task processing
✅ NGINX - Production web server
✅ Flower - Task monitoring dashboard
Security
✅ Non-root containers - Principle of least privilege
✅ RBAC - Role-based access control
✅ Encrypted secrets - Secure API key storage
✅ Security scanning - Trivy and Grype integration
✅ Session management - JWT-based authentication
System Requirements
Minimum
- CPU: 4 cores
- RAM: 8GB
- Storage: 50GB (including ~3GB for AI models)
- GPU: Optional (CPU-only mode available)
Recommended
- CPU: 8+ cores
- RAM: 16GB+
- Storage: 100GB+ SSD
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
Supported Platforms
- OS: Linux, macOS (including Apple Silicon), Windows (via WSL2)
- Architectures: AMD64, ARM64
- GPUs: NVIDIA CUDA, Apple MPS (Metal)
Performance Benchmarks
| Metric | Performance |
|---|---|
| Transcription Speed (GPU) | 70x realtime |
| Vector Search Improvement | 9.5x faster |
| Query Performance | 25% faster, 75% lower p90 latency |
| Multi-GPU Throughput | 4 videos simultaneously (4 workers) |
| Model Cache Size | ~2.6GB total |
Documentation
📚 Complete Documentation: https://docs.opentranscribe.app
Key resources:
- Quick Start Guide
- Installation Guide
- User Guide
- Configuration Reference
- Screenshots & Visual Guide
- FAQ
- Troubleshooting
Roadmap to v1.0.0
We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.
Planned features for future releases:
- Real-time transcription for live streaming
- Enhanced speaker analytics and visualization
- Better speaker diarization models
- Google-style text search
- LLM powered RAG Chat with transcript text
- Other refinements along the way!
Known Issues
No critical issues at release time. See GitHub Issues for community-reported items.
Contributing
We welcome contributions from the community! See our Contributing Guide for details.
Ways to contribute:
- 🐛 Report bugs and issues
- 💡 Suggest new features
- 🔧 Submit pull requests
- 📚 Improve documentation
- 🌍 Translate the interface
- ⭐ Star the repository
Support & Community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Contact via GitHub
Acknowledgments
OpenTranscribe builds upon amazing open-source projects:
- OpenAI Whisper - Foundation speech recognition model
- WhisperX - Enhanced alignment and diarization
- PyAnnote.audio - Speaker diarization toolkit
- FastAPI - Modern Python web framework
- Svelte - Reactive frontend framework
- PostgreSQL - Reliable database system
- OpenSearch - Search and analytics engine
- Docker - Containerization platform
Special thanks to the AI community and all contributors who helped make this release possible!
License
OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
See LICENSE for full details.
Built with ❤️ by the OpenTranscribe community
OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.
Download: v0.1.0 Release
Docker: Backend | Frontend
Docs: docs.opentranscribe.app