
Releases: davidamacey/OpenTranscribe

v0.3.3 - Community Contributions & Protected Media Support

14 Jan 05:13


Community Contributions & Protected Media Support

Community-driven release featuring contributions from @vfilon, who submitted all four PRs in this version!

Highlights

  • 🇷🇺 Russian Language Support - 8th supported UI language with 1,600+ translated strings
  • 🔐 Protected Media Authentication - New plugin system for downloading from password-protected corporate video portals (MediaCMS support built-in)
  • 🛠️ Bug Fixes - VRAM monitoring fix for non-CUDA devices, loading screen translation fix
  • 🔧 URL Utilities - Centralized URL construction for consistent dev/production behavior

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Protected Media Setup (Optional)

To enable authenticated downloads from MediaCMS installations:

# Add to .env
MEDIACMS_ALLOWED_HOSTS=media.example.com,mediacms.internal
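The allowlist is a comma-separated list of hostnames. The following minimal bash sketch shows how such a list can decide which URLs get authenticated downloads. The helper names `url_host` and `host_allowed` are hypothetical. This is not OpenTranscribe's backend code, and it assumes exact, case-insensitive hostname matching with no wildcards or subdomain matching.

```bash
# Hypothetical sketch: check a URL's host against a comma-separated
# allowlist such as MEDIACMS_ALLOWED_HOSTS. Not OpenTranscribe's code.

# Extract the lowercase hostname from a URL (drops scheme, path, query,
# fragment, "user:pass@" and ":port").
url_host() {
  local rest="${1#*://}"
  rest="${rest%%/*}"
  rest="${rest%%\?*}"
  rest="${rest%%#*}"
  rest="${rest##*@}"
  rest="${rest%%:*}"
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Return 0 if the URL's host exactly matches one allowlist entry.
host_allowed() {
  local url="$1" list="$2" host entry
  host="$(url_host "$url")"
  local IFS=','
  for entry in $list; do
    # Trim whitespace and lowercase each entry before comparing.
    entry="$(printf '%s' "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')"
    if [ -n "$entry" ] && [ "$host" = "$entry" ]; then
      return 0
    fi
  done
  return 1
}
```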

Full Changelog

See CHANGELOG.md for complete details.

Thank You

Special thanks to @vfilon for contributing all four PRs in this release!

v0.3.2 - Setup Script Bug Fixes

17 Dec 01:42


Patch release fixing critical bugs in the one-liner installation script that prevented successful setup on fresh installations.

Note: This is a scripts-only release. No Docker container rebuild required.

Fixed

Setup Script Fixes

  • Scripts Directory Creation - Fixed curl error 23 ("Failure writing output to destination") when downloading SSL and permission scripts by creating the scripts/ directory before download attempts
  • PyTorch 2.6+ Compatibility - Applied torch.load patch to download-models.py for PyTorch 2.6+ compatibility, mirroring the fix already present in the backend (from Wes Brown's commit 8929cd6)
    • PyTorch 2.6 changed the default of torch.load's weights_only argument from False to True, causing omegaconf deserialization errors during model downloads
    • The patch sets weights_only=False for trusted HuggingFace models
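curl exit code 23 is a write error. `curl -o scripts/name.sh` does not create a missing `scripts/` directory by itself unless `--create-dirs` is passed. The following minimal sketch shows the pattern the fix describes. It uses a hypothetical `fetch_script` helper and is not the actual setup script code.

```bash
# Hypothetical helper illustrating the fix: create the target directory
# before asking curl to write into it. Without the mkdir (or curl's
# --create-dirs flag), "curl -o scripts/x.sh" fails with exit code 23.
fetch_script() {
  local base_url="$1" name="$2" dest_dir="${3:-scripts}"
  mkdir -p "$dest_dir" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL "$base_url/$name" -o "$dest_dir/$name" || return 1
  chmod +x "$dest_dir/$name"
}
```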

Upgrade Notes

For existing installations: No action required - Docker containers already have the PyTorch fix.

For new installations: The one-liner setup script now works correctly:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog: https://github.com/davidamacey/OpenTranscribe/blob/master/CHANGELOG.md

v0.3.1 - Script Enhancements & Documentation Updates

16 Dec 13:49


Script Enhancements & Documentation Updates

Patch release with enhanced setup scripts for HTTPS/SSL configuration and comprehensive documentation updates covering v0.2.0 and v0.3.0 features.

Highlights

New Management Commands

  • ./opentranscribe.sh setup-ssl - Interactive HTTPS/SSL configuration
  • ./opentranscribe.sh version - Check current version and available updates
  • ./opentranscribe.sh update - Update containers only (quick)
  • ./opentranscribe.sh update-full - Update containers + config files (recommended)
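To tell whether an update is available, a script has to compare version tags. One common approach uses version-aware sorting (`sort -V`, provided by GNU coreutils). This hypothetical sketch is not the actual `opentranscribe.sh version` logic, and it assumes tags of the form `vX.Y.Z`.

```bash
# Hypothetical sketch: succeed if version "$2" is newer than version "$1".
# Strips a leading "v" and relies on sort -V for version-aware ordering.
is_newer_version() {
  local current="${1#v}" candidate="${2#v}"
  if [ "$current" = "$candidate" ]; then
    return 1
  fi
  # sort -V places the highest version last; the candidate is newer if it sorts last.
  [ "$(printf '%s\n%s\n' "$current" "$candidate" | sort -V | tail -n 1)" = "$candidate" ]
}
```

A plain string comparison would rank `v0.2.1` above `v0.10.0`. `sort -V` avoids this by comparing each numeric field as a number.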

NGINX Improvements

  • Automatic NGINX overlay loading when NGINX_SERVER_NAME is configured
  • NGINX health check added to ./opentr.sh health
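Compose overlays work by passing additional `-f` files, and later files override earlier ones. The following hypothetical sketch shows conditional overlay loading. It assumes the base file is `docker-compose.yml` and the overlay is the `docker-compose.nginx.yml` file named in the v0.3.0 notes. `compose_files` is an illustrative name, not the real `opentr.sh` function.

```bash
# Hypothetical sketch: build "docker compose" -f arguments, adding the
# NGINX overlay only when NGINX_SERVER_NAME is set (e.g. loaded from .env).
compose_files() {
  local args=(-f docker-compose.yml)
  if [ -n "${NGINX_SERVER_NAME:-}" ]; then
    args+=(-f docker-compose.nginx.yml)
  fi
  printf '%s\n' "${args[*]}"
}
```

Usage: `docker compose $(compose_files) up -d`. The substitution is left unquoted on purpose so it splits into separate arguments, which works only because the file names contain no spaces.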

Documentation Updates

  • New comprehensive NGINX/SSL setup guide
  • Updated docs for Universal Media URL support (1800+ platforms)
  • Added garbage cleanup feature documentation
  • FAQ entries for system statistics and transcript pagination
  • All Docusaurus and README docs updated for v0.2.0/v0.3.0 features

How to Update

Existing installations:

./opentranscribe.sh update-full

New installations:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog

See CHANGELOG.md

OpenTranscribe v0.3.0 - Universal Media URL & NGINX Support

15 Dec 12:24


What's New in v0.3.0

This release integrates valuable contributions from the community fork by @vfilon, bringing major new features including support for 1800+ video platforms and a production-ready NGINX reverse proxy with SSL/TLS.

🎬 Universal Media URL Support (1800+ Platforms)

The headline feature expands OpenTranscribe far beyond YouTube:

Supported Platforms:

  • Primary (Best Support): YouTube, Dailymotion, Twitter/X
  • Secondary: Vimeo (public only), TikTok (variable), and 1800+ more via yt-dlp

Features:

  • Dynamic source platform detection from yt-dlp metadata
  • User-friendly error messages for authentication-required platforms
  • Platform guidance for common issues (Vimeo login, Instagram restrictions, etc.)
  • Updated UI with "Supported Platforms" section and limitations warning

Note: Authentication is not currently supported. Videos requiring login will fail with helpful error messages guiding users to publicly accessible alternatives.

🔐 NGINX Reverse Proxy with SSL/TLS

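This release closes #72, enabling browser microphone recording when OpenTranscribe is accessed from another machine on the network (browsers only allow microphone access over HTTPS or on localhost):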

  • docker-compose.nginx.yml overlay for production deployments
  • Full SSL/TLS configuration with HTTP → HTTPS redirect
  • WebSocket proxy support for real-time updates
  • 2GB file upload support for large media files
  • Flower dashboard and MinIO console accessible through NGINX
  • Self-signed certificate generation script
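A self-signed certificate for a given hostname can be created with a single `openssl req` call. This hypothetical sketch does not reproduce `generate-ssl-cert.sh`. The output directory and file names are assumptions, and `-addext` requires OpenSSL 1.1.1 or newer. The subjectAltName entry is included because modern browsers match the hostname against the SAN rather than the CN.

```bash
# Hypothetical sketch of a self-signed certificate for NGINX; the real
# generate-ssl-cert.sh may differ. Requires OpenSSL 1.1.1+ for -addext.
make_self_signed_cert() {
  local name="$1" out_dir="${2:-nginx/ssl}"
  mkdir -p "$out_dir" || return 1
  # -x509: self-signed cert, -nodes: unencrypted key, valid for one year.
  openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -keyout "$out_dir/server.key" -out "$out_dir/server.crt" \
    -subj "/CN=$name" -addext "subjectAltName=DNS:$name" 2>/dev/null
}
```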

🔧 Critical Bug Fixes: UUID/ID Standardization

Comprehensive fix for UUID/ID handling across 60+ files:

Issues Fixed:

  • Speaker recommendations not showing for new videos
  • Profile embedding service returning wrong ID type
  • Inconsistent ID handling between backend and frontend
  • Comment system UUID issues
  • Password reset flow problems

🏗️ Infrastructure Improvements

GPU Configuration:

  • Separated into optional docker-compose.gpu.yml overlay
  • Better cross-platform support (macOS, CPU-only systems)
  • Auto-detection in opentr.sh script
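The following hypothetical sketch shows how GPU auto-detection can decide whether to add the `docker-compose.gpu.yml` overlay. It treats a working `nvidia-smi -L` (which lists the visible GPUs) as the signal. The real `opentr.sh` logic may check more, for example whether the NVIDIA Container Toolkit is installed.

```bash
# Hypothetical sketch: emit the GPU overlay argument only when an NVIDIA
# GPU is visible to the driver. Not the actual opentr.sh detection logic.
gpu_overlay_args() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    printf '%s\n' "-f docker-compose.gpu.yml"
  fi
}
```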

Task Management:

  • Task status reconciliation before marking files as stuck
  • Multiple timestamp fallbacks for better reliability
  • Auto-refresh analytics when segment speaker changes

LLM Service:

  • Ollama context window configuration (num_ctx parameter)
  • Model-aware temperature handling
  • Better logging with resolved endpoint info
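Ollama accepts the context window as `options.num_ctx` in its `/api/generate` request body. Without it, Ollama uses its own default, which may be too small for long transcripts. This minimal sketch assumes a local Ollama on its default port 11434. The helper names are hypothetical, and this is not OpenTranscribe's LLM service code.

```bash
# Hypothetical helpers for calling Ollama with an explicit context window.
ollama_payload() {
  local model="$1" prompt="$2" num_ctx="$3"
  # Minimal JSON escaping (double quotes only); use jq for arbitrary text.
  prompt="${prompt//\"/\\\"}"
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%d}}\n' \
    "$model" "$prompt" "$num_ctx"
}

# POST a non-streaming generate request; the JSON reply goes to stdout.
ollama_generate() {
  curl -fsS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(ollama_payload "$@")"
}
```

Usage: `ollama_generate llama3.2:latest "Summarize this transcript" 8192`.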

🌐 i18n Updates

All 7 supported languages have been updated:

  • Notification text changed from "YouTube Processing" to "Video Processing"
  • New media URL description and platform limitation strings
  • Updated recommended platforms list

🙏 Acknowledgments

Special thanks to @vfilon for the fork contributions that made this release possible:

  • Universal Media URL support concept
  • NGINX reverse proxy configuration
  • Task status reconciliation improvements
  • GPU overlay separation

How to Update

Docker Compose (Recommended)

# Pull the latest images
docker compose pull

# Restart with new images
docker compose up -d

For NGINX/SSL Setup

# Set NGINX_SERVER_NAME in .env
./scripts/generate-ssl-cert.sh
./opentr.sh start prod

See docs/NGINX_SETUP.md for complete setup instructions.
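`NGINX_SERVER_NAME` can be set by hand or from a script. The hypothetical `set_env_var` helper below is not part of OpenTranscribe. It replaces an existing assignment or appends a new one, keeps all other lines, and moves the updated key to the end of the file.

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing
# "KEY=" line. Assumes VALUE contains no newlines.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}" tmp
  touch "$file" || return 1
  tmp="$(mktemp)" || return 1
  # grep -v exits 1 when it selects no lines, which is fine here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back via cat so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage: `set_env_var NGINX_SERVER_NAME transcribe.example.com .env` (the hostname is a placeholder).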

Full Changelog

See the CHANGELOG for complete details.


Full Changelog: v0.2.1...v0.3.0

v0.2.1 - Security Patch Release

13 Dec 14:55


Security Patch Release

This release addresses critical container vulnerabilities identified in security scans. All users are encouraged to update.

Resolved Critical CVEs (4 → 0)

| CVE | Package | Severity | Status |
|-----|---------|----------|--------|
| CVE-2025-47917 | libmbedcrypto | CRITICAL | ✅ Fixed |
| CVE-2023-6879 | libaom3 | CRITICAL | ✅ Fixed |
| CVE-2025-7458 | libsqlite3 | CRITICAL | ✅ Fixed |
| CVE-2023-45853 | zlib | CRITICAL | ✅ Fixed |

Container Updates

Frontend:

  • nginx:1.29.3-alpine3.22 → nginx:1.29.4-alpine3.23
  • Fixed 6 vulnerabilities (3 HIGH, 3 MEDIUM) in libpng and busybox
  • Added HEALTHCHECK instruction

Backend:

  • python:3.12-slim-bookworm → python:3.13-slim-trixie
  • Debian 12 → Debian 13 "trixie"
  • Python 3.12 → Python 3.13
  • Added HEALTHCHECK instruction
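With a HEALTHCHECK in the image, Docker tracks a health status for each container, and scripts can poll it with `docker inspect`. The following is a hypothetical sketch. The container name in the usage line is an assumption, so check `docker compose ps` for the real names.

```bash
# Hypothetical sketch: poll a container's HEALTHCHECK status until it is
# healthy, unhealthy, or the timeout (in seconds) expires.
wait_healthy() {
  local container="$1" timeout="${2:-120}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || status=""
    case "$status" in
      healthy) return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $container" >&2
  return 1
}
```

Usage: `wait_healthy opentranscribe-backend 180`.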

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Manual:

docker pull davidamacey/opentranscribe-frontend:v0.2.1
docker pull davidamacey/opentranscribe-backend:v0.2.1

Full Changelog

See CHANGELOG.md for complete details.


🔒 Your security is our priority. Thank you for using OpenTranscribe.

v0.2.0 - Community-Driven Multilingual Release

13 Dec 00:43
8851626


We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.

Growing Community

OpenTranscribe has seen exciting growth in the just over a month since our v0.1.0 release.

Community Contributions

Wes Brown's Seven Pull Requests

A massive thank you to Wes Brown (@SQLServerIO) who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:

  1. PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
  2. PR #107: Auto-cleanup garbage transcription segments
  3. PR #106: User admin endpoints now use UUID instead of integer ID
  4. PR #105: Speaker merge UI and segment speaker reassignment
  5. PR #104: LLM model discovery for OpenAI-compatible providers
  6. PR #103: Per-file speaker count settings in upload and reprocess UI
  7. PR #102: PyTorch 2.6+ compatibility and speaker diarization settings

The Multilingual Feature Request

Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were being transcribed to English because WhisperX was hardcoded with language="en" and task="translate".

What's New in v0.2.0

🌍 Multilingual Transcription Support (100+ Languages)

  • Source Language: Auto-detect or specify the audio language (100+ languages supported)
  • Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
  • LLM Output Language: Generate AI summaries in 12 different languages
  • ~42 languages have word-level timestamp support via wav2vec2 alignment
  • Settings are stored per-user in the database

🌐 UI Internationalization (7 Languages)

The UI is now available in:

  • English (default)
  • Spanish (Español)
  • French (Français)
  • German (Deutsch)
  • Portuguese (Português)
  • Chinese (中文)
  • Japanese (日本語)

🎙️ Speaker Management Enhancements

  • Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
  • Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
  • User-Level Preferences: Save default speaker detection settings

🤖 LLM Integration Improvements

  • Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
  • Anthropic Support Enhanced: Native model discovery via /v1/models API
  • Multilingual Output: Generate AI summaries in 12 different languages
  • Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
  • Updated Default Models: Anthropic uses claude-opus-4-5-20251101, Ollama uses llama3.2:latest
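OpenAI-compatible servers such as vLLM (and Ollama's OpenAI-compatible endpoint) list their models at `GET /v1/models`. Anthropic's Models API returns the same `{"data": [{"id": ...}]}` shape. This minimal sketch uses curl and jq. The function names are hypothetical, this is not OpenTranscribe's discovery code, and the Anthropic helper reads only the first page of the paginated results.

```bash
# Hypothetical model-discovery helpers (require curl and jq).
# Both endpoints return JSON shaped like {"data":[{"id":"..."}, ...]}.
list_openai_compatible_models() {
  local base_url="$1" api_key="${2:-}"
  local auth=()
  if [ -n "$api_key" ]; then
    auth=(-H "Authorization: Bearer $api_key")
  fi
  curl -fsS "${auth[@]}" "$base_url/v1/models" | jq -r '.data[].id'
}

list_anthropic_models() {
  local api_key="$1"
  curl -fsS https://api.anthropic.com/v1/models \
    -H "x-api-key: $api_key" \
    -H 'anthropic-version: 2023-06-01' | jq -r '.data[].id'
}
```

Usage: `list_openai_compatible_models http://localhost:8000` for a local vLLM server (the port is an assumption).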

⚡ Performance & Stability

  • Pagination for Large Transcripts: No more browser hanging with thousands of segments
  • Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
  • PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
  • Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files

👤 Admin & User Experience

  • System Statistics: CPU, memory, disk, and GPU usage now visible to all users
  • Admin Password Reset: Secure password reset functionality with validation
  • UUID Consistency: Fixed admin endpoints to use UUID instead of integer IDs

Upgrading to v0.2.0

# If using the production installer
cd opentranscribe
./opentranscribe.sh update

# Or pull the latest Docker images
docker compose pull
docker compose up -d

Database migrations run automatically on startup - no manual intervention required.

Resources


Full Changelog: v0.1.0...v0.2.0

Happy transcribing! 🎉
The OpenTranscribe Team

OpenTranscribe v0.1.0 - First Official Release

06 Nov 05:05


OpenTranscribe v0.1.0 - First Official Release

Release Date: November 5, 2025
License: GNU Affero General Public License v3.0 (AGPL-3.0)

Overview

We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.

OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.

Why AGPL-3.0?

We've chosen the GNU Affero General Public License v3.0 to:

  • Protect open source - Ensure the code remains open and accessible to everyone
  • Prevent proprietary forks - Require that modifications, especially network services, remain open
  • Ensure transparency - Network users have the right to access the source code
  • Build community - Foster collaboration and shared improvements

Key Highlights

🎧 Professional-Grade Transcription

  • 70x realtime speed on GPU with large-v2 model
  • Word-level timestamps using WAV2VEC2 alignment
  • 50+ languages supported with automatic translation
  • Universal format support - Audio and video files up to 4GB

👥 Advanced Speaker Intelligence

  • Automatic speaker diarization using PyAnnote.audio
  • Cross-video speaker recognition with voice fingerprinting
  • AI-powered speaker suggestions using LLM context analysis
  • Global speaker profiles that persist across all recordings
  • Speaker analytics with talk time, pace, and interaction patterns

🤖 AI-Powered Insights

  • LLM integration - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
  • BLUF format summaries - Bottom Line Up Front structured analysis
  • Custom AI prompts - Unlimited prompts with flexible JSON schemas
  • Intelligent sectioning - Handles transcripts of any length automatically
  • Local or cloud processing - Privacy-first local models or powerful cloud AI

🔍 Powerful Search & Discovery

  • Hybrid search - Keyword + semantic search with OpenSearch 3.3.1
  • 9.5x faster vector search - Significantly improved performance
  • 25% faster queries with 75% lower p90 latency
  • Advanced filtering - Search by speaker, tags, collections, date, duration
  • Interactive navigation - Click-to-seek on transcripts and waveforms

⚡ Enterprise Performance

  • Multi-GPU scaling - Optional parallel processing (4+ workers per GPU)
  • Specialized work queues - GPU, CPU, Download, NLP, and Utility queues
  • Non-blocking architecture - Parallel processing saves 45-75s per 3-hour file
  • Model caching - Efficient ~2.6GB cache with automatic persistence
  • Complete offline support - Full airgapped deployment capability

Installation

Quick Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start

Access at: http://localhost:5173

Docker Hub Images

Pre-built multi-platform images (AMD64, ARM64):

  • davidamacey/opentranscribe-backend:v0.1.0
  • davidamacey/opentranscribe-frontend:v0.1.0

From Source

git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev

What's Included

Core Features

  • Transcription - WhisperX with faster-whisper backend
  • Speaker Diarization - PyAnnote.audio integration with auto-labeling and profile generation
  • Media File Upload - Direct upload of audio/video files up to 4GB with drag-and-drop
  • Video File Size Detection - Client-side audio extraction option for large video files
  • YouTube Support - Direct URL and playlist processing for batch transcription
  • Browser Microphone Recording - Built-in recording (localhost or HTTPS) with background operation
  • AI-Powered Summaries - Multi-provider LLM integration with customizable formats
  • AI Topic Generation - Automatic tag and collection suggestions from transcript content
  • Timestamp Comments - User annotations anchored to specific video moments
  • Search Engine - OpenSearch 3.3.1 with hybrid keyword and vector search
  • Collections - Organize media into themed groups with AI suggestions
  • Analytics - Speaker metrics and interaction analysis
  • Waveform Visualization - Interactive audio timeline
  • PWA Support - Installable progressive web app
  • Dark/Light Mode - Full theme support

Infrastructure

  • Docker Compose - Multi-environment orchestration
  • PostgreSQL - Relational database with JSONB
  • MinIO - S3-compatible object storage
  • Redis - Message broker and caching
  • Celery - Distributed task processing
  • NGINX - Production web server
  • Flower - Task monitoring dashboard

Security

  • Non-root containers - Principle of least privilege
  • RBAC - Role-based access control
  • Encrypted secrets - Secure API key storage
  • Security scanning - Trivy and Grype integration
  • Session management - JWT-based authentication

System Requirements

Minimum

  • CPU: 4 cores
  • RAM: 8GB
  • Storage: 50GB (including ~3GB for AI models)
  • GPU: Optional (CPU-only mode available)

Recommended

  • CPU: 8+ cores
  • RAM: 16GB+
  • Storage: 100GB+ SSD
  • GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
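The following is a quick preflight check against the minimum figures above. It is a hypothetical, Linux-only sketch that reads `/proc/meminfo` and uses GNU `nproc`, and it is not part of the installer. RAM is rounded to the nearest decimal GB because the kernel reports slightly less than the installed amount. Free space is measured on the current directory's filesystem. Docker images are stored under Docker's data root, which may be on a different disk.

```bash
# Hypothetical preflight check against the minimums listed above
# (4 cores, 8GB RAM, 50GB free disk). Prints each shortfall to stderr.
meets_minimum() {
  local cores="$1" ram_gb="$2" disk_gb="$3" ok=0
  [ "$cores" -ge 4 ] || { echo "CPU: $cores cores (< 4)" >&2; ok=1; }
  [ "$ram_gb" -ge 8 ] || { echo "RAM: ${ram_gb}GB (< 8GB)" >&2; ok=1; }
  [ "$disk_gb" -ge 50 ] || { echo "Disk: ${disk_gb}GB free (< 50GB)" >&2; ok=1; }
  return "$ok"
}

# Gather this host's figures (Linux only) and run the check.
check_this_host() {
  local cores ram_gb disk_gb
  cores="$(nproc)"
  ram_gb="$(awk '/^MemTotal:/ {print int($2 * 1024 / 1e9 + 0.5)}' /proc/meminfo)"
  disk_gb="$(df -Pk . | awk 'NR == 2 {print int($4 * 1024 / 1e9)}')"
  meets_minimum "$cores" "$ram_gb" "$disk_gb"
}
```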

Supported Platforms

  • OS: Linux, macOS (including Apple Silicon), Windows (via WSL2)
  • Architectures: AMD64, ARM64
  • GPUs: NVIDIA CUDA, Apple MPS (Metal)

Performance Benchmarks

| Metric | Performance |
|--------|-------------|
| Transcription Speed (GPU) | 70x realtime |
| Vector Search Improvement | 9.5x faster |
| Query Performance | 25% faster, 75% lower p90 latency |
| Multi-GPU Throughput | 4 videos simultaneously (4 workers) |
| Model Cache Size | ~2.6GB total |
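The realtime factor converts directly into a wall-clock estimate: processing time ≈ duration / 70. For example, a one-hour recording needs roughly 3600 / 70 ≈ 51 seconds of transcription. The sketch below shows that arithmetic. The function name is illustrative, and real timings depend on the GPU, the model, and the other pipeline stages.

```bash
# Rough estimate: processing seconds = media duration / realtime factor.
# The default of 70 is the quoted GPU benchmark; real runs will vary.
estimate_seconds() {
  local duration_s="$1" factor="${2:-70}"
  awk -v d="$duration_s" -v f="$factor" 'BEGIN { printf "%.1f\n", d / f }'
}
```

Usage: `estimate_seconds 10800` estimates a 3-hour file at about 154 seconds.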

Documentation

📚 Complete Documentation: https://docs.opentranscribe.app

Key resources:

Roadmap to v1.0.0

We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.

Planned features for future releases:

  • Real-time transcription for live streaming
  • Enhanced speaker analytics and visualization
  • Better speaker diarization models
  • Google-style text search
  • LLM-powered RAG chat with transcript text
  • Other refinements along the way!

Known Issues

No critical issues at release time. See GitHub Issues for community-reported items.

Contributing

We welcome contributions from the community! See our Contributing Guide for details.

Ways to contribute:

  • 🐛 Report bugs and issues
  • 💡 Suggest new features
  • 🔧 Submit pull requests
  • 📚 Improve documentation
  • 🌍 Translate the interface
  • ⭐ Star the repository

Support & Community

Acknowledgments

OpenTranscribe builds upon amazing open-source projects:

  • OpenAI Whisper - Foundation speech recognition model
  • WhisperX - Enhanced alignment and diarization
  • PyAnnote.audio - Speaker diarization toolkit
  • FastAPI - Modern Python web framework
  • Svelte - Reactive frontend framework
  • PostgreSQL - Reliable database system
  • OpenSearch - Search and analytics engine
  • Docker - Containerization platform

Special thanks to the AI community and all contributors who helped make this release possible!

License

OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See LICENSE for full details.


Built with ❤️ by the OpenTranscribe community

OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.

Download: v0.1.0 Release
Docker: Backend | Frontend
Docs: docs.opentranscribe.app