Releases · davidamacey/OpenTranscribe

14 Jan 05:13

davidamacey

v0.3.3

cae684a

v0.3.3 - Community Contributions & Protected Media Support

Latest

Community Contributions & Protected Media Support

Community-driven release featuring contributions from @vfilon, who submitted all four PRs in this version!

Highlights

🇷🇺 Russian Language Support - 8th supported UI language with 1,600+ translated strings
🔐 Protected Media Authentication - New plugin system for downloading from password-protected corporate video portals (MediaCMS support built-in)
🛠️ Bug Fixes - VRAM monitoring fix for non-CUDA devices, loading screen translation fix
🔧 URL Utilities - Centralized URL construction for consistent dev/production behavior

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Protected Media Setup (Optional)

To enable authenticated downloads from MediaCMS installations:

# Add to .env
MEDIACMS_ALLOWED_HOSTS=media.example.com,mediacms.internal
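As an illustration only, a comma-separated allowlist like MEDIACMS_ALLOWED_HOSTS might be parsed and enforced as in this Python sketch. The function names and the matching rule (exact, case-insensitive hostname match, no wildcard subdomains) are assumptions. They are not taken from OpenTranscribe's plugin code.

```python
import os
from urllib.parse import urlparse


def parse_allowed_hosts(raw: str) -> set[str]:
    """Split a comma-separated host list, dropping blanks and surrounding whitespace."""
    return {h.strip().lower() for h in raw.split(",") if h.strip()}


def is_allowed_media_url(url: str, allowed: set[str]) -> bool:
    """Accept only http(s) URLs whose hostname exactly matches an allowlisted host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # .hostname is lowercased and excludes any port or user:password@ prefix
    return (parsed.hostname or "") in allowed


# Hypothetical usage: read the same variable that is added to .env for MediaCMS
allowed_hosts = parse_allowed_hosts(os.environ.get("MEDIACMS_ALLOWED_HOSTS", ""))
```

The check compares the parsed hostname exactly instead of searching the URL for a substring. That keeps look-alike hosts such as media.example.com.evil.io from passing.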

Full Changelog

See CHANGELOG.md for complete details.

Thank You

Special thanks to @vfilon for contributing all four PRs in this release!


17 Dec 01:42

davidamacey

v0.3.2

23f39ce

v0.3.2 - Setup Script Bug Fixes

Patch release fixing critical bugs in the one-liner installation script that prevented successful setup on fresh installations.

Note: This is a scripts-only release. No Docker container rebuild required.

Fixed

Setup Script Fixes

Scripts Directory Creation - Fixed curl error 23 ("Failure writing output to destination") when downloading the SSL and permission scripts; the setup script now creates the scripts/ directory before attempting any downloads
PyTorch 2.6+ Compatibility - Applied torch.load patch to download-models.py for PyTorch 2.6+ compatibility, mirroring the fix already present in the backend (from Wes Brown's commit 8929cd6)
- PyTorch 2.6 changed the default of torch.load's weights_only argument to True, causing omegaconf deserialization errors during model downloads
- The patch sets weights_only=False for trusted HuggingFace models
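The effect of the patch can be sketched as a wrapper that defaults weights_only to False only when the caller did not pass it. This is a hypothetical helper, not the actual code in download-models.py. Setting weights_only=False lets torch.load unpickle arbitrary objects, so it is only appropriate for checkpoints from sources you trust.

```python
import functools
from typing import Any, Callable


def default_weights_only_false(load_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a torch.load-style function so weights_only defaults to False.

    An explicit weights_only argument from the caller is left untouched.
    """

    @functools.wraps(load_fn)
    def patched(*args: Any, **kwargs: Any) -> Any:
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)

    return patched


# Hypothetical usage (requires PyTorch; only for trusted checkpoints):
#   import torch
#   torch.load = default_weights_only_false(torch.load)
```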

Upgrade Notes

For existing installations: No action required - Docker containers already have the PyTorch fix.

For new installations: The one-liner setup script now works correctly:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog: https://github.com/davidamacey/OpenTranscribe/blob/master/CHANGELOG.md


16 Dec 13:49

davidamacey

v0.3.1

e96fe52

v0.3.1 - Script Enhancements & Documentation Updates

Script Enhancements & Documentation Updates

Patch release with enhanced setup scripts for HTTPS/SSL configuration and comprehensive documentation updates covering v0.2.0 and v0.3.0 features.

Highlights

New Management Commands

./opentranscribe.sh setup-ssl - Interactive HTTPS/SSL configuration
./opentranscribe.sh version - Check current version and available updates
./opentranscribe.sh update - Update containers only (quick)
./opentranscribe.sh update-full - Update containers + config files (recommended)

NGINX Improvements

Automatic NGINX overlay loading when NGINX_SERVER_NAME is configured
NGINX health check added to ./opentr.sh health

Documentation Updates

New comprehensive NGINX/SSL setup guide
Updated docs for Universal Media URL support (1800+ platforms)
Added garbage cleanup feature documentation
FAQ entries for system statistics and transcript pagination
All Docusaurus and README docs updated for v0.2.0/v0.3.0 features

How to Update

Existing installations:

./opentranscribe.sh update-full

New installations:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog

See CHANGELOG.md


15 Dec 12:24

davidamacey

v0.3.0

34b6a0f

OpenTranscribe v0.3.0 - Universal Media URL & NGINX Support

What's New in v0.3.0

This release integrates valuable contributions from the community fork by @vfilon, bringing major new features including support for 1800+ video platforms and production-ready NGINX reverse proxy with SSL/TLS.

🎬 Universal Media URL Support (1800+ Platforms)

The headline feature expands OpenTranscribe far beyond YouTube:

Supported Platforms:

Primary (Best Support): YouTube, Dailymotion, Twitter/X
Secondary: Vimeo (public only), TikTok (variable), and 1800+ more via yt-dlp

Features:

Dynamic source platform detection from yt-dlp metadata
User-friendly error messages for authentication-required platforms
Platform guidance for common issues (Vimeo login, Instagram restrictions, etc.)
Updated UI with "Supported Platforms" section and limitations warning

Note: Authentication is not currently supported. Videos requiring login will fail with helpful error messages guiding users to publicly accessible alternatives.
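A rough Python sketch of metadata-based platform detection and friendlier error text follows. yt-dlp's info dict (for example, from YoutubeDL.extract_info(url, download=False)) carries extractor and extractor_key fields. The display-name table and the keyword list used to spot login walls are hypothetical, because real yt-dlp error messages vary by extractor.

```python
from typing import Any

# Hypothetical mapping from yt-dlp extractor keys to display names.
DISPLAY_NAMES = {
    "Youtube": "YouTube",
    "Twitter": "Twitter/X",
    "Dailymotion": "Dailymotion",
    "Vimeo": "Vimeo",
    "TikTok": "TikTok",
}

# Hypothetical substrings suggesting a login wall; real messages differ per site.
AUTH_HINTS = ("login", "log in", "sign in", "cookies", "private", "registered users")


def detect_platform(info: dict[str, Any]) -> str:
    """Derive a display name from a yt-dlp info dict, falling back to the raw key."""
    key = info.get("extractor_key") or info.get("extractor") or ""
    if not key:
        return "Unknown"
    return DISPLAY_NAMES.get(key, key)


def friendly_error(platform: str, raw_error: str) -> str:
    """Turn a raw download error into a user-facing message."""
    lowered = raw_error.lower()
    if any(hint in lowered for hint in AUTH_HINTS):
        return (f"{platform} requires authentication for this video, which is not "
                "supported. Try a publicly accessible link instead.")
    return f"Could not download from {platform}: {raw_error}"
```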

🔐 NGINX Reverse Proxy with SSL/TLS

This release closes #72, enabling browser microphone recording when OpenTranscribe is accessed over the network (browsers only allow microphone access on localhost or over HTTPS):

docker-compose.nginx.yml overlay for production deployments
Full SSL/TLS configuration with HTTP → HTTPS redirect
WebSocket proxy support for real-time updates
2GB file upload support for large media files
Flower dashboard and MinIO console accessible through NGINX
Self-signed certificate generation script

🔧 Critical Bug Fixes: UUID/ID Standardization

Comprehensive fix for UUID/ID handling across 60+ files:

Issues Fixed:

Speaker recommendations not showing for new videos
Profile embedding service returning wrong ID type
Inconsistent ID handling between backend and frontend
Comment system UUID issues
Password reset flow problems

🏗️ Infrastructure Improvements

GPU Configuration:

Separated into optional docker-compose.gpu.yml overlay
Better cross-platform support (macOS, CPU-only systems)
Auto-detection in opentr.sh script
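The overlay selection can be pictured with this Python sketch, which mirrors only the decision logic (opentr.sh itself is a shell script). The overlay file names come from these notes. The base file name docker-compose.yml and the use of nvidia-smi on PATH as the GPU test are assumptions.

```python
import shutil
from collections.abc import Mapping


def compose_files(env: Mapping[str, str], has_nvidia_gpu: bool) -> list[str]:
    """Build docker compose -f arguments: base file first, then optional overlays."""
    files = ["docker-compose.yml"]  # assumed base file name
    if has_nvidia_gpu:
        files.append("docker-compose.gpu.yml")
    if env.get("NGINX_SERVER_NAME", "").strip():
        files.append("docker-compose.nginx.yml")
    args: list[str] = []
    for f in files:
        args += ["-f", f]
    return args


def nvidia_gpu_present() -> bool:
    """Crude heuristic (assumption): an nvidia-smi binary on PATH means a GPU is available."""
    return shutil.which("nvidia-smi") is not None
```

Docker Compose merges -f files in order, and later files override earlier ones, so the overlays are appended after the base file. A caller could then run something like ["docker", "compose", *compose_files(os.environ, nvidia_gpu_present()), "up", "-d"].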

Task Management:

Task status reconciliation before marking files as stuck
Multiple timestamp fallbacks for better reliability
Auto-refresh analytics when segment speaker changes
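Reconciliation with timestamp fallbacks can be pictured as two steps. First, consult the task queue. Only when the worker no longer reports the task as live does the code fall back to timestamps. The field order, the set of live states, and the policy for missing timestamps in this sketch are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical "live" states. PENDING is deliberately excluded because Celery
# also reports PENDING for task IDs it knows nothing about.
LIVE_STATES = {"RECEIVED", "STARTED", "RETRY"}


def last_activity(updated_at: Optional[datetime],
                  started_at: Optional[datetime],
                  created_at: Optional[datetime]) -> Optional[datetime]:
    """Return the first available timestamp, most specific first."""
    for ts in (updated_at, started_at, created_at):
        if ts is not None:
            return ts
    return None


def is_stuck(task_state: Optional[str], activity: Optional[datetime],
             now: datetime, timeout: timedelta) -> bool:
    """Flag a file as stuck only if the queue no longer reports the task as live
    and no activity has been recorded within the timeout window."""
    if task_state in LIVE_STATES:
        return False  # reconciliation: the worker still owns this task
    if activity is None:
        return True  # hypothetical policy: no timestamp means progress cannot be shown
    return now - activity > timeout
```

Leaving PENDING out of the live set has a cost: a job that waits in a long queue past the timeout gets flagged. That is the trade-off for catching tasks that were lost entirely.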

LLM Service:

Ollama context window configuration (num_ctx parameter)
Model-aware temperature handling
Better logging with resolved endpoint info
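For reference, Ollama accepts a context-window size as options.num_ctx in the body of its /api/chat and /api/generate requests. This sketch builds such a body and applies a simple model-aware temperature rule. Several details are hypothetical and are not OpenTranscribe's settings: the default values, the list of model prefixes that keep the server's default temperature, and the function names.

```python
from typing import Any, Optional

# Hypothetical: model-name prefixes treated as not accepting a custom temperature.
FIXED_TEMPERATURE_PREFIXES = ("o1", "o3")


def temperature_for(model: str, requested: float) -> Optional[float]:
    """Return the temperature to send, or None to leave the server default."""
    name = model.split("/")[-1].lower()  # ignore a "provider/" prefix if present
    if name.startswith(FIXED_TEMPERATURE_PREFIXES):
        return None
    return requested


def ollama_chat_payload(model: str, messages: list[dict[str, str]],
                        num_ctx: int = 8192, temperature: float = 0.3) -> dict[str, Any]:
    """Build a request body for Ollama's /api/chat endpoint (defaults are assumptions)."""
    options: dict[str, Any] = {"num_ctx": num_ctx}  # context window size in tokens
    temp = temperature_for(model, temperature)
    if temp is not None:
        options["temperature"] = temp
    return {"model": model, "messages": messages, "stream": False, "options": options}
```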

🌐 i18n Updates

All 7 supported languages have been updated:

Notification text changed from "YouTube Processing" to "Video Processing"
New media URL description and platform limitation strings
Updated recommended platforms list

🙏 Acknowledgments

Special thanks to @vfilon for the fork contributions that made this release possible:

Universal Media URL support concept
NGINX reverse proxy configuration
Task status reconciliation improvements
GPU overlay separation

How to Update

Docker Compose (Recommended)

# Pull the latest images
docker compose pull

# Restart with new images
docker compose up -d

For NGINX/SSL Setup

# Set NGINX_SERVER_NAME in .env
./scripts/generate-ssl-cert.sh
./opentr.sh start prod

See docs/NGINX_SETUP.md for complete setup instructions.

Full Changelog

See the CHANGELOG for complete details.

Full Changelog: v0.2.1...v0.3.0


13 Dec 14:55

davidamacey

v0.2.1

6172cda

v0.2.1 - Security Patch Release

Security Patch Release

This release addresses critical container vulnerabilities identified in security scans. All users are encouraged to update.

Resolved Critical CVEs (4 → 0)

CVE	Package	Severity	Status
CVE-2025-47917	libmbedcrypto	CRITICAL	✅ Fixed
CVE-2023-6879	libaom3	CRITICAL	✅ Fixed
CVE-2025-7458	libsqlite3	CRITICAL	✅ Fixed
CVE-2023-45853	zlib	CRITICAL	✅ Fixed

Container Updates

Frontend:

nginx:1.29.3-alpine3.22 → nginx:1.29.4-alpine3.23
Fixed 6 vulnerabilities (3 HIGH, 3 MEDIUM) in libpng and busybox
Added HEALTHCHECK instruction

Backend:

python:3.12-slim-bookworm → python:3.13-slim-trixie
Debian 12 → Debian 13 "trixie"
Python 3.12 → Python 3.13
Added HEALTHCHECK instruction

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Manual:

docker pull davidamacey/opentranscribe-frontend:v0.2.1
docker pull davidamacey/opentranscribe-backend:v0.2.1

Full Changelog

See CHANGELOG.md for complete details.

🔒 Your security is our priority. Thank you for using OpenTranscribe.


13 Dec 00:43

davidamacey

v0.2.0

8851626

v0.2.0 - Community-Driven Multilingual Release

We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.

Growing Community

In just over a month since our v0.1.0 release, OpenTranscribe has seen exciting growth:

8 GitHub Stars - Thank you for the support!
7 Pull Requests from community contributor @SQLServerIO (Wes Brown)
Critical feature request from @LaboratorioInternacionalWeb that shaped this release

Community Contributions

Wes Brown's Seven Pull Requests

A massive thank you to Wes Brown (@SQLServerIO) who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:

PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
PR #107: Auto-cleanup garbage transcription segments
PR #106: User admin endpoints now use UUID instead of integer ID
PR #105: Speaker merge UI and segment speaker reassignment
PR #104: LLM model discovery for OpenAI-compatible providers
PR #103: Per-file speaker count settings in upload and reprocess UI
PR #102: PyTorch 2.6+ compatibility and speaker diarization settings

The Multilingual Feature Request

Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were producing English transcripts because WhisperX was hardcoded with language="en" and task="translate".

What's New in v0.2.0

🌍 Multilingual Transcription Support (100+ Languages)

Source Language: Auto-detect or specify the audio language (100+ languages supported)
Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
LLM Output Language: Generate AI summaries in 12 different languages
~42 languages have word-level timestamp support via wav2vec2 alignment
Settings are stored per-user in the database
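The multilingual fix amounts to deriving language and task from user settings instead of hardcoding them. In Whisper-family models, language=None triggers auto-detection, and task is either "transcribe" or "translate". The settings class and function here are a hypothetical sketch of that mapping, not OpenTranscribe's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionSettings:
    # Hypothetical per-user fields
    source_language: str = "auto"        # "auto" or an ISO 639-1 code such as "es"
    translate_to_english: bool = False   # release default: keep the original language


def whisper_options(settings: TranscriptionSettings) -> dict[str, Optional[str]]:
    """Map user settings to Whisper-style language/task options.

    language=None lets the model auto-detect; task="transcribe" keeps the spoken
    language, while task="translate" produces English text.
    """
    language = None if settings.source_language == "auto" else settings.source_language
    task = "translate" if settings.translate_to_english else "transcribe"
    return {"language": language, "task": task}
```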

🌐 UI Internationalization (7 Languages)

The UI is now available in:

English (default)
Spanish (Español)
French (Français)
German (Deutsch)
Portuguese (Português)
Chinese (中文)
Japanese (日本語)

🎙️ Speaker Management Enhancements

Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
User-Level Preferences: Save default speaker detection settings

🤖 LLM Integration Improvements

Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
Anthropic Support Enhanced: Native model discovery via /v1/models API
Multilingual Output: Generate AI summaries in 12 different languages
Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
Updated Default Models: Anthropic uses claude-opus-4-5-20251101, Ollama uses llama3.2:latest
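Model discovery comes down to calling each provider's model-list endpoint and pulling out identifiers. OpenAI-compatible servers such as vLLM, and Anthropic's API, both return a data array of objects with an id field from /v1/models. Ollama lists local models under models[].name from /api/tags. The parser is an illustrative sketch, and the provider keys are hypothetical.

```python
from typing import Any

# Model-list endpoints per provider type (provider keys are hypothetical labels).
MODEL_LIST_PATHS = {
    "openai_compatible": "/v1/models",  # vLLM and other OpenAI-compatible servers
    "anthropic": "/v1/models",          # needs x-api-key and anthropic-version headers
    "ollama": "/api/tags",
}


def parse_model_ids(provider: str, body: dict[str, Any]) -> list[str]:
    """Extract sorted model identifiers from a decoded JSON model-list response."""
    if provider == "ollama":
        ids = [m.get("name", "") for m in body.get("models", [])]
    elif provider in ("openai_compatible", "anthropic"):
        ids = [m.get("id", "") for m in body.get("data", [])]
    else:
        raise ValueError(f"unknown provider: {provider}")
    return sorted(i for i in ids if i)
```

Anthropic's list endpoint is paginated (has_more, after_id), so a complete client would keep requesting pages. This sketch handles a single response only.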

⚡ Performance & Stability

Pagination for Large Transcripts: No more browser hanging with thousands of segments
Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files
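Pagination keeps the browser from rendering every segment at once. On the server this usually becomes a LIMIT/OFFSET (or keyset) database query. This in-memory Python sketch shows only the page arithmetic, and the 500-segment page size is an assumption.

```python
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, page_size: int = 500) -> tuple[list[T], int]:
    """Return one page of items (1-based page number) and the total page count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total_pages = max(1, -(-len(items) // page_size))  # ceiling division
    start = (page - 1) * page_size
    return list(items[start:start + page_size]), total_pages
```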

👤 Admin & User Experience

System Statistics: CPU, memory, disk, and GPU usage now visible to all users
Admin Password Reset: Secure password reset functionality with validation
UUID Consistency: Fixed admin endpoints to use UUID instead of integer IDs

Upgrading to v0.2.0

# If using the production installer
cd opentranscribe
./opentranscribe.sh update

# Or pull the latest Docker images
docker compose pull
docker compose up -d

Database migrations run automatically on startup - no manual intervention required.

Resources

Documentation: docs.opentranscribe.app
GitHub: github.com/davidamacey/OpenTranscribe
Docker Hub: Backend | Frontend
Blog Post: Full Release Notes

Full Changelog: v0.1.0...v0.2.0

Happy transcribing! 🎉
— The OpenTranscribe Team


06 Nov 05:05

davidamacey

v0.1.0

1cbb8e5

OpenTranscribe v0.1.0 - First Official Release

Release Date: November 5, 2025
License: GNU Affero General Public License v3.0 (AGPL-3.0)

Overview

We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.

OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.

Why AGPL-3.0?

We've chosen the GNU Affero General Public License v3.0 to:

Protect open source - Ensure the code remains open and accessible to everyone
Prevent proprietary forks - Require that modifications, especially network services, remain open
Ensure transparency - Network users have the right to access the source code
Build community - Foster collaboration and shared improvements

Key Highlights

🎧 Professional-Grade Transcription

70x realtime speed on GPU with large-v2 model
Word-level timestamps using wav2vec2 alignment
50+ languages supported with automatic translation
Universal format support - Audio and video files up to 4GB

👥 Advanced Speaker Intelligence

Automatic speaker diarization using PyAnnote.audio
Cross-video speaker recognition with voice fingerprinting
AI-powered speaker suggestions using LLM context analysis
Global speaker profiles that persist across all recordings
Speaker analytics with talk time, pace, and interaction patterns

🤖 AI-Powered Insights

LLM integration - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
BLUF format summaries - Bottom Line Up Front structured analysis
Custom AI prompts - Unlimited prompts with flexible JSON schemas
Intelligent sectioning - Handles transcripts of any length automatically
Local or cloud processing - Privacy-first local models or powerful cloud AI

🔍 Powerful Search & Discovery

Hybrid search - Keyword + semantic search with OpenSearch 3.3.1
9.5x faster vector search - Significantly improved performance
25% faster queries with 75% lower p90 latency
Advanced filtering - Search by speaker, tags, collections, date, duration
Interactive navigation - Click-to-seek on transcripts and waveforms
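OpenSearch implements hybrid queries through a search pipeline. Its normalization processor can min-max normalize each subquery's scores and combine them with a weighted arithmetic mean. This Python sketch reproduces that arithmetic client-side for illustration. Two choices are assumptions and do not reflect OpenTranscribe's configuration: the 0.3 keyword weight, and scoring documents absent from one list as 0 for that list.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; if all scores are equal, map them to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(keyword: dict[str, float], semantic: dict[str, float],
                  keyword_weight: float = 0.3) -> list[tuple[str, float]]:
    """Weighted arithmetic mean of normalized keyword and vector scores, best first.

    A document missing from one result list contributes 0 for that list.
    """
    kw, sem = min_max_normalize(keyword), min_max_normalize(semantic)
    docs = kw.keys() | sem.keys()
    combined = {
        d: keyword_weight * kw.get(d, 0.0) + (1 - keyword_weight) * sem.get(d, 0.0)
        for d in docs
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

Normalization is what makes the two score scales comparable. Raw BM25 scores are unbounded, while vector similarities typically fall in a narrow range, so averaging them unnormalized would let keyword matches dominate.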

⚡ Enterprise Performance

Multi-GPU scaling - Optional parallel processing (4+ workers per GPU)
Specialized work queues - GPU, CPU, Download, NLP, and Utility queues
Non-blocking architecture - Parallel processing saves 45-75s per 3-hour file
Model caching - Efficient ~2.6GB cache with automatic persistence
Complete offline support - Full airgapped deployment capability
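Specialized queues like these are typically wired up with Celery task routing, and Celery is listed under Infrastructure. Celery accepts router functions with the signature used here. The task-name prefixes and the queue mapping are hypothetical and may not match OpenTranscribe's task names.

```python
from typing import Any, Optional

# Hypothetical task-name prefixes mapped to the queue types named in these notes.
QUEUE_BY_PREFIX = {
    "transcription.": "gpu",
    "download.": "download",
    "summarization.": "nlp",
    "waveform.": "cpu",
    "maintenance.": "utility",
}


def route_task(name: str, args: Any, kwargs: Any, options: Any,
               task: Any = None, **kw: Any) -> Optional[dict[str, str]]:
    """Celery router: send each task to a queue based on its name prefix.

    Returning None lets Celery fall through to the next router or the default queue.
    """
    for prefix, queue in QUEUE_BY_PREFIX.items():
        if name.startswith(prefix):
            return {"queue": queue}
    return None


# Hypothetical wiring (requires Celery; "app" is a placeholder module name):
#   app.conf.task_routes = (route_task,)
# and a worker consuming only one queue, e.g.:
#   celery -A app worker -Q gpu --concurrency=1
```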

Installation

Quick Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start

Access at: http://localhost:5173

Docker Hub Images

Pre-built multi-platform images (AMD64, ARM64):

davidamacey/opentranscribe-backend:v0.1.0
davidamacey/opentranscribe-frontend:v0.1.0

From Source

git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev

What's Included

Core Features

✅ Transcription - WhisperX with faster-whisper backend
✅ Speaker Diarization - PyAnnote.audio integration with auto-labeling and profile generation
✅ Media File Upload - Direct upload of audio/video files up to 4GB with drag-and-drop
✅ Video File Size Detection - Client-side audio extraction option for large video files
✅ YouTube Support - Direct URL and playlist processing for batch transcription
✅ Browser Microphone Recording - Built-in recording (localhost or HTTPS) with background operation
✅ AI-Powered Summaries - Multi-provider LLM integration with customizable formats
✅ AI Topic Generation - Automatic tag and collection suggestions from transcript content
✅ Timestamp Comments - User annotations anchored to specific video moments
✅ Search Engine - OpenSearch 3.3.1 with hybrid keyword and vector search
✅ Collections - Organize media into themed groups with AI suggestions
✅ Analytics - Speaker metrics and interaction analysis
✅ Waveform Visualization - Interactive audio timeline
✅ PWA Support - Installable progressive web app
✅ Dark/Light Mode - Full theme support

Infrastructure

✅ Docker Compose - Multi-environment orchestration
✅ PostgreSQL - Relational database with JSONB
✅ MinIO - S3-compatible object storage
✅ Redis - Message broker and caching
✅ Celery - Distributed task processing
✅ NGINX - Production web server
✅ Flower - Task monitoring dashboard

Security

✅ Non-root containers - Principle of least privilege
✅ RBAC - Role-based access control
✅ Encrypted secrets - Secure API key storage
✅ Security scanning - Trivy and Grype integration
✅ Session management - JWT-based authentication

System Requirements

Minimum

CPU: 4 cores
RAM: 8GB
Storage: 50GB (including ~3GB for AI models)
GPU: Optional (CPU-only mode available)

Supported Platforms

OS: Linux, macOS (including Apple Silicon), Windows (via WSL2)
Architectures: AMD64, ARM64
GPUs: NVIDIA CUDA, Apple MPS (Metal)

Performance Benchmarks

Metric	Performance
Transcription Speed (GPU)	70x realtime
Vector Search Improvement	9.5x faster
Query Performance	25% faster, 75% lower p90 latency
Multi-GPU Throughput	4 videos simultaneously (4 workers)
Model Cache Size	~2.6GB total

Documentation

📚 Complete Documentation: https://docs.opentranscribe.app


Roadmap to v1.0.0

We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.

Planned features for future releases:

Real-time transcription for live streaming
Enhanced speaker analytics and visualization
Better speaker diarization models
Google-style text search
LLM-powered RAG chat over transcript text
Other refinements along the way!

Known Issues

No critical issues at release time. See GitHub Issues for community-reported items.

Contributing

We welcome contributions from the community! See our Contributing Guide for details.

Ways to contribute:

🐛 Report bugs and issues
💡 Suggest new features
🔧 Submit pull requests
📚 Improve documentation
🌍 Translate the interface
⭐ Star the repository

Support & Community

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: Contact via GitHub

Acknowledgments

OpenTranscribe builds upon amazing open-source projects:

OpenAI Whisper - Foundation speech recognition model
WhisperX - Enhanced alignment and diarization
PyAnnote.audio - Speaker diarization toolkit
FastAPI - Modern Python web framework
Svelte - Reactive frontend framework
PostgreSQL - Reliable database system
OpenSearch - Search and analytics engine
Docker - Containerization platform

Special thanks to the AI community and all contributors who helped make this release possible!

License

OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See LICENSE for full details.

Built with ❤️ by the OpenTranscribe community

OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.

Download: v0.1.0 Release
Docker: Backend | Frontend
Docs: docs.opentranscribe.app

