
Releases: attevon-llc/OpenTranscribe

v0.4.1 - LDAP DN Fix & Keycloak PKI Compliance

15 Apr 01:52



This patch release fixes a critical LDAP group filtering bug reported in #188 and adds government/FedRAMP Keycloak-as-PKI-broker support.

What Was Broken

Active Directory Distinguished Names use commas as internal syntax (e.g. CN=Whisper_Users,CN=Users,DC=domain,DC=local). The previous code split group lists on commas, which shredded full DNs into fragments that could never match what AD returned. Group filtering was silently broken for any installation using full DNs.

Highlights

  • LDAP group DN parsing fixed — group lists now use semicolons as the multi-group separator; full AD DNs work correctly
  • PKI_ADMIN_DNS parsing fixed — same semicolon delimiter fix for certificate admin lists
  • Keycloak X.509 PKI broker — cert claims injected by Keycloak (both cert_* and x509_cert_* forms) are extracted and stored on the user record
  • PKI admin promotion via Keycloak — cert DN in PKI_ADMIN_DNS grants admin access for Keycloak users, matching standalone PKI auth behaviour
  • Government cert CN format — CN=LastName FirstName email/username (space-separated 3-token) parsed and displayed as First Last
  • 116 new unit tests across ldap_auth and keycloak_auth modules
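The government CN handling described above can be sketched as follows. `parse_gov_cn` is a hypothetical helper name for illustration, not the actual function in the codebase; it assumes the space-separated LastName FirstName username token order shown in these notes:

```python
def parse_gov_cn(cn: str) -> str:
    """Parse a government-style certificate CN such as 'Doe John jdoe'
    (LastName FirstName username) into a 'First Last' display name.
    Hypothetical sketch only."""
    tokens = cn.split()
    if len(tokens) != 3:
        # Not the 3-token government format; show the CN unchanged
        return cn
    last, first, _username = tokens
    return f"{first} {last}"

print(parse_gov_cn("Doe John jdoe"))  # → John Doe
print(parse_gov_cn("Jane Smith"))     # not 3 tokens, returned unchanged
```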

Upgrade Notes

LDAP group list format change — update your LDAP_REQUIRED_USER_GROUPS and LDAP_ADMIN_GROUPS environment variables to use semicolons:

# Single full DN: previously shredded on its internal commas, now parsed intact
LDAP_REQUIRED_USER_GROUPS=CN=Whisper_Users,CN=Users,DC=domain,DC=local

# Multiple groups: separate with semicolons, never commas
LDAP_REQUIRED_USER_GROUPS=CN=Group1,DC=domain,DC=local;CN=Group2,DC=domain,DC=local

PKI_ADMIN_DNS — if you have multiple admin DNs, use semicolons:

PKI_ADMIN_DNS=CN=Doe John jdoe,OU=Agency,O=U.S. Government,C=US;CN=Smith Jane jsmith,OU=Agency,O=U.S. Government,C=US
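The delimiter change can be illustrated with a short sketch (`split_dn_list` is a hypothetical helper name; the real implementation may differ):

```python
def split_dn_list(value: str) -> list[str]:
    """Split a semicolon-delimited list of DNs, preserving the commas
    that are part of each DN's internal syntax."""
    return [dn.strip() for dn in value.split(";") if dn.strip()]

admins = split_dn_list(
    "CN=Doe John jdoe,OU=Agency,O=U.S. Government,C=US;"
    "CN=Smith Jane jsmith,OU=Agency,O=U.S. Government,C=US"
)
print(len(admins))  # → 2 full DNs, internal commas intact
# Splitting the same string on commas yields 8 useless fragments,
# none of which would ever match a DN the directory returns.
```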

No database migrations required.

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Full Changelog

See CHANGELOG.md

v0.4.0 — Enterprise Auth, Native Pipeline, Neural Search & Security Hardening

14 Apr 04:21



A major release combining enterprise-grade authentication, a native transcription pipeline, neural search, GPU optimizations, cloud ASR providers, comprehensive speaker intelligence, a Progressive Web App, user groups & sharing, and a final frontend hardening sprint — all built from processing 1,400+ real-world recordings over two months of development. 281 commits since v0.3.3.


🔐 Enterprise Authentication

Four authentication methods that can run simultaneously, configured through the admin UI without restarts:

  • Local — Username/password with bcrypt, TOTP MFA (RFC 6238 — Google Authenticator, Authy, Microsoft Authenticator), FedRAMP IA-5 password policies (complexity, history, expiration), NIST AC-7 account lockout with progressive thresholds
  • LDAP/Active Directory — Enterprise directory integration with auto-provisioning and username-attribute mapping
  • OIDC/Keycloak — OpenID Connect with federated identity, social login, and federated logout propagation
  • PKI/X.509 — Certificate-based mTLS authentication with OCSP/CRL revocation checking and super-admin local password fallback

Plus: per-IP and per-user rate limiting, audit logging in structured JSON/CEF format with OpenSearch integration, JWT refresh token rotation with concurrent session limits, and database-driven configuration with AES-256-GCM encryption at rest — all manageable from a Super Admin UI without restarts.

⚡ Native Transcription Pipeline (2× Faster)

Replaced the legacy WhisperX pipeline with a native engine built on faster-whisper's BatchedInferencePipeline + PyAnnote v4. Cross-attention DTW provides word timestamps during transcription — no separate alignment pass, no wav2vec2 dependency, and native word timestamps for all 100+ languages (previously only ~42 via wav2vec2).

Benchmark (3.3-hour podcast, RTX A6000): 706s → 332s — 2.1× faster

  • Unified pipeline replaces the previous parallel_pipeline / whisperx_service split
  • User-configurable VAD — Voice Activity Detection threshold and silence duration exposed as tunable settings
  • Word timestamp validation — post-processing ensures monotonicity and prevents drift
  • GPU pipeline benchmarks — 40.3× single-file realtime, 54.6× peak at concurrency=8, perfect linear scaling 1–12 workers
  • TF32 acceleration enabled at worker startup and after diarization (Ampere+ GPUs)
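The word-timestamp validation mentioned above can be sketched as a simple post-processing pass (hypothetical helper; the real validator may also apply drift limits):

```python
def enforce_monotonic(words: list[dict]) -> list[dict]:
    """Clamp word timestamps so each word starts no earlier than the
    previous word ended, and never ends before it starts."""
    prev_end = 0.0
    fixed = []
    for w in words:
        start = max(w["start"], prev_end)
        end = max(w["end"], start)
        fixed.append({**w, "start": start, "end": end})
        prev_end = end
    return fixed

words = [{"word": "hello", "start": 0.0, "end": 0.5},
         {"word": "world", "start": 0.4, "end": 0.3}]  # overlaps and inverts
print(enforce_monotonic(words))  # second word clamped to start=end=0.5
```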

🎙️ PyAnnote v4 Migration & Speaker Intelligence

  • Automatic migration system — Admin UI with real-time progress bar migrates speaker embeddings from v3 (512-dim) to v4 (256-dim) via atomic alias swap, zero downtime
  • Speaker overlap detection — Identifies overlapping speakers with confidence scoring
  • Speaker pre-clustering — GPU-accelerated cross-file speaker grouping (#144)
  • Global Speaker Management page — Dedicated page for cross-file speaker profile management with avatars
  • Gender classification — Apache 2.0 licensed neural network predicts gender from voice; stored on profiles for cross-video consistency
  • Gender-informed cluster validation — Cross-gender cluster assignments require higher similarity thresholds; minority members flagged for review
  • Speaker metadata parsing — Cross-reference pipeline with metadata hints display for LLM-assisted speaker identification (#141)
  • Jump-to-timestamp links in the speaker editor (#147)
  • Unassign & blacklist — Remove speaker assignments and blacklist erroneous profiles
  • Outlier analysis — Detect and flag outlier embeddings in speaker clusters
  • Inline audio playback — Play/pause toggle in speaker cluster views
  • OpenSearch cosine score fix — All 8 kNN score read locations now correctly convert (1+cos)/2 → raw cosine
  • Warm model caching eliminates 40-60s cold-start delays by pre-loading models on startup
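On the cosine score fix: OpenSearch kNN reports cosine similarity normalized as (1 + cos) / 2 so scores stay non-negative; recovering the raw cosine at each read site is just the inverse mapping:

```python
def raw_cosine(knn_score: float) -> float:
    """Invert OpenSearch's (1 + cos) / 2 kNN score normalization
    back to raw cosine similarity in [-1, 1]."""
    return 2.0 * knn_score - 1.0

print(raw_cosine(1.0))  # → 1.0   (identical direction)
print(raw_cosine(0.5))  # → 0.0   (orthogonal)
print(raw_cosine(0.0))  # → -1.0  (opposite direction)
```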

🔍 Hybrid Neural Search

Full-text BM25 combined with semantic vector search via OpenSearch ML Commons. Search for "budget discussion" and find segments about "financial planning" even when those exact words never appear.

  • ML Commons integration — Native OpenSearch neural search, server-side embeddings
  • RRF hybrid merging — BM25 + vector scores combined via Reciprocal Rank Fusion
  • 6 embedding model tiers — from 384-dim MiniLM (fast) to 768-dim mpnet (best quality)
  • Hybrid search crash fix — Previously silent fallback to BM25-only on OpenSearch 3.4 due to ArrayIndexOutOfBoundsException when combining aggs + hybrid + collapse + RRF
  • Soft demotion instead of hard suppression — Semantic results no longer dropped
  • Dynamic over-fetch — Cap raised from 200 to 1,000 via SEARCH_MAX_OVERFETCH for large indexes
  • BM25 tuning — Fuzziness AUTO, cross-fields, phrase slop, rank constant tuned 40→30
  • Stop/cancel reindex — Admin UI can cancel in-flight reindex operations
  • Offline/airgapped model downloading for air-gapped deployments
  • Dynamic model management via admin UI
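Reciprocal Rank Fusion merges the BM25 and vector result lists using only ranks, so the two incomparable score scales never need reconciling. A minimal sketch; k = 60 is the conventional constant (these notes mention the rank constant being tuned from 40 to 30):

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["seg3", "seg1", "seg7"]    # lexical hits
vector = ["seg1", "seg9", "seg3"]  # semantic hits
print(rrf_merge([bm25, vector]))   # seg1 and seg3 rise to the top
```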

☁️ Cloud ASR Providers

For deployments without a GPU — 8 cloud speech providers plus cloud diarization (#150):

  • Providers: Deepgram, AssemblyAI, OpenAI Whisper API, Google, AWS Transcribe, Azure Speech, Speechmatics, Gladia
  • pyannote.ai cloud diarization integration
  • Independent diarization provider architecture — diarization_source selector: ASR built-in, local PyAnnote GPU, pyannote.ai cloud, or off — independent of transcription provider choice
  • API-lite deployment mode — 2 GB CPU-only image vs. 8.9 GB for the full GPU image. Cloud-transcribed files still get local speaker embedding extraction for cross-file matching
  • Custom vocabulary — Domain-specific hotwords (medical, legal, corporate, government) used as faster-whisper hotwords and cloud provider keyword boosting
  • Admin-pinned ASR model — Admins control local Whisper model selection; model loaded once at startup, shared across all workers
  • Per-transcription model override — Users can override the admin-pinned model per upload (#153)

🤝 User Groups, Collection Sharing & Collaboration

  • User Groups & Collection Sharing (#148) — Create user groups and share collections with groups or individual users; granular viewer/editor permissions
  • Speaker profile sharing via the collection sharing infrastructure
  • Config/prompt sharing — Share LLM configs, prompts, media sources, and organization contexts between users
  • Per-collection AI prompts (#146) — Different summarization styles for different collection types
  • Bidirectional prompt-collection links — Prompts show which collections use them
  • Organization context (#142) — Inject domain knowledge into all LLM prompts for context-aware summaries

📤 Upload & Media

  • TUS 1.0.0 resumable uploads (#10) — Chunked uploads with MinIO multipart storage that survive network interruptions
  • Collection & tag selection at upload (#145) — Organize files during upload, not after
  • URL download quality settings (#122) — Configure video resolution, audio-only mode, and bitrate for yt-dlp downloads
  • File retention & auto-deletion (#134) — Admin-configurable file retention with automatic deletion and GDPR-compliant audit logging
  • Auto-labeling (#140) — AI suggests tags and collections from transcript content with fuzzy deduplication
  • Disable AI summary per upload (#152)
  • Disable speaker diarization per upload (#151)
  • Selective reprocessing (#143) — Stepper UI to re-run specific pipeline stages on existing files
  • YouTube bot-bypass — 2026 yt-dlp best practices (Deno JS runtime, client rotation, proper headers) for 1,800+ supported platforms

🛡️ Frontend Hardening Sprint

A dedicated audit sprint shipped in this release. Full details below under "Security", but the highlights:

  • Flash of Authenticated Content (FOAC) fix — Layout now gates protected content during async auth verification
  • Centralized user state cleanup (lib/session/clearUserState.ts) — 17+ stores, caches, and localStorage keys cleared on every login/logout
  • Session-scoped AbortController cancels in-flight requests on logout
  • bfcache invalidation — Back button after logout forces reload to discard restored snapshots
  • DOMPurify sanitization across 8 {@html} render sites; replaces a bypassable regex sanitizer
  • Production source maps disabled
  • Keycloak redirect URL validation

🎨 UX & Frontend Polish

  • Upload modal redesign — Replaced the 4,603-line monolith with a 6-step linear stepper (Media → Tags → Collections → Speakers → Options → Submit) plus a conditional Extract step for large videos. All three upload sources (file/URL/recording) now share steps 2-6. "Remember previous values" and "Review with defaults" shortcuts for power users
  • Skeleton loaders — Replace generic spinners on home gallery, search results, file detail, and speaker clusters/profiles/inbox (~20% faster perceived load per Nielsen Norman research)
  • Gallery click feedback — Instant press state + mousedown prefetch (~50-100ms head start)
  • Gallery redesign — Compact Apple-like grid cards, list view, sorting, multi-select bulk actions
  • Gallery state persistence — Filters persist across file detail navigation; scroll position restored on back
  • Collection & Share modal polish — Intro text, permission reference cards, empty states, backdrop-click data-loss protection
  • Manage Collections visual fix — Eliminated the "card in a card" glitch
  • Settings redesign — Tabbed navigation, per-user preferences, speaker behavior defaults
  • Queue Dashboard — Unified tasks view (formerly File Status) with quick filters, DatePicker, and pagination
  • Stepper reprocess UI — Step-by-step reprocessing with stage picker
  • Gallery action consolidation — Action buttons moved to header with dropdown groups (#139)
  • Multi-select with auto-filter a…

v0.3.3 - Community Contributions & Protected Media Support

14 Jan 05:13



Community-driven release featuring contributions from @vfilon, who submitted all four PRs in this version!

Highlights

  • 🇷🇺 Russian Language Support - 8th supported UI language with 1,600+ translated strings
  • 🔐 Protected Media Authentication - New plugin system for downloading from password-protected corporate video portals (MediaCMS support built-in)
  • 🛠️ Bug Fixes - VRAM monitoring fix for non-CUDA devices, loading screen translation fix
  • 🔧 URL Utilities - Centralized URL construction for consistent dev/production behavior

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Protected Media Setup (Optional)

To enable authenticated downloads from MediaCMS installations:

# Add to .env
MEDIACMS_ALLOWED_HOSTS=media.example.com,mediacms.internal
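The allowlist gates which hosts the plugin will authenticate against; a sketch of the kind of check involved (`is_allowed_mediacms_host` is a hypothetical helper name, not the project's actual code):

```python
from urllib.parse import urlparse

def is_allowed_mediacms_host(url: str, allowed_hosts: str) -> bool:
    """Check a download URL's hostname against the comma-separated
    MEDIACMS_ALLOWED_HOSTS value."""
    allowed = {h.strip().lower() for h in allowed_hosts.split(",") if h.strip()}
    host = (urlparse(url).hostname or "").lower()
    return host in allowed

hosts = "media.example.com,mediacms.internal"
print(is_allowed_mediacms_host("https://media.example.com/view?m=abc", hosts))  # → True
print(is_allowed_mediacms_host("https://evil.example.com/view", hosts))         # → False
```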

Full Changelog

See CHANGELOG.md for complete details.

Thank You

Special thanks to @vfilon for contributing all four PRs in this release!

v0.3.2 - Setup Script Bug Fixes

17 Dec 01:42


Patch release fixing critical bugs in the one-liner installation script that prevented successful setup on fresh installations.

Note: This is a scripts-only release. No Docker container rebuild required.

Fixed

Setup Script Fixes

  • Scripts Directory Creation - Fixed curl error 23 ("Failure writing output to destination") when downloading SSL and permission scripts by creating the scripts/ directory before download attempts
  • PyTorch 2.6+ Compatibility - Applied torch.load patch to download-models.py for PyTorch 2.6+ compatibility, mirroring the fix already present in the backend (from Wes Brown's commit 8929cd6)
    • PyTorch 2.6 changed weights_only default to True, causing omegaconf deserialization errors during model downloads
    • The patch sets weights_only=False for trusted HuggingFace models

Upgrade Notes

For existing installations: No action required - Docker containers already have the PyTorch fix.

For new installations: The one-liner setup script now works correctly:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog: https://github.com/davidamacey/OpenTranscribe/blob/master/CHANGELOG.md

v0.3.1 - Script Enhancements & Documentation Updates

16 Dec 13:49



Patch release with enhanced setup scripts for HTTPS/SSL configuration and comprehensive documentation updates covering v0.2.0 and v0.3.0 features.

Highlights

New Management Commands

  • ./opentranscribe.sh setup-ssl - Interactive HTTPS/SSL configuration
  • ./opentranscribe.sh version - Check current version and available updates
  • ./opentranscribe.sh update - Update containers only (quick)
  • ./opentranscribe.sh update-full - Update containers + config files (recommended)

NGINX Improvements

  • Automatic NGINX overlay loading when NGINX_SERVER_NAME is configured
  • NGINX health check added to ./opentr.sh health

Documentation Updates

  • New comprehensive NGINX/SSL setup guide
  • Updated docs for Universal Media URL support (1800+ platforms)
  • Added garbage cleanup feature documentation
  • FAQ entries for system statistics and transcript pagination
  • All Docusaurus and README docs updated for v0.2.0/v0.3.0 features

How to Update

Existing installations:

./opentranscribe.sh update-full

New installations:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Full Changelog

See CHANGELOG.md

OpenTranscribe v0.3.0 - Universal Media URL & NGINX Support

15 Dec 12:24


What's New in v0.3.0

This release integrates valuable contributions from the community fork by @vfilon, bringing major new features including support for 1800+ video platforms and production-ready NGINX reverse proxy with SSL/TLS.

🎬 Universal Media URL Support (1800+ Platforms)

The headline feature expands OpenTranscribe far beyond YouTube:

Supported Platforms:

  • Primary (Best Support): YouTube, Dailymotion, Twitter/X
  • Secondary: Vimeo (public only), TikTok (variable), and 1800+ more via yt-dlp

Features:

  • Dynamic source platform detection from yt-dlp metadata
  • User-friendly error messages for authentication-required platforms
  • Platform guidance for common issues (Vimeo login, Instagram restrictions, etc.)
  • Updated UI with "Supported Platforms" section and limitations warning

Note: Authentication is not currently supported. Videos requiring login will fail with helpful error messages guiding users to publicly accessible alternatives.

🔐 NGINX Reverse Proxy with SSL/TLS

This release closes #72, enabling browser microphone recording on remote network access:

  • docker-compose.nginx.yml overlay for production deployments
  • Full SSL/TLS configuration with HTTP → HTTPS redirect
  • WebSocket proxy support for real-time updates
  • 2GB file upload support for large media files
  • Flower dashboard and MinIO console accessible through NGINX
  • Self-signed certificate generation script

🔧 Critical Bug Fixes: UUID/ID Standardization

Comprehensive fix for UUID/ID handling across 60+ files:

Issues Fixed:

  • Speaker recommendations not showing for new videos
  • Profile embedding service returning wrong ID type
  • Inconsistent ID handling between backend and frontend
  • Comment system UUID issues
  • Password reset flow problems

🏗️ Infrastructure Improvements

GPU Configuration:

  • Separated into optional docker-compose.gpu.yml overlay
  • Better cross-platform support (macOS, CPU-only systems)
  • Auto-detection in opentr.sh script

Task Management:

  • Task status reconciliation before marking files as stuck
  • Multiple timestamp fallbacks for better reliability
  • Auto-refresh analytics when segment speaker changes

LLM Service:

  • Ollama context window configuration (num_ctx parameter)
  • Model-aware temperature handling
  • Better logging with resolved endpoint info
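For reference, Ollama accepts the context window as options.num_ctx in its request body. A hedged example payload for the /api/generate endpoint (the model name and 8192 value here are placeholders, not OpenTranscribe defaults):

```json
{
  "model": "llama3.2:latest",
  "prompt": "Summarize this transcript...",
  "options": {
    "num_ctx": 8192
  }
}
```

OpenTranscribe sets this for you from its LLM configuration; raising num_ctx lets longer transcript sections fit in a single summarization call at the cost of more memory.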

🌐 i18n Updates

All 7 supported languages have been updated:

  • Notification text changed from "YouTube Processing" to "Video Processing"
  • New media URL description and platform limitation strings
  • Updated recommended platforms list

🙏 Acknowledgments

Special thanks to @vfilon for the fork contributions that made this release possible:

  • Universal Media URL support concept
  • NGINX reverse proxy configuration
  • Task status reconciliation improvements
  • GPU overlay separation

How to Update

Docker Compose (Recommended)

# Pull the latest images
docker compose pull

# Restart with new images
docker compose up -d

For NGINX/SSL Setup

# Set NGINX_SERVER_NAME in .env
./scripts/generate-ssl-cert.sh
./opentr.sh start prod

See docs/NGINX_SETUP.md for complete setup instructions.

Full Changelog

See the CHANGELOG for complete details.


Full Changelog: v0.2.1...v0.3.0

v0.2.1 - Security Patch Release

13 Dec 14:55



This release addresses critical container vulnerabilities identified in security scans. All users are encouraged to update.

Resolved Critical CVEs (4 → 0)

| CVE | Package | Severity | Status |
| --- | --- | --- | --- |
| CVE-2025-47917 | libmbedcrypto | CRITICAL | ✅ Fixed |
| CVE-2023-6879 | libaom3 | CRITICAL | ✅ Fixed |
| CVE-2025-7458 | libsqlite3 | CRITICAL | ✅ Fixed |
| CVE-2023-45853 | zlib | CRITICAL | ✅ Fixed |

Container Updates

Frontend:

  • nginx:1.29.3-alpine3.22 → nginx:1.29.4-alpine3.23
  • Fixed 6 vulnerabilities (3 HIGH, 3 MEDIUM) in libpng and busybox
  • Added HEALTHCHECK instruction

Backend:

  • python:3.12-slim-bookworm → python:3.13-slim-trixie
  • Debian 12 → Debian 13 "trixie"
  • Python 3.12 → Python 3.13
  • Added HEALTHCHECK instruction

How to Update

Docker Compose:

docker compose pull
docker compose up -d

Manual:

docker pull davidamacey/opentranscribe-frontend:v0.2.1
docker pull davidamacey/opentranscribe-backend:v0.2.1

Full Changelog

See CHANGELOG.md for complete details.


🔒 Your security is our priority. Thank you for using OpenTranscribe.

v0.2.0 - Community-Driven Multilingual Release

13 Dec 00:43
8851626


We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.

Growing Community

In just over a month since our v0.1.0 release, OpenTranscribe has seen exciting growth:

Community Contributions

Wes Brown's Seven Pull Requests

A massive thank you to Wes Brown (@SQLServerIO) who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:

  1. PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
  2. PR #107: Auto-cleanup garbage transcription segments
  3. PR #106: User admin endpoints now use UUID instead of integer ID
  4. PR #105: Speaker merge UI and segment speaker reassignment
  5. PR #104: LLM model discovery for OpenAI-compatible providers
  6. PR #103: Per-file speaker count settings in upload and reprocess UI
  7. PR #102: PyTorch 2.6+ compatibility and speaker diarization settings

The Multilingual Feature Request

Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were being transcribed to English because WhisperX was hardcoded with language="en" and task="translate".

What's New in v0.2.0

🌍 Multilingual Transcription Support (100+ Languages)

  • Source Language: Auto-detect or specify the audio language (100+ languages supported)
  • Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
  • LLM Output Language: Generate AI summaries in 12 different languages
  • ~42 languages have word-level timestamp support via wav2vec2 alignment
  • Settings are stored per-user in the database

🌐 UI Internationalization (7 Languages)

The UI is now available in:

  • English (default)
  • Spanish (Español)
  • French (Français)
  • German (Deutsch)
  • Portuguese (Português)
  • Chinese (中文)
  • Japanese (日本語)

🎙️ Speaker Management Enhancements

  • Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
  • Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
  • User-Level Preferences: Save default speaker detection settings

🤖 LLM Integration Improvements

  • Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
  • Anthropic Support Enhanced: Native model discovery via /v1/models API
  • Multilingual Output: Generate AI summaries in 12 different languages
  • Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
  • Updated Default Models: Anthropic uses claude-opus-4-5-20251101, Ollama uses llama3.2:latest

⚡ Performance & Stability

  • Pagination for Large Transcripts: No more browser hanging with thousands of segments
  • Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
  • PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
  • Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files

👤 Admin & User Experience

  • System Statistics: CPU, memory, disk, and GPU usage now visible to all users
  • Admin Password Reset: Secure password reset functionality with validation
  • UUID Consistency: Fixed admin endpoints to use UUID instead of integer IDs

Upgrading to v0.2.0

# If using the production installer
cd opentranscribe
./opentranscribe.sh update

# Or pull the latest Docker images
docker compose pull
docker compose up -d

Database migrations run automatically on startup - no manual intervention required.

Resources


Full Changelog: v0.1.0...v0.2.0

Happy transcribing! 🎉
The OpenTranscribe Team

OpenTranscribe v0.1.0 - First Official Release

06 Nov 05:05



Release Date: November 5, 2025
License: GNU Affero General Public License v3.0 (AGPL-3.0)

Overview

We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.

OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.

Why AGPL-3.0?

We've chosen the GNU Affero General Public License v3.0 to:

  • Protect open source - Ensure the code remains open and accessible to everyone
  • Prevent proprietary forks - Require that modifications, especially network services, remain open
  • Ensure transparency - Network users have the right to access the source code
  • Build community - Foster collaboration and shared improvements

Key Highlights

🎧 Professional-Grade Transcription

  • 70x realtime speed on GPU with large-v2 model
  • Word-level timestamps using WAV2VEC2 alignment
  • 50+ languages supported with automatic translation
  • Universal format support - Audio and video files up to 4GB

👥 Advanced Speaker Intelligence

  • Automatic speaker diarization using PyAnnote.audio
  • Cross-video speaker recognition with voice fingerprinting
  • AI-powered speaker suggestions using LLM context analysis
  • Global speaker profiles that persist across all recordings
  • Speaker analytics with talk time, pace, and interaction patterns

🤖 AI-Powered Insights

  • LLM integration - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
  • BLUF format summaries - Bottom Line Up Front structured analysis
  • Custom AI prompts - Unlimited prompts with flexible JSON schemas
  • Intelligent sectioning - Handles transcripts of any length automatically
  • Local or cloud processing - Privacy-first local models or powerful cloud AI

🔍 Powerful Search & Discovery

  • Hybrid search - Keyword + semantic search with OpenSearch 3.3.1
  • 9.5x faster vector search - Significantly improved performance
  • 25% faster queries with 75% lower p90 latency
  • Advanced filtering - Search by speaker, tags, collections, date, duration
  • Interactive navigation - Click-to-seek on transcripts and waveforms

⚡ Enterprise Performance

  • Multi-GPU scaling - Optional parallel processing (4+ workers per GPU)
  • Specialized work queues - GPU, CPU, Download, NLP, and Utility queues
  • Non-blocking architecture - Parallel processing saves 45-75s per 3-hour file
  • Model caching - Efficient ~2.6GB cache with automatic persistence
  • Complete offline support - Full airgapped deployment capability

Installation

Quick Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start

Access at: http://localhost:5173

Docker Hub Images

Pre-built multi-platform images (AMD64, ARM64):

  • davidamacey/opentranscribe-backend:v0.1.0
  • davidamacey/opentranscribe-frontend:v0.1.0

From Source

git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev

What's Included

Core Features

  • Transcription - WhisperX with faster-whisper backend
  • Speaker Diarization - PyAnnote.audio integration with auto-labeling and profile generation
  • Media File Upload - Direct upload of audio/video files up to 4GB with drag-and-drop
  • Video File Size Detection - Client-side audio extraction option for large video files
  • YouTube Support - Direct URL and playlist processing for batch transcription
  • Browser Microphone Recording - Built-in recording (localhost or HTTPS) with background operation
  • AI-Powered Summaries - Multi-provider LLM integration with customizable formats
  • AI Topic Generation - Automatic tag and collection suggestions from transcript content
  • Timestamp Comments - User annotations anchored to specific video moments
  • Search Engine - OpenSearch 3.3.1 with hybrid keyword and vector search
  • Collections - Organize media into themed groups with AI suggestions
  • Analytics - Speaker metrics and interaction analysis
  • Waveform Visualization - Interactive audio timeline
  • PWA Support - Installable progressive web app
  • Dark/Light Mode - Full theme support

Infrastructure

  • Docker Compose - Multi-environment orchestration
  • PostgreSQL - Relational database with JSONB
  • MinIO - S3-compatible object storage
  • Redis - Message broker and caching
  • Celery - Distributed task processing
  • NGINX - Production web server
  • Flower - Task monitoring dashboard

Security

  • Non-root containers - Principle of least privilege
  • RBAC - Role-based access control
  • Encrypted secrets - Secure API key storage
  • Security scanning - Trivy and Grype integration
  • Session management - JWT-based authentication

System Requirements

Minimum

  • CPU: 4 cores
  • RAM: 8GB
  • Storage: 50GB (including ~3GB for AI models)
  • GPU: Optional (CPU-only mode available)

Recommended

  • CPU: 8+ cores
  • RAM: 16GB+
  • Storage: 100GB+ SSD
  • GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)

Supported Platforms

  • OS: Linux, macOS (including Apple Silicon), Windows (via WSL2)
  • Architectures: AMD64, ARM64
  • GPUs: NVIDIA CUDA, Apple MPS (Metal)

Performance Benchmarks

| Metric | Performance |
| --- | --- |
| Transcription Speed (GPU) | 70x realtime |
| Vector Search Improvement | 9.5x faster |
| Query Performance | 25% faster, 75% lower p90 latency |
| Multi-GPU Throughput | 4 videos simultaneously (4 workers) |
| Model Cache Size | ~2.6GB total |

Documentation

📚 Complete Documentation: https://docs.opentranscribe.app

Roadmap to v1.0.0

We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.

Planned features for future releases:

  • Real-time transcription for live streaming
  • Enhanced speaker analytics and visualization
  • Better speaker diarization models
  • Google-style text search
  • LLM powered RAG Chat with transcript text
  • Other refinements along the way!

Known Issues

No critical issues at release time. See GitHub Issues for community-reported items.

Contributing

We welcome contributions from the community! See our Contributing Guide for details.

Ways to contribute:

  • 🐛 Report bugs and issues
  • 💡 Suggest new features
  • 🔧 Submit pull requests
  • 📚 Improve documentation
  • 🌍 Translate the interface
  • ⭐ Star the repository

Support & Community

Acknowledgments

OpenTranscribe builds upon amazing open-source projects:

  • OpenAI Whisper - Foundation speech recognition model
  • WhisperX - Enhanced alignment and diarization
  • PyAnnote.audio - Speaker diarization toolkit
  • FastAPI - Modern Python web framework
  • Svelte - Reactive frontend framework
  • PostgreSQL - Reliable database system
  • OpenSearch - Search and analytics engine
  • Docker - Containerization platform

Special thanks to the AI community and all contributors who helped make this release possible!

License

OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See LICENSE for full details.


Built with ❤️ by the OpenTranscribe community

OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.

Download: v0.1.0 Release
Docker: Backend | Frontend
Docs: docs.opentranscribe.app