v0.2.0 - Community-Driven Multilingual Release

@davidamacey davidamacey released this 13 Dec 00:43
Commit: 8851626

We're thrilled to announce OpenTranscribe v0.2.0! This release is special because it marks our first major community-driven update, featuring contributions from real-world users who are actively using OpenTranscribe in production.

Growing Community

In just over a month since our v0.1.0 release, OpenTranscribe has seen exciting growth.

Community Contributions

Wes Brown's Seven Pull Requests

A massive thank you to Wes Brown (@SQLServerIO), who submitted an incredible seven pull requests addressing real-world issues he encountered while using OpenTranscribe:

  1. PR #110: Pagination for large transcripts - Fixes page hanging with thousands of segments
  2. PR #107: Auto-cleanup garbage transcription segments
  3. PR #106: User admin endpoints now use UUID instead of integer ID
  4. PR #105: Speaker merge UI and segment speaker reassignment
  5. PR #104: LLM model discovery for OpenAI-compatible providers
  6. PR #103: Per-file speaker count settings in upload and reprocess UI
  7. PR #102: PyTorch 2.6+ compatibility and speaker diarization settings

The Multilingual Feature Request

Issue #99 from @LaboratorioInternacionalWeb highlighted a critical gap in our product: Spanish audio files were being translated into English instead of transcribed in Spanish, because WhisperX was hardcoded with language="en" and task="translate".

What's New in v0.2.0

🌍 Multilingual Transcription Support (100+ Languages)

  • Source Language: Auto-detect or specify the audio language (100+ languages supported)
  • Translate to English: Toggle to translate non-English audio (default: OFF - keeps original language)
  • LLM Output Language: Generate AI summaries in 12 different languages
  • Word-Level Timestamps: ~42 languages supported via wav2vec2 alignment
  • Settings are stored per-user in the database
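
To illustrate how the Source Language and Translate to English settings relate to the language and task values mentioned in Issue #99, here is a minimal sketch. It is not the actual OpenTranscribe code: the TranscriptionSettings class, its field names, and build_whisper_options are hypothetical names chosen for this example, and it assumes Whisper-style options where a language of None means auto-detect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionSettings:
    """Hypothetical per-user settings; field names are illustrative only."""
    source_language: str = "auto"       # "auto" or a language code such as "es"
    translate_to_english: bool = False  # default OFF keeps the original language


def build_whisper_options(settings: TranscriptionSettings) -> dict:
    """Map user settings to Whisper-style `language` and `task` values."""
    # None asks the model to detect the spoken language itself.
    language: Optional[str] = (
        None if settings.source_language == "auto" else settings.source_language
    )
    # "translate" produces English text; "transcribe" keeps the source language.
    task = "translate" if settings.translate_to_english else "transcribe"
    return {"language": language, "task": task}
```

With the defaults, this yields auto-detection and plain transcription, so a Spanish recording stays in Spanish unless the user turns translation on.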

🌐 UI Internationalization (7 Languages)

The UI is now available in:

  • English (default)
  • Spanish (Español)
  • French (Français)
  • German (Deutsch)
  • Portuguese (Português)
  • Chinese (中文)
  • Japanese (日本語)

🎙️ Speaker Management Enhancements

  • Speaker Merge UI: New visual interface to combine duplicate speakers with segment preview and reassignment
  • Per-File Speaker Settings: Configure min/max speakers at upload or reprocess time
  • User-Level Preferences: Save default speaker detection settings

🤖 LLM Integration Improvements

  • Model Auto-Discovery: Automatic detection of available models for vLLM, Ollama, and Anthropic providers
  • Anthropic Support Enhanced: Native model discovery via /v1/models API
  • Multilingual Output: Generate AI summaries in 12 different languages
  • Improved Configuration UX: Toast notifications, better API key handling, edit mode with stored keys
  • Updated Default Models: Anthropic uses claude-opus-4-5-20251101, Ollama uses llama3.2:latest
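
For readers curious what model discovery can look like, the sketch below queries a /v1/models endpoint on an OpenAI-compatible server and lists the model IDs it advertises. It is an illustration under stated assumptions, not OpenTranscribe's implementation: parse_model_ids and discover_models are hypothetical helper names, the response is assumed to follow the OpenAI-style shape with a "data" list of objects carrying an "id", and the Bearer header is the OpenAI-compatible convention (other providers may expect different authentication headers).

```python
import json
import urllib.request
from typing import Optional


def parse_model_ids(payload: dict) -> list[str]:
    """Extract sorted model IDs from an OpenAI-style model list payload."""
    return sorted(
        item["id"]
        for item in payload.get("data", [])
        if isinstance(item, dict) and "id" in item
    )


def discover_models(base_url: str, api_key: Optional[str] = None) -> list[str]:
    """Query `<base_url>/v1/models` and return the advertised model IDs."""
    request = urllib.request.Request(base_url.rstrip("/") + "/v1/models")
    if api_key:
        # Assumption: the server accepts an OpenAI-style Bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_model_ids(json.load(response))
```

Keeping the parsing separate from the network call makes the discovery logic easy to test without a running provider.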

⚡ Performance & Stability

  • Pagination for Large Transcripts: No more browser hanging with thousands of segments
  • Auto-Cleanup Garbage Segments: Automatic detection and removal of erroneous transcription segments
  • PyTorch 2.6+ Compatibility: Support for the latest PyTorch versions
  • Backend Code Quality: Reduced cyclomatic complexity across 47 functions in 27 files
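
The idea behind transcript pagination can be sketched as follows: instead of sending thousands of segments at once, the server returns one page along with enough metadata for the client to request the next. This is a minimal sketch and not the code from PR #110; paginate_segments, its parameters, and the default page size of 100 are assumptions made for illustration.

```python
from typing import Any, Sequence


def paginate_segments(
    segments: Sequence[Any], page: int, page_size: int = 100
) -> dict:
    """Return one page of segments plus the metadata needed to fetch more."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total = len(segments)
    # Ceiling division; an empty transcript still reports a single page.
    total_pages = max(1, -(-total // page_size))
    start = (page - 1) * page_size
    return {
        "items": list(segments[start:start + page_size]),
        "page": page,
        "total": total,
        "total_pages": total_pages,
        "has_next": page < total_pages,
    }
```

A client would render the first page immediately and request further pages as the user scrolls, which keeps the browser responsive on long recordings.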

👤 Admin & User Experience

  • System Statistics: CPU, memory, disk, and GPU usage now visible to all users
  • Admin Password Reset: Secure password reset functionality with validation
  • UUID Consistency: Fixed admin endpoints to use UUIDs instead of integer IDs
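
As a rough illustration of the UUID change, an endpoint that identifies users by UUID can reject legacy integer-style identifiers before touching the database. The helper below is a hypothetical sketch using Python's standard uuid module; parse_user_uuid is not a function from the OpenTranscribe codebase.

```python
import uuid
from typing import Optional


def parse_user_uuid(raw: str) -> Optional[uuid.UUID]:
    """Return a UUID for a well-formed identifier, or None for anything else."""
    try:
        return uuid.UUID(raw)
    except ValueError:
        # Malformed input, including legacy integer IDs such as "42".
        return None
```

An endpoint could respond with a client error whenever this returns None, so old integer-based links fail clearly instead of matching the wrong record.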

Upgrading to v0.2.0

# If using the production installer
cd opentranscribe
./opentranscribe.sh update

# Or pull the latest Docker images
docker compose pull
docker compose up -d

Database migrations run automatically on startup - no manual intervention required.

Full Changelog: v0.1.0...v0.2.0

Happy transcribing! 🎉
The OpenTranscribe Team