
Commit 1cbb8e5

davidamacey and claude committed
release: OpenTranscribe v0.1.0 - First Official Release
This is the first official release of OpenTranscribe, marking the transition from internal development to public availability.

MAJOR CHANGES:
- License changed from MIT to GNU AGPL-3.0 for open source protection
- Version numbering starts at 0.1.0 on the path to v1.0.0
- Comprehensive documentation and release materials added

LICENSE UPDATE:
- Migrated to GNU Affero General Public License v3.0 (AGPL-3.0)
- Updated all license references across codebase and documentation
- Added AGPL-3.0 protections for network copyleft
- Ensures modifications to network services remain open source

VERSION SYNCHRONIZATION:
- Set version to 0.1.0 in VERSION file
- Updated pyproject.toml from 2.0.0 to 0.1.0
- Updated frontend/package.json from 0.2.0 to 0.1.0
- All version references now consistent across project

DOCUMENTATION:
- Added comprehensive CHANGELOG.md with complete feature list
- Created RELEASE_NOTES_v0.1.0.md for GitHub release
- Published blog post announcing v0.1.0 release
- Updated all documentation site references

KEY FEATURES HIGHLIGHTED:
✅ Media file upload with drag-and-drop (up to 4GB)
✅ Video file size detection with client-side audio extraction
✅ AI-powered summaries with multi-provider LLM support
✅ AI-powered topic generation for tags and collections
✅ YouTube video and playlist URL processing
✅ Browser microphone recording (localhost/HTTPS) with background operation
✅ Auto speaker labeling and profile generation
✅ Timestamp-based user comments on videos
✅ Cross-video speaker recognition with voice fingerprinting
✅ 70x realtime transcription speed on GPU
✅ OpenSearch 3.3.1 with 9.5x faster vector search
✅ Multi-GPU worker scaling for high-throughput systems
✅ Complete offline/airgapped deployment support

ROADMAP TO v1.0.0:
- Backwards compatibility efforts (no guarantees until v1.0.0)
- Real-time transcription for live streaming
- Enhanced speaker analytics
- Better speaker diarization models
- Google-style text search
- LLM-powered RAG chat with transcripts

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent f15421d commit 1cbb8e5

13 files changed

Lines changed: 1470 additions & 57 deletions


CHANGELOG.md

Lines changed: 274 additions & 0 deletions
Large diffs are not rendered by default.

LICENSE

Lines changed: 661 additions & 21 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 7 additions & 1 deletion
@@ -935,7 +935,13 @@ WHISPER_MODEL=base # Faster processing

 ## 📄 License

-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the [LICENSE](LICENSE) file for details.
+
+The AGPL-3.0 license ensures that:
+- The source code remains open and accessible to everyone
+- Any modifications to the software must be made available to users
+- Network use (SaaS) requires source code availability
+- Protects the open source community and prevents proprietary forks

 ## 🙏 Acknowledgments

RELEASE_NOTES_v0.1.0.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# OpenTranscribe v0.1.0 - First Official Release

**Release Date:** November 5, 2025
**License:** GNU Affero General Public License v3.0 (AGPL-3.0)

## Overview

We're thrilled to announce the first official release of OpenTranscribe! After 6 months of intensive development starting in May 2025, what began as a weekend experiment has evolved into a production-ready, fully-featured AI transcription platform.

OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that combines state-of-the-art AI models with a modern web interface to provide high-accuracy transcription, speaker identification, AI summarization, and advanced search capabilities.

## Why AGPL-3.0?

We've chosen the GNU Affero General Public License v3.0 to:
- **Protect open source** - Ensure the code remains open and accessible to everyone
- **Prevent proprietary forks** - Require that modifications, especially network services, remain open
- **Ensure transparency** - Network users have the right to access the source code
- **Build community** - Foster collaboration and shared improvements

## Key Highlights

### 🎧 Professional-Grade Transcription
- **70x realtime speed** on GPU with large-v2 model
- **Word-level timestamps** using WAV2VEC2 alignment
- **50+ languages** supported with automatic translation
- **Universal format support** - Audio and video files up to 4GB
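
The 4GB upload limit can be checked locally before a large file is sent. A minimal sketch, assuming the limit means 4 GiB (4 x 1024^3 bytes) since these notes don't state the unit; `MAX_UPLOAD_BYTES`, `file_size_bytes` and `fits_upload_limit` are hypothetical helpers, not part of OpenTranscribe:

```bash
# Hypothetical pre-upload check; assumes "4GB" means 4 GiB.
MAX_UPLOAD_BYTES=$((4 * 1024 * 1024 * 1024))

# Print a file's size in bytes. wc -c works on both Linux and macOS;
# tr strips the padding that BSD wc adds around the number.
file_size_bytes() {
  wc -c < "$1" | tr -d '[:space:]'
}

# Print "ok: ..." and return 0 if the file is within the limit;
# print a message to stderr and return 1 if it is missing or too large.
fits_upload_limit() {
  local path="$1" size
  if [ ! -f "$path" ]; then
    echo "not a regular file: $path" >&2
    return 1
  fi
  size=$(file_size_bytes "$path")
  if [ "$size" -le "$MAX_UPLOAD_BYTES" ]; then
    echo "ok: $path ($size bytes)"
  else
    echo "too large: $path ($size > $MAX_UPLOAD_BYTES bytes)" >&2
    return 1
  fi
}
```

For example, `fits_upload_limit interview.mp4`. If a video is over the limit, stripping the video track, e.g. `ffmpeg -i interview.mp4 -vn -c:a copy interview.m4a`, usually shrinks it considerably; that command assumes an AAC audio stream and is a manual alternative to the app's built-in client-side extraction.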

### 👥 Advanced Speaker Intelligence
- **Automatic speaker diarization** using PyAnnote.audio
- **Cross-video speaker recognition** with voice fingerprinting
- **AI-powered speaker suggestions** using LLM context analysis
- **Global speaker profiles** that persist across all recordings
- **Speaker analytics** with talk time, pace, and interaction patterns
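
Per-speaker talk time, one of the analytics listed above, reduces to summing segment durations per speaker label. A minimal awk sketch, assuming a hypothetical tab-separated export with `speaker`, `start` and `end` (in seconds) on each line; OpenTranscribe's actual data model is not shown in these notes:

```bash
# Hypothetical input format, one diarized segment per line:
#   speaker<TAB>start_seconds<TAB>end_seconds
# Prints speaker, total seconds, and share of all speech, sorted by speaker.
talk_time() {
  awk -F'\t' '
    NF >= 3 {
      dur = $3 - $2
      if (dur > 0) { t[$1] += dur; total += dur }
    }
    END {
      for (s in t) printf "%s\t%.1f\t%.1f%%\n", s, t[s], 100 * t[s] / total
    }
  ' "${1:--}" | sort
}
```

Run it as `talk_time segments.tsv` or pipe segments in on stdin. Pace (words per minute) would follow the same pattern if each segment also carried its word count.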

### 🤖 AI-Powered Insights
- **LLM integration** - Support for OpenAI, Claude, vLLM, Ollama, OpenRouter, and custom providers
- **BLUF format summaries** - Bottom Line Up Front structured analysis
- **Custom AI prompts** - Unlimited prompts with flexible JSON schemas
- **Intelligent sectioning** - Handles transcripts of any length automatically
- **Local or cloud processing** - Privacy-first local models or powerful cloud AI
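
Sectioning exists because a long transcript can exceed an LLM's context window, so it has to be split and summarized piece by piece. The sketch below shows only the simplest form of that idea, fixed-size word windows, with word count standing in for tokens; it is an illustration, not OpenTranscribe's sectioning algorithm, which these notes do not describe:

```bash
# Illustrative only: split text into sections of at most N words,
# printing one section per line. Word count is a rough proxy for tokens.
section_transcript() {
  local words_per_section="${1:?usage: section_transcript WORDS [FILE]}"
  awk -v n="$words_per_section" '
    {
      for (i = 1; i <= NF; i++) {
        # Start a new buffer at each section boundary, else append.
        buf = (count % n == 0) ? $i : buf " " $i
        count++
        if (count % n == 0) print buf
      }
    }
    # Flush a final, partially filled section.
    END { if (count % n != 0) print buf }
  ' "${2:--}"
}
```

For example, `section_transcript 3000 transcript.txt` yields one section per line; each could then be summarized separately and the partial summaries merged.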

### 🔍 Powerful Search & Discovery
- **Hybrid search** - Keyword + semantic search with OpenSearch 3.3.1
- **9.5x faster vector search** - Significantly improved performance
- **25% faster queries** with 75% lower p90 latency
- **Advanced filtering** - Search by speaker, tags, collections, date, duration
- **Interactive navigation** - Click-to-seek on transcripts and waveforms
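
p90 latency is the time within which 90% of queries complete, so "75% lower p90" means, for instance, a p90 falling from 400 ms to 100 ms (illustrative numbers, not measurements). A small nearest-rank sketch for computing p90 from your own latency samples, one number per line; `p90` is a hypothetical helper name:

```bash
# Nearest-rank p90: sort the samples and take the value at rank ceil(0.9 * N).
# Returns 1 (and prints nothing) when there are no samples.
p90() {
  sort -n "${1:--}" | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # ceil(9N / 10) computed exactly as floor((9N + 9) / 10).
      rank = int((9 * NR + 9) / 10)
      print v[rank]
    }
  '
}
```

With ten samples the result is the 9th-smallest value, so a single extreme outlier (the 10th) does not affect it.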

### ⚡ Enterprise Performance
- **Multi-GPU scaling** - Optional parallel processing (4+ workers per GPU)
- **Specialized work queues** - GPU, CPU, Download, NLP, and Utility queues
- **Non-blocking architecture** - Parallel processing saves 45-75s per 3-hour file
- **Model caching** - Efficient ~2.6GB cache with automatic persistence
- **Complete offline support** - Full airgapped deployment capability
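
Running several workers per GPU amounts to starting multiple worker processes that all see the same device. A hypothetical planning sketch that assigns N workers to each GPU through the standard CUDA `CUDA_VISIBLE_DEVICES` variable; it only prints the plan, and the worker names are invented. How OpenTranscribe itself launches and configures its GPU workers is not covered in these notes:

```bash
# Print one line per worker slot: the GPU it is pinned to and a made-up name.
# Default of 4 workers per GPU mirrors the "4+ workers per GPU" figure above.
plan_gpu_workers() {
  local gpus="${1:?usage: plan_gpu_workers NUM_GPUS [WORKERS_PER_GPU]}"
  local per_gpu="${2:-4}" gpu w
  for ((gpu = 0; gpu < gpus; gpu++)); do
    for ((w = 1; w <= per_gpu; w++)); do
      printf 'CUDA_VISIBLE_DEVICES=%d worker=gpu%d-w%d\n' "$gpu" "$gpu" "$w"
    done
  done
}
```

On an NVIDIA host the GPU count can come from `nvidia-smi -L`, which prints one line per device: `plan_gpu_workers "$(nvidia-smi -L | wc -l)" 4`.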

## Installation

### Quick Install (Recommended)
```bash
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
cd opentranscribe
./opentranscribe.sh start
```

Access at: **http://localhost:5173**

### Docker Hub Images
Pre-built multi-platform images (AMD64, ARM64):
- `davidamacey/opentranscribe-backend:v0.1.0`
- `davidamacey/opentranscribe-frontend:v0.1.0`
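
Both images can be pulled in one step. A minimal sketch where only the image names and tag come from the list above and the helper names are hypothetical. Docker normally picks the matching architecture from a multi-platform image automatically, so `--platform` is needed only to override that choice:

```bash
# Print the image references for a given tag (default v0.1.0).
opentranscribe_images() {
  local tag="${1:-v0.1.0}" component
  for component in backend frontend; do
    printf 'davidamacey/opentranscribe-%s:%s\n' "$component" "$tag"
  done
}

# Pull both images for an explicit platform, stopping at the first failure.
pull_opentranscribe() {
  local tag="${1:-v0.1.0}" platform="${2:-linux/amd64}" image
  for image in $(opentranscribe_images "$tag"); do
    docker pull --platform "$platform" "$image" || return 1
  done
}
```

For example, `pull_opentranscribe v0.1.0 linux/arm64` on an ARM64 host.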

### From Source
```bash
git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
git checkout v0.1.0
cp .env.example .env
# Edit .env with your settings
./opentr.sh start dev
```
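
The `.env` editing step can be scripted so that re-running it is safe. A minimal sketch of an idempotent setter; `set_env_var` is hypothetical, `WHISPER_MODEL=base` is the example the README itself uses, and which other keys exist depends on `.env.example`, which is not reproduced here:

```bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing any line that
# starts with "KEY=" or appending one if the key is absent. awk -v expands
# backslash escapes in VALUE, so avoid backslashes in values.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}" tmp
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var WHISPER_MODEL base .env` before `./opentr.sh start dev`. Commented-out lines such as `# WHISPER_MODEL=...` are left untouched because the match must start at the beginning of the line.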

## What's Included

### Core Features
- **Transcription** - WhisperX with faster-whisper backend
- **Speaker Diarization** - PyAnnote.audio integration with auto-labeling and profile generation
- **Media File Upload** - Direct upload of audio/video files up to 4GB with drag-and-drop
- **Video File Size Detection** - Client-side audio extraction option for large video files
- **YouTube Support** - Direct URL and playlist processing for batch transcription
- **Browser Microphone Recording** - Built-in recording (localhost or HTTPS) with background operation
- **AI-Powered Summaries** - Multi-provider LLM integration with customizable formats
- **AI Topic Generation** - Automatic tag and collection suggestions from transcript content
- **Timestamp Comments** - User annotations anchored to specific video moments
- **Search Engine** - OpenSearch 3.3.1 with hybrid keyword and vector search
- **Collections** - Organize media into themed groups with AI suggestions
- **Analytics** - Speaker metrics and interaction analysis
- **Waveform Visualization** - Interactive audio timeline
- **PWA Support** - Installable progressive web app
- **Dark/Light Mode** - Full theme support

### Infrastructure
- **Docker Compose** - Multi-environment orchestration
- **PostgreSQL** - Relational database with JSONB
- **MinIO** - S3-compatible object storage
- **Redis** - Message broker and caching
- **Celery** - Distributed task processing
- **NGINX** - Production web server
- **Flower** - Task monitoring dashboard

### Security
- **Non-root containers** - Principle of least privilege
- **RBAC** - Role-based access control
- **Encrypted secrets** - Secure API key storage
- **Security scanning** - Trivy and Grype integration
- **Session management** - JWT-based authentication

## System Requirements

### Minimum
- **CPU:** 4 cores
- **RAM:** 8GB
- **Storage:** 50GB (including ~3GB for AI models)
- **GPU:** Optional (CPU-only mode available)

### Recommended
- **CPU:** 8+ cores
- **RAM:** 16GB+
- **Storage:** 100GB+ SSD
- **GPU:** NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)

### Supported Platforms
- **OS:** Linux, macOS (including Apple Silicon), Windows (via WSL2)
- **Architectures:** AMD64, ARM64
- **GPUs:** NVIDIA CUDA, Apple MPS (Metal)
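
A host can be checked against the minimum figures above before installing. A minimal sketch, Linux-only as written (it relies on `nproc`, `/proc/meminfo` and GNU `df -BG --output=avail`), which treats the 50GB storage figure as free space in the install directory; that interpretation and both function names are assumptions:

```bash
# Compare given figures against the documented minimum: 4 cores, 8GB RAM,
# 50GB storage. Prints one line per check; returns 1 if any check fails.
meets_minimum() {
  local cores="$1" ram_gb="$2" disk_gb="$3" status=0
  if [ "$cores" -ge 4 ]; then echo "cpu: ok ($cores cores)"
  else echo "cpu: below minimum ($cores < 4)"; status=1; fi
  if [ "$ram_gb" -ge 8 ]; then echo "ram: ok (${ram_gb}GB)"
  else echo "ram: below minimum (${ram_gb}GB < 8GB)"; status=1; fi
  if [ "$disk_gb" -ge 50 ]; then echo "disk: ok (${disk_gb}GB free)"
  else echo "disk: below minimum (${disk_gb}GB free < 50GB)"; status=1; fi
  return "$status"
}

# Linux-only: gather figures for this host. MemTotal sits slightly below the
# installed RAM, so it is rounded to the nearest GB rather than truncated.
host_preflight() {
  local dir="${1:-.}" cores ram_gb disk_gb status=0
  cores=$(nproc)
  ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 + 0.5 }' /proc/meminfo)
  disk_gb=$(df -BG --output=avail "$dir" | tail -n 1 | tr -dc '0-9')
  meets_minimum "$cores" "$ram_gb" "$disk_gb" || status=1
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "gpu: no NVIDIA GPU detected (CPU-only mode is supported)"
  fi
  return "$status"
}
```

Run `host_preflight /path/to/install/dir`. On macOS the figures would have to come from `sysctl` instead, but `meets_minimum` itself is portable.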

## Performance Benchmarks

| Metric | Performance |
|--------|-------------|
| Transcription Speed (GPU) | 70x realtime |
| Vector Search Improvement | 9.5x faster |
| Query Performance | 25% faster, 75% lower p90 latency |
| Multi-GPU Throughput | 4 videos simultaneously (4 workers) |
| Model Cache Size | ~2.6GB total |
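
A realtime factor converts directly into wall-clock time: at 70x, processing takes the media duration divided by 70. A sketch of that arithmetic, assuming the factor applies uniformly to the whole file and ignoring queueing, alignment and diarization time, which the table does not break out; the function name is hypothetical:

```bash
# Estimated processing seconds = ceil(media_seconds / speedup),
# using integer ceiling division. Inputs must be positive integers.
estimate_transcription_seconds() {
  local media_seconds="$1" speedup="${2:-70}"
  echo $(( (media_seconds + speedup - 1) / speedup ))
}
```

For a 3-hour file, `estimate_transcription_seconds 10800` prints 155, or about 2.6 minutes.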

## Documentation

📚 **Complete Documentation:** https://docs.opentranscribe.app

Key resources:
- [Quick Start Guide](https://docs.opentranscribe.app/docs/getting-started/quick-start)
- [Installation Guide](https://docs.opentranscribe.app/docs/getting-started/installation)
- [User Guide](https://docs.opentranscribe.app/docs/user-guide)
- [Configuration Reference](https://docs.opentranscribe.app/docs/configuration)
- [Screenshots & Visual Guide](https://docs.opentranscribe.app/docs/screenshots)
- [FAQ](https://docs.opentranscribe.app/docs/faq)
- [Troubleshooting](https://docs.opentranscribe.app/docs/troubleshooting)

## Roadmap to v1.0.0

We're committed to delivering a stable, production-ready v1.0.0 release. While we'll strive for backwards compatibility, we cannot guarantee it until v1.0.0. Breaking changes will be clearly announced.

**Planned features for future releases:**
- Real-time transcription for live streaming
- Enhanced speaker analytics and visualization
- Better speaker diarization models
- Google-style text search
- LLM-powered RAG chat with transcript text
- Other refinements along the way!

## Known Issues

No critical issues at release time. See [GitHub Issues](https://github.com/davidamacey/OpenTranscribe/issues) for community-reported items.

## Contributing

We welcome contributions from the community! See our [Contributing Guide](https://github.com/davidamacey/OpenTranscribe/blob/master/docs/CONTRIBUTING.md) for details.

Ways to contribute:
- 🐛 Report bugs and issues
- 💡 Suggest new features
- 🔧 Submit pull requests
- 📚 Improve documentation
- 🌍 Translate the interface
- ⭐ Star the repository

## Support & Community

- **Issues:** [GitHub Issues](https://github.com/davidamacey/OpenTranscribe/issues)
- **Discussions:** [GitHub Discussions](https://github.com/davidamacey/OpenTranscribe/discussions)
- **Contact:** [via GitHub](https://github.com/davidamacey)

## Acknowledgments

OpenTranscribe builds upon amazing open-source projects:
- **OpenAI Whisper** - Foundation speech recognition model
- **WhisperX** - Enhanced alignment and diarization
- **PyAnnote.audio** - Speaker diarization toolkit
- **FastAPI** - Modern Python web framework
- **Svelte** - Reactive frontend framework
- **PostgreSQL** - Reliable database system
- **OpenSearch** - Search and analytics engine
- **Docker** - Containerization platform

Special thanks to the AI community and all contributors who helped make this release possible!

## License

OpenTranscribe is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See [LICENSE](https://github.com/davidamacey/OpenTranscribe/blob/master/LICENSE) for full details.

---

**Built with ❤️ by the OpenTranscribe community**

*OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.*

**Download:** [v0.1.0 Release](https://github.com/davidamacey/OpenTranscribe/releases/tag/v0.1.0)
**Docker:** [Backend](https://hub.docker.com/r/davidamacey/opentranscribe-backend) | [Frontend](https://hub.docker.com/r/davidamacey/opentranscribe-frontend)
**Docs:** [docs.opentranscribe.app](https://docs.opentranscribe.app)

VERSION

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
v0.1.0
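
The version synchronization described in the commit message can be verified with a short script. A sketch, assuming `pyproject.toml` sits at the repository root (the commit names the file but not its path) and that both manifests declare the version on a single line. `VERSION` stores `v0.1.0` while the manifests store `0.1.0`, so the leading `v` is stripped before comparing; `check_versions` is a hypothetical helper:

```bash
# Compare the version in VERSION, pyproject.toml and frontend/package.json.
# Prints the three values; returns 0 only if all are present and equal.
check_versions() {
  local version_file="${1:-VERSION}"
  local pyproject="${2:-pyproject.toml}"            # assumed location
  local package_json="${3:-frontend/package.json}"
  local base py js
  base=$(tr -d '[:space:]' < "$version_file") || return 1
  base="${base#v}"   # VERSION holds "v0.1.0"; the manifests hold "0.1.0"
  # First line of the form: version = "..."
  py=$(sed -n 's/^version *= *"\([^"]*\)".*/\1/p' "$pyproject" | head -n 1)
  # First line of the form: "version": "..."
  js=$(sed -n 's/^ *"version" *: *"\([^"]*\)".*/\1/p' "$package_json" | head -n 1)
  echo "VERSION=$base pyproject=$py package.json=$js"
  [ -n "$base" ] && [ "$base" = "$py" ] && [ "$base" = "$js" ]
}
```

Run from the repository root as `check_versions`, or pass the three paths explicitly if `pyproject.toml` lives elsewhere (for example under a backend directory).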
