VoiceSync AI is a privacy-first transcription tool that runs 100% offline. It runs OpenAI's Whisper model (via faster-whisper) locally on your device to convert speech to text, so no audio data ever leaves your environment. Ideal for legal, medical, or other confidential workflows.
Run the entire stack in three steps:

```bash
# 1. Start storage (MinIO)
docker-compose up -d

# 2. Start backend (Python/FastAPI)
cd ai-engine && pip install -r requirements.txt && uvicorn main:app --reload

# 3. Start frontend (React)
cd web-client && npm install && npm run dev
```

Detailed setup: see GETTING_STARTED.md for the full environment configuration.
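The compose file itself isn't shown here; as a rough sketch, a MinIO service entry typically looks like the following (the service name, ports, volume, and credentials below are assumptions — check the repo's docker-compose.yml for the real values):

```yaml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API endpoint
      - "9001:9001"   # Web console
    environment:
      MINIO_ROOT_USER: minioadmin       # placeholder credentials only
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio-data:/data

volumes:
  minio-data:
```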
- Secure drag-and-drop uploads using direct-to-S3 pre-signed URLs
- Accurate, timestamped text output, generated locally

Deep dive: see ARCHITECTURE.md for the system design.
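As an illustration of the timestamped output, here is a small helper that renders faster-whisper-style segments (start and end times in seconds, plus text) as SRT-style blocks. The `(start, end, text)` tuple shape is an assumption for this sketch, not the exact API:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt_ts(start)} --> {fmt_ts(end)}\n{text.strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments shaped like faster-whisper transcription output
srt = segments_to_srt([(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This runs offline.")])
print(srt)
```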
- 🔒 100% Offline: runs entirely on localhost; no data is sent to the cloud.
- ⚡ High Performance: uses CTranslate2 (via faster-whisper) for up to 4x faster CPU inference.
- 📂 Direct-to-Storage: bypasses backend upload limits with pre-signed URLs, supporting very large files.
- 🛡️ Enterprise Ready: S3-compatible storage layer (MinIO) for scalability.
1. The frontend requests upload permission (a token) from the backend.
2. The backend grants a pre-signed URL.
3. The frontend uploads the large audio file directly to storage.
4. The backend reads the file from storage internally and runs the AI transcription.
| Document | Description |
|---|---|
| System Architecture | Diagrams, Privacy Design, and Tech Choices. |
| Getting Started | Full installation and troubleshooting guide. |
| Failure Scenarios | How we handle crashes and offline modes. |
| Interview Q&A | "Why Pre-Signed URLs?" and other senior questions. |
| Domain | Technology | Use Case |
|---|---|---|
| Frontend | React (Vite) | Fast, modern SPA for file management. |
| Backend | Python (FastAPI) | Async orchestration and security. |
| AI Engine | Faster-Whisper | Optimized local inference. |
| Storage | MinIO | S3-Compatible Object Store. |
Harshan Aiyappa
Senior Full-Stack Hybrid Engineer
GitHub Profile
This project is licensed under the MIT License - see the LICENSE file for details.
