VoiceSync AI

Secure Offline Audio Transcription Platform

VoiceSync AI is a privacy-first transcription tool that runs entirely offline. It runs OpenAI's Whisper model locally on your device (via faster-whisper) to convert speech to text, so no audio data ever leaves your secure environment. It is suited to legal, medical, and other confidential workflows.

🚀 Quick Start

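Run the entire stack in three steps. Start each step from the repository root; steps 2 and 3 keep running in the foreground, so give each its own terminal: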

# 1. Start Storage (MinIO)
docker-compose up -d

# 2. Start Backend (Python/FastAPI)
cd ai-engine && pip install -r requirements.txt && uvicorn main:app --reload

# 3. Start Frontend (React)
cd web-client && npm install && npm run dev

Detailed Setup: See GETTING_STARTED.md for full environment config.

📸 Demo & Usage

1. Upload Interface

Secure Drag-and-Drop using Direct-to-S3 Pre-signed URLs

2. Transcription Results

Accurate, timestamped text output, generated locally.
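
As a rough illustration of how timestamped output like this can be produced with faster-whisper, here is a minimal sketch. The model directory `models/whisper-small-ct2`, the function names, and the `[HH:MM:SS.mmm -> HH:MM:SS.mmm]` line format are assumptions made for this example, not taken from the project's code.

```python
from typing import Iterator


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(start: float, end: float, text: str) -> str:
    """Format one transcript segment as a single timestamped line."""
    return f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"


def transcribe(audio_path: str, model_dir: str = "models/whisper-small-ct2") -> Iterator[str]:
    """Yield timestamped lines for an audio file.

    model_dir is a hypothetical path to an already-converted CTranslate2
    Whisper model on disk; loading from a local directory avoids any
    model download at run time.
    """
    # Imported lazily so the formatting helpers work without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    for segment in segments:
        yield format_segment(segment.start, segment.end, segment.text)
```

Calling `for line in transcribe("meeting.wav"): print(line)` would print one line per recognized segment. The `int8` compute type is one common choice for CPU inference; the project may use a different setting.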

Deep Dive: See ARCHITECTURE.md for the System Design.

✨ Key Features

🔒 100% Offline: Runs entirely on localhost. No data is sent to the cloud.
⚡ High Performance: Uses faster-whisper (built on CTranslate2) for up to 4x faster CPU inference than the reference Whisper implementation.
📂 Direct-to-Storage: Pre-signed URLs let the browser upload straight to storage, bypassing backend request-size limits so very large files are supported.
🛡️ Enterprise Ready: S3-compatible storage layer (MinIO) for scalability.

🏗️ Architecture

The "Sidecar" Upload Pattern

1. Frontend asks the backend for permission to upload (token).
2. Backend grants a pre-signed URL.
3. Frontend uploads the heavy audio file directly to storage.
4. Backend reads the file from storage internally and runs the AI transcription.
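
The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the bucket name `uploads`, the endpoint `localhost:9000`, the placeholder credentials, and all function names are hypothetical. `presigned_put_object` is the MinIO Python client call that issues a time-limited upload URL; the upload itself is shown with the standard library in place of the React frontend.

```python
import os
import re
import urllib.request
import uuid
from datetime import timedelta
from pathlib import PurePosixPath

BUCKET = "uploads"  # hypothetical bucket name


def make_object_key(filename: str) -> str:
    """Build a collision-free object key, keeping only a safe file extension."""
    name = filename.replace("\\", "/")
    suffix = PurePosixPath(name).suffix.lower()
    if not re.fullmatch(r"\.[a-z0-9]{1,8}", suffix):
        suffix = ""
    return f"{uuid.uuid4().hex}{suffix}"


def grant_upload_url(filename: str, expires_minutes: int = 15) -> tuple[str, str]:
    """Backend side (step 2): return (object_key, pre-signed PUT URL)."""
    # Imported lazily so make_object_key works without the minio package.
    from minio import Minio

    # Endpoint and credentials are placeholders for a local MinIO instance.
    client = Minio(
        "localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False,
    )
    key = make_object_key(filename)
    url = client.presigned_put_object(
        BUCKET, key, expires=timedelta(minutes=expires_minutes)
    )
    return key, url


def upload_via_url(url: str, audio_path: str) -> int:
    """Client side (step 3): PUT the file straight to storage, streamed from disk."""
    size = os.path.getsize(audio_path)
    with open(audio_path, "rb") as fh:
        request = urllib.request.Request(
            url, data=fh, method="PUT", headers={"Content-Length": str(size)}
        )
        with urllib.request.urlopen(request) as response:
            return response.status
```

The audio bytes never pass through the API process, so the backend only handles a small request for the URL. For step 4, the backend could fetch the object with the same client (for example with `fget_object`) before running transcription; how the project does this is not shown here.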

📚 Documentation

Document	Description
System Architecture	Diagrams, Privacy Design, and Tech Choices.
Getting Started	Full installation and troubleshooting guide.
Failure Scenarios	How we handle crashes and offline modes.
Interview Q&A	"Why Pre-Signed URLs?" and other senior-level interview questions.

🔧 Tech Stack

Domain	Technology	Use Case
Frontend	React (Vite)	Fast, modern SPA for file management.
Backend	Python (FastAPI)	Async orchestration and security.
AI Engine	Faster-Whisper	Optimized local inference.
Storage	MinIO	S3-Compatible Object Store.

👤 Author

Harshan Aiyappa
Senior Full-Stack Hybrid Engineer
GitHub Profile

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
