Kimosabey/voicesync-ai

VoiceSync AI

Secure Offline Audio Transcription Platform

VoiceSync AI is a privacy-first transcription tool that runs 100% offline. It runs OpenAI's Whisper model locally on your device (via faster-whisper) to convert speech to text, so no audio data leaves your environment. It is suited to legal, medical, and other confidential workflows.


🚀 Quick Start

Run the entire stack in 3 steps:

# 1. Start Storage (MinIO)
docker-compose up -d

# 2. Start Backend (Python/FastAPI)
cd ai-engine && pip install -r requirements.txt && uvicorn main:app --reload

# 3. Start Frontend (React), in a second terminal from the repository root
cd web-client && npm install && npm run dev

Detailed Setup: See GETTING_STARTED.md for full environment config.


📸 Demo & Usage

1. Upload Interface

Secure drag-and-drop upload using direct-to-S3 pre-signed URLs.

2. Transcription Results

Accurate, timestamped text output, generated locally.

Deep Dive: See ARCHITECTURE.md for the System Design.


✨ Key Features

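  • 🔒 100% Offline: Runs entirely on localhost. No data is sent to the cloud.
  • ⚡ High Performance: Uses faster-whisper (built on CTranslate2), which its authors report as up to 4x faster than the reference openai/whisper implementation.
  • 📂 Direct-to-Storage: Uploads go straight to object storage via pre-signed URLs, bypassing backend request-size limits so very large files are supported.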
  • 🛡️ Enterprise Ready: S3-compatible storage layer (MinIO) for scalability.
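
As an illustration of the local inference path, here is a minimal sketch of transcribing a file with faster-whisper and rendering timestamped lines. The helper names (`format_timestamp`, `format_segment`, `transcribe`), the `base` model size, and the `int8` CPU setting are assumptions made for this example and are not taken from the project's code.

```python
from typing import Iterator


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(start: float, end: float, text: str) -> str:
    """Format one transcribed segment as a single timestamped line."""
    return f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text.strip()}"


def transcribe(audio_path: str, model_size: str = "base") -> Iterator[str]:
    """Yield timestamped lines for an audio file (hypothetical helper)."""
    # Imported lazily so the formatting helpers work without the dependency.
    from faster_whisper import WhisperModel

    # Assumption: CPU inference with int8 quantization; the project may differ.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    for segment in segments:  # segments is a lazy generator
        yield format_segment(segment.start, segment.end, segment.text)
```

Calling `transcribe("meeting.wav")` would yield one line per segment. Note that passing a model size name typically triggers a one-time model download on first use; for a strictly offline setup, a path to a local model directory can be passed instead.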

🏗️ Architecture

The "Sidecar" Upload Pattern

  1. The frontend asks the backend for permission to upload (a token).
  2. The backend responds with a pre-signed URL.
  3. The frontend uploads the large audio file directly to storage.
  4. The backend reads the file from storage internally and runs the AI transcription.
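
The four steps above can be sketched with a simplified stand-in for pre-signing. Real S3 and MinIO pre-signed URLs use AWS Signature Version 4, and a backend would normally obtain them from an S3 client (for example boto3's `generate_presigned_url`); the HMAC scheme below only illustrates the idea that storage can validate an upload without the backend handling the file. `SECRET`, the `uploads` bucket, the `localhost:9000` address, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key known to backend and storage
PREFIX = "/uploads/"     # hypothetical bucket path


def _sign(method: str, key: str, expires: int) -> str:
    """Sign the HTTP method, object key, and expiry time together."""
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def grant_upload_url(key: str, ttl_seconds: int = 300,
                     now: Optional[float] = None) -> str:
    """Steps 1-2: the backend issues a short-lived URL for one PUT."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _sign("PUT", key, expires)})
    return f"http://localhost:9000{PREFIX}{key}?{query}"


def storage_accepts(method: str, url: str,
                    now: Optional[float] = None) -> bool:
    """Step 3: storage validates the request without calling the backend."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if not parsed.path.startswith(PREFIX) or now > expires:
        return False
    key = parsed.path[len(PREFIX):]
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature, _sign(method, key, expires))
```

Because the signature covers the method, the object key, and the expiry, a URL granted for uploading one file cannot be reused to read it, to write a different object, or to upload after it expires. Step 4 (the backend reading the object for transcription) is outside this sketch.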

📚 Documentation

| Document | Description |
| --- | --- |
| System Architecture | Diagrams, Privacy Design, and Tech Choices. |
| Getting Started | Full installation and troubleshooting guide. |
| Failure Scenarios | How we handle crashes and offline modes. |
| Interview Q&A | "Why Pre-Signed URLs?" and other senior questions. |

🔧 Tech Stack

| Domain | Technology | Use Case |
| --- | --- | --- |
| Frontend | React (Vite) | Fast, modern SPA for file management. |
| Backend | Python (FastAPI) | Async orchestration and security. |
| AI Engine | Faster-Whisper | Optimized local inference. |
| Storage | MinIO | S3-Compatible Object Store. |

👤 Author

Harshan Aiyappa
Senior Full-Stack Hybrid Engineer
GitHub Profile


📝 License

This project is licensed under the MIT License - see the LICENSE file for details.