SARVIK (Smart Assistant for Real-time Voice Interaction and Knowledge) is an advanced AI personal assistant developed by Ganpat University students: Karan, Krish, and Vaibhavi. The project implements a sophisticated microservices architecture combining voice processing, natural language understanding, text-to-speech synthesis, persistent context management, and real-world function calling capabilities.
- Voice & Text Interaction: Natural conversation with real-time audio visualization
- Dual LLM Providers: Switch between local Qwen3-4B (GPU) or cloud Groq API (Llama 3.3 70B)
- Function Calling: 9+ integrated tools for real-world actions (weather, Gmail, Drive, Calendar)
- Google Services Integration: Connect Gmail, Drive, and Calendar with OAuth
- Context-Aware AI: Semantic search across conversation history for relevant responses
- Voice Biometrics: Secure voice enrollment and verification
Interactive web demo to explore UI & features (Frontend Only)
Complete system showcase with backend, function calling & AI responses
SARVIK follows a microservices architecture with 4 main components:
                    SARVIK ECOSYSTEM
┌──────────────────┐
│   MYAI-DESKTOP   │ ← Electron + React Frontend
│   (Port 3000)    │   • Voice Input/Output
└────────┬─────────┘   • Chat Interface
         │             • Audio Visualization
         ▼
┌──────────────────┐
│   MYAI-BACKEND   │ ← FastAPI Backend + AI Services
│   (Port 8000)    │   • Authentication
└────────┬─────────┘   • Voice Processing (Whisper, SpeechBrain)
         │             • Context Management
         │             • Conversation Storage
         │
     ┌───┴───────┬───────────┐
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌──────────┐
│   LLM   │ │   TTS   │ │ DATABASES│
│PROVIDERS│ │ SERVER  │ │          │
│ (8001)  │ │ (8002)  │ │PostgreSQL│
│         │ │         │ │  Redis   │
│Qwen3-4B │ │ Piper   │ │  Qdrant  │
│Groq API │ │         │ │          │
└─────────┘ └─────────┘ └──────────┘

- Purpose: User-facing desktop application
- Port: 3000
- Key Features:
- Voice interaction with real-time audio visualization
- Text-based chat interface
- Voice enrollment and authentication
- Conversation history management
- Device audio management
- Technology Stack: Electron, React, Styled Components, Three.js
- Documentation: MYAI_DESKTOP.md
- Purpose: Core API server and AI orchestration
- Port: 8000
- Key Features:
- User authentication (Google OAuth + JWT)
- Voice processing (Whisper ASR + SpeechBrain)
- Context management (embedding-based semantic search)
- Conversation storage (PostgreSQL + Qdrant)
- LLM and TTS orchestration
- Function calling orchestrator (detects and executes tool calls)
- Google Services OAuth (Gmail, Drive, Calendar integration)
- 9+ Built-in Tools (weather, email, calendar, file management)
- Technology Stack: FastAPI, SQLAlchemy, Whisper, SpeechBrain, Sentence Transformers, Google APIs
- Documentation: MYAI_BACKEND.md
- Purpose: Dedicated LLM inference server with dual provider support
- Port: 8001
- Key Features:
- Dual LLM Providers: Server-Hosted (Qwen3-4B GPU) or Cloud (Groq API)
- Runtime Switching: Change providers without restart
- Smart Routing: Automatic provider selection based on user preference
- Streaming token generation (20-350 tokens/sec)
- Concurrent request handling
- Separate system prompts for voice/text modes
- Technology Stack: FastAPI, llama-cpp-python, CUDA, Groq SDK
- Documentation: 3_LLM_SERVER.md (includes provider integration details)
- Purpose: Text-to-speech synthesis server
- Port: 8002
- Key Features:
- Real-time audio synthesis using Piper TTS
- Multiple voice support (lessac, sarah, alba, amy, david)
- WebSocket-based audio streaming
- Parallel sentence synthesis with sequencing
- Technology Stack: FastAPI, Piper TTS, ONNX Runtime
- Documentation: TTS_SERVER.md
1. USER SPEAKS
   ↓
2. DESKTOP: Audio Recording
   - globalAudioManager.js captures audio
   - Sends to backend via /api/voice/process
   ↓
3. BACKEND: Voice Processing
   - Whisper transcribes audio → text
   - SpeechBrain verifies user identity
   - Stores user query in PostgreSQL
   - Generates 768D embedding via Sentence Transformers
   ↓
4. BACKEND: Context Building
   - Retrieves recent conversations (PostgreSQL)
   - Semantic search in Qdrant (768D vectors)
   - Combines context for LLM
   ↓
5. LLM PROVIDER SELECTION
   - Backend checks user.llm_provider setting
   - Routes to Server-Hosted (Qwen3-4B) or Groq API (cloud LLM)
   ↓
6. LLM GENERATION: Response Generation
   - Server-Hosted: POST /generate-stream (Qwen3-4B on GPU)
   - Groq API: client.chat.completions.create() (cloud LLM)
   - Generates streaming response (20-350 tok/sec)
   - Returns tokens via SSE
   ↓
7. BACKEND: Parallel Processing
   - Buffers tokens into sentences
   - Sends sentences to TTS server
   - Streams tokens to desktop
   ↓
8. TTS SERVER: Audio Synthesis
   - Converts sentences to speech (Piper)
   - Broadcasts WAV audio via WebSocket
   ↓
9. DESKTOP: Playback
   - Receives text tokens (displays in chat)
   - Receives audio chunks (plays sequentially)
   - Updates conversation UI
1. USER TYPES
   ↓
2. DESKTOP: Text Submission
   - Sends query to /api/query/process-llm-stream
   ↓
3. BACKEND: Context + LLM Provider Selection
   - Stores query in PostgreSQL
   - Builds context (recent + semantic)
   - Formats prompt with TEXT MODE system prompt
   - Checks user.llm_provider setting
   ↓
4. LLM GENERATION: Text Generation
   - Server-Hosted: Qwen3-4B on GPU (slow)
   - Groq API: Llama 3.3 70B cloud (very fast)
   - Generates detailed response
   - Streams tokens back
   ↓
5. DESKTOP: Rendering
   - Markdown rendering with syntax highlighting
   - Code blocks, lists, formatting
   - Real-time token display
Collections/Tables:
- `users` - User accounts (Google OAuth)
  - `llm_provider` - Current provider ('server_hosted' or 'groq_api')
  - `groq_api_key_encrypted` - Fernet-encrypted API key
  - `groq_api_key_expires_at` - 30-day auto-expiry
  - `groq_model` - Selected Groq model
- `voice_profiles` - Encrypted voice embeddings
- `conversations` - All user conversations (user + assistant messages)
- `settings` - User preferences (timezone, voice preference, etc.)
- `service_connections` - OAuth connections for external services
  - `service_name` - Service type ('gmail', 'drive', 'calendar')
  - `access_token` - Encrypted OAuth access token
  - `refresh_token` - Encrypted OAuth refresh token
  - `token_expires_at` - Token expiry timestamp
  - `scopes` - Granted OAuth permissions
  - `service_account_email` - Google account used for the connection
Usage:
- Voice enrollment sessions (temporary)
- Session tokens
- Rate limiting
- Caching
Collections (Per-User):
- `conversations_{user_id}` - 768D embeddings of conversations for semantic search
Why Separate Collections?
- Data isolation per user
- Privacy and security
- Optimized search performance
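The per-user isolation described above can be sketched in plain Python (this is illustrative only, not the actual qdrant-client code): each user gets a collection named `conversations_{user_id}`, and a semantic search only ever scans that user's vectors.

```python
import math

# Illustrative sketch of per-user vector collections: names follow the
# conversations_{user_id} convention, and search is cosine similarity.
collections = {}

def collection_name(user_id):
    return f"conversations_{user_id}"

def upsert(user_id, text, vector):
    collections.setdefault(collection_name(user_id), []).append((text, vector))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(user_id, query_vector, top_k=5):
    # Only this user's collection is scanned -> data isolation by design.
    points = collections.get(collection_name(user_id), [])
    ranked = sorted(points, key=lambda p: cosine(p[1], query_vector), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Because another user's vectors live in a different collection, they can never appear in the ranked results, which is the privacy property the separate-collection design buys.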
1. Desktop β Google OAuth Login
2. Backend validates with Google
3. Issues JWT token (10080-minute expiry, i.e. 7 days)
4. Token stored in localStorage
5. All API calls include: Authorization: Bearer {token}
6. Backend verifies JWT on every request
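The issue/verify cycle above can be sketched with a hand-rolled HS256 JWT (the real backend presumably uses a JWT library; `SECRET_KEY` and the 10080-minute expiry come from this README, everything else is illustrative):

```python
import base64, hashlib, hmac, json, time

SECRET_KEY = b"your-secret-key"  # placeholder, mirrors the .env example

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, expires_minutes: int = 10080) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + expires_minutes * 60}
    ).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64(hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str):
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: token was tampered with
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] < time.time():
        return None  # past the expiry in step 3
    return claims
```

Step 6 corresponds to calling `verify_token` on the `Authorization: Bearer` value of every incoming request.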
1. Enrollment (3 phrases)
- Extract voice embeddings (SpeechBrain)
- Create centroid + variance
- Encrypt and store in PostgreSQL
2. Verification
- User says "Hey SARVIK"
- Extract embedding
- Compare with stored profile (cosine similarity)
- Threshold: 0.70 (configurable)
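The enrollment and verification steps above amount to a centroid plus a cosine-similarity threshold. A minimal sketch (real SpeechBrain embeddings are 512-D; the short vectors here are for illustration):

```python
import math

THRESHOLD = 0.70  # configurable, per the docs

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def enroll(phrase_embeddings):
    # Centroid of the enrollment phrases (three, per the flow above).
    dims = len(phrase_embeddings[0])
    return [sum(e[i] for e in phrase_embeddings) / len(phrase_embeddings)
            for i in range(dims)]

def verify(centroid, new_embedding, threshold=THRESHOLD):
    # Accept the speaker only if similarity clears the threshold.
    return cosine_similarity(centroid, new_embedding) >= threshold
```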
Authentication & User:
- `POST /api/auth/google` - Google authentication
- `DELETE /api/auth/account` - Delete user account
Voice Processing:
- `POST /api/voice/process` - Voice transcription
- `POST /api/voice/enrollment/start` - Start voice enrollment
- `POST /api/voice/enrollment/record-phrase` - Record enrollment phrase
- `POST /api/voice/enrollment/complete` - Complete enrollment
- `POST /api/voice/verify` - Verify voice
- `PUT /api/voice-settings/preference` - Update voice preference
Query Processing (with Function Calling):
- `POST /api/query/process-llm-stream` - Text mode query (supports function calling)
- `POST /api/query/voice-stream-with-audio` - Voice mode query (supports function calling)
Conversations:
- `GET /api/conversations/history` - Get conversation history
- `DELETE /api/conversations/{id}` - Delete conversation
LLM Provider Settings:
- `GET /api/llm-settings` - Get current LLM provider settings
- `PUT /api/llm-settings/provider` - Switch LLM provider (server_hosted/groq_api)
- `POST /api/llm-settings/groq-key` - Save encrypted Groq API key
- `PUT /api/llm-settings/groq-model` - Update Groq model selection
- `POST /api/llm-settings/test-connection` - Test Groq API connection
- `DELETE /api/llm-settings/groq-key` - Delete Groq API key
Service OAuth (Gmail, Drive, Calendar):
- `POST /oauth/connect/{service}` - Initiate OAuth flow for service
- `GET /oauth/callback` - OAuth callback handler
- `POST /oauth/disconnect/{service}` - Disconnect service
- `GET /oauth/status` - Get all service connection statuses
Server-Hosted (localhost:8001):
- `GET /health` - Check model status
- `POST /generate` - Non-streaming generation (not currently used)
- `POST /generate-stream` - Streaming generation (SSE) ← USED
Groq API (groq.com):
- Via Groq SDK: `client.chat.completions.create(stream=True)` ← USED
- Automatic format conversion: Qwen3 → OpenAI messages
- `GET /health` - Check TTS status
- `GET /voices` - List available voices
- `POST /synthesize-sentence` - Synthesize sentence with voice and sequence
- `POST /flush` - Flush remaining buffer
- `ws://localhost:8002/ws/audio-stream` - Audio chunk streaming
SARVIK now has real-world action capabilities through an integrated function calling system. The LLM can automatically detect when it needs external data or actions, execute tools, and use the results to provide informed responses.
Weather Tools:
- `get_weather` - Get current weather for any location (with automatic IP-based geolocation)
Gmail Tools:
- `gmail_read_emails` - Read recent emails
- `gmail_search_emails` - Search emails by query
- `gmail_send_email` - Send emails
Google Drive Tools:
- `drive_list_files` - List files in Drive
- `drive_search_files` - Search Drive by name
- `drive_create_folder` - Create folders
Google Calendar Tools:
- `calendar_list_events` - List upcoming events
- `calendar_search_events` - Search calendar events
- `calendar_create_event` - Create new events
1. USER QUERY: "What's the weather in Gandhinagar?"
   ↓
2. BACKEND: Sends query to LLM with tool schemas
   ↓
3. LLM: Detects need for weather data
   → Outputs: <TOOL_CALL>call-123|get_weather|{"city": "cityname"}</TOOL_CALL>
   ↓
4. FUNCTION CALLING ORCHESTRATOR:
   - Detects tool call marker in stream
   - Parses: tool_name="get_weather", args={"city": "cityname"}
   - Executes tool via registry
   ↓
5. WEATHER TOOL:
   - Calls OpenWeatherMap API
   - Returns: {"temperature": 28, "condition": "Clear", ...}
   ↓
6. ORCHESTRATOR:
   - Injects result into context as "[TOOL RESULT]"
   - Sends updated prompt back to LLM
   ↓
7. LLM: Generates natural response
   → "The weather in Gandhinagar is clear with a temperature of 28°C."
   ↓
8. USER receives informed response
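Step 4 of this flow can be sketched as a small parser plus dispatcher. The `call-id|tool_name|json-args` layout inside `<TOOL_CALL>` is taken from the example above; the `registry` dict stands in for the real tool registry:

```python
import json
import re

# Matches the <TOOL_CALL>id|name|{...json args...}</TOOL_CALL> marker that
# the LLM emits into the token stream.
TOOL_CALL_RE = re.compile(r"<TOOL_CALL>(.*?)\|(.*?)\|(.*?)</TOOL_CALL>", re.DOTALL)

def parse_tool_calls(stream_text):
    calls = []
    for call_id, name, raw_args in TOOL_CALL_RE.findall(stream_text):
        calls.append({"id": call_id, "name": name, "args": json.loads(raw_args)})
    return calls

def execute(calls, registry):
    results = []
    for call in calls:
        tool = registry.get(call["name"])
        if tool is None:
            # Graceful fallback when the LLM names an unregistered tool.
            results.append({"id": call["id"], "error": "unknown tool"})
        else:
            results.append({"id": call["id"], "result": tool(**call["args"])})
    return results
```

In step 6, each result would be serialized back into the prompt as a `[TOOL RESULT]` block before re-invoking the LLM.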
- Automatic Detection: LLM decides when tools are needed
- Parallel Execution: Multiple tool calls in complex queries
- Error Handling: Graceful fallback if tools fail
- Context Injection: Tool results seamlessly integrated
- Security: OAuth-based authentication for Google services
- IP Geolocation: Auto-detect user location for weather
Separate OAuth Flow:
- Users connect Gmail/Drive/Calendar separately from SARVIK login
- Can use different Google account than SARVIK login
- Encrypted token storage with Fernet
- Automatic token refresh before expiry
Connection Workflow:
1. User opens Settings β Service Connections
2. Clicks "Connect Gmail"
3. Backend generates OAuth URL
4. Opens Google authorization in browser
5. User grants permissions
6. OAuth callback stores encrypted tokens
7. Tools can now access Gmail data
Supported Scopes:
- Gmail: Read, search, and send emails
- Drive: List, search, and create files/folders
- Calendar: Read, search, and create events
Desktop (User speaks "What's the weather?")
  │
  ├─ Records audio (globalAudioManager)
  │
  └─ POST /api/voice/process (FormData: audio.webm)
        ↓
Backend (myai-backend)
  │
  ├─ Whisper transcribes → "What's the weather?"
  ├─ Stores in PostgreSQL (conversations table)
  ├─ Generates 768D embedding (Sentence Transformers)
  ├─ Searches Qdrant for semantic matches
  ├─ Retrieves recent conversations (PostgreSQL)
  │
  └─ POST to LLM Server /generate-stream
        Payload: {
          "prompt": "<|im_start|>system\n{VOICE_SYSTEM_PROMPT}<|im_end|>...",
          "max_tokens": 512,
          "temperature": 0.7
        }
        ↓
LLM Server (llm-server)
  │
  ├─ Qwen3-4B processes prompt
  ├─ Generates tokens: ["I", " don't", " have", " real-time", ...]
  │
  └─ Streams via SSE: data: {"token": "I"}\n\n
        ↓
Backend (receives tokens, parallel processing)
  │
  ├─ Streams tokens to Desktop (SSE)
  │    └─ Desktop displays: "I don't have real-time..."
  │
  ├─ Buffers into sentences
  │    "I don't have real-time internet access."
  │
  └─ POST to TTS Server /synthesize-sentence
        Payload: {
          "text": "I don't have real-time internet access.",
          "voice": "lessac",
          "sequence": 1
        }
        ↓
TTS Server (tts-server)
  │
  ├─ Piper TTS synthesizes sentence
  ├─ Generates WAV audio bytes
  │
  └─ Broadcasts via WebSocket to Desktop
        Message: {"audio": "<base64>", "sequence": 1}
        ↓
Desktop (receives audio)
  │
  ├─ Decodes base64 → WAV
  ├─ Queues for sequential playback
  └─ Plays audio through speakers
- ASR: OpenAI Whisper (base model)
- Voice Biometrics: SpeechBrain (ECAPA-TDNN)
- Audio Quality: WebRTC VAD, noise reduction
- Encryption: Fernet encryption for voice embeddings
- Embeddings:
- 768D (all-mpnet-base-v2) for conversations
- 384D (all-MiniLM-L6-v2) for reminders
- 512D (SpeechBrain) for voice profiles
- Storage:
- PostgreSQL for structured data
- Qdrant for vector similarity search
- Context Building:
- Recent conversations (last 10 messages)
- Semantic search (top 5 matches)
- Token counting (max 4000 tokens)
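The context-building rules above (last 10 recent messages, top-5 semantic matches, 4000-token cap) can be sketched as a simple budgeted merge. Whitespace splitting here is an illustrative stand-in for the real tokenizer:

```python
MAX_CONTEXT_TOKENS = 4000

def approx_tokens(text):
    # Rough stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())

def build_context(recent, semantic, max_tokens=MAX_CONTEXT_TOKENS):
    # Recent conversation first (last 10 messages), then top-5 semantic hits.
    candidates = recent[-10:] + semantic[:5]
    picked, used = [], 0
    for message in candidates:
        cost = approx_tokens(message)
        if used + cost > max_tokens:
            break  # stay inside the token budget
        picked.append(message)
        used += cost
    return "\n".join(picked)
```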
- Dual Providers:
- Server-Hosted: Qwen3-4B (quantized GGUF) on local GPU
- Groq API: Llama 3.3 70B, GPT OSS 120B/20B, Qwen3 32B (cloud)
- Runtime Switching: Change providers without restart via user settings
- Smart Routing: Backend factory pattern selects provider based on user.llm_provider
- Format Conversion: Automatic Qwen3 → OpenAI format for Groq compatibility
- Performance:
- Server-Hosted: 35 tok/sec (GPU-dependent)
- Groq API: 250 tok/sec (7x faster)
- Inference: llama-cpp-python (local) + Groq SDK (cloud)
- Modes:
- Voice Mode: Brief, conversational responses
- Text Mode: Detailed, formatted responses
- Security: Fernet (AES-128) encryption for Groq API keys
- Concurrency: Semaphore-based request limiting
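The smart-routing factory described above can be sketched as follows. Class names are modeled on the `llm_providers/` files in the repository layout, but the bodies are illustrative stubs, not the real implementations:

```python
class BaseLLMProvider:
    def generate_stream(self, prompt):
        raise NotImplementedError

class ServerLLMProvider(BaseLLMProvider):
    def generate_stream(self, prompt):
        # The real provider would POST to localhost:8001/generate-stream (SSE).
        yield from ["local", " reply"]

class GroqLLMProvider(BaseLLMProvider):
    def __init__(self, api_key, model):
        self.api_key, self.model = api_key, model

    def generate_stream(self, prompt):
        # The real provider would call the Groq SDK with stream=True.
        yield from ["cloud", " reply"]

def get_provider(user):
    # Runtime switching: a new provider object is chosen per request based
    # on user.llm_provider, so no server restart is needed.
    if user.get("llm_provider") == "groq_api":
        return GroqLLMProvider(user["groq_api_key"], user["groq_model"])
    return ServerLLMProvider()  # default: server_hosted
```

Because the choice happens per request, flipping the `llm_provider` column is all it takes for the next query to use the other backend.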
- Engine: Piper TTS (ONNX-based)
- Voices: 4 high-quality voices (lessac, ryan, kimberly, amy)
- Optimization: Parallel sentence synthesis with sequencing
- Streaming: WebSocket-based real-time audio delivery
sarvik-working/
├── myai-backend/                # Main FastAPI backend
│   ├── app/
│   │   ├── api/                 # API endpoints
│   │   │   ├── auth.py
│   │   │   ├── voice.py
│   │   │   ├── query.py
│   │   │   └── query_voice_audio.py
│   │   ├── core/                # Core configuration
│   │   │   ├── config.py
│   │   │   ├── database.py
│   │   │   └── security.py
│   │   ├── models/              # SQLAlchemy models
│   │   ├── services/            # Business logic
│   │   │   ├── llm_service.py
│   │   │   ├── llm_providers/   # LLM provider architecture
│   │   │   │   ├── base_provider.py
│   │   │   │   ├── server_provider.py
│   │   │   │   └── groq_provider.py
│   │   │   ├── tts_client.py
│   │   │   ├── context_manager.py
│   │   │   ├── embedding_service.py
│   │   │   └── voice_service.py
│   │   └── main.py              # FastAPI app
│   └── requirements.txt
│
├── myai-desktop/                # Electron + React app
│   ├── public/
│   │   ├── electron.js          # Electron main process
│   │   └── preload.js           # Preload script
│   ├── src/
│   │   ├── components/          # React components
│   │   ├── services/            # API & audio services
│   │   │   ├── apiService.js
│   │   │   ├── globalAudioManager.js
│   │   │   └── deviceManager.js
│   │   ├── context/             # React context
│   │   └── App.jsx              # Main app
│   └── package.json
│
├── llm-server/                  # LLM microservice
│   ├── app/
│   │   ├── main.py              # FastAPI server
│   │   ├── llm_service.py       # Qwen model wrapper
│   │   └── config.py            # LLM configuration
│   └── requirements.txt
│
├── tts-server/                  # TTS microservice
│   ├── app/
│   │   ├── main.py              # FastAPI + WebSocket
│   │   ├── tts_service.py       # Piper TTS wrapper
│   │   └── config.py            # TTS configuration
│   └── requirements.txt
│
├── documentation/               # Project documentation
├── docs/                        # NEW comprehensive docs
├── README.md                    # This file
├── MYAI_BACKEND.md
├── MYAI_DESKTOP.md
├── LLM_SERVER.md
└── TTS_SERVER.md
Step 1: Database Services (Manual/Docker)
- PostgreSQL on default port 5432
- Redis on default port 6379
- Qdrant on default port 6333
Step 2: Backend Services (Order matters!)
- Terminal 1 - LLM Server: navigate to the llm-server folder and run `python run.py` (Port 8001)
- Terminal 2 - TTS Server: navigate to the tts-server folder and run `docker-compose up -d` (Port 8002)
- Terminal 3 - Main Backend: navigate to the myai-backend folder and run `python run.py` (Port 8000)
Step 3: Desktop Application
- Navigate to myai-desktop folder
- Development mode: run `npm start` (Port 3000)
- Full Electron app: run `npm run electron-dev`
DATABASE_URL=postgresql://user:pass@localhost:5432/sarvik
REDIS_URL=redis://localhost:6379/0
QDRANT_URL=http://localhost:6333
SECRET_KEY=your-secret-key
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
LLM_SERVER_URL=http://localhost:8001
ENCRYPTION_KEY=dDkzTmY2YjJRZTF2VU1rZ0hSSnpYMFlhaUN0TGQ3cG8=  # For Groq API key encryption

MODEL_PATH=models/qwen3-4b-instruct-q4_0.gguf
GPU_LAYERS=35
MAX_CONCURRENT_REQUESTS=3
CONTEXT_SIZE=8192
BATCH_SIZE=512

TTS_MODEL_PATH=models/en_US-lessac-medium.onnx
TTS_PORT=8002
LOG_LEVEL=INFO
MAX_SENTENCE_LENGTH=500

- Parallel Operations: Context building + query storage run concurrently
- Embedding Reuse: Single embedding generation for storage + search
- Async Model Loading: Background model loading with 10-min idle timeout
- Connection Pooling: SQLAlchemy connection pool for PostgreSQL
- GPU Acceleration: CUDA-enabled llama-cpp-python
- Model Quantization: Q4_0 quantization (4-bit) for faster inference
- Concurrent Handling: Semaphore limiting to 3 concurrent requests
- Streaming: Token-by-token delivery for perceived speed
- Parallel Synthesis: All sentences synthesized in parallel
- Sequence Numbers: Maintain playback order despite parallel processing
- WebSocket Streaming: Real-time audio delivery
- Sentence Buffering: Smart segmentation on punctuation
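The sentence-buffering and sequencing ideas above can be sketched together: tokens stream in, completed sentences are flushed with a sequence number so parallel TTS synthesis can still be played back in order. This is a simplified illustration (it splits only on `.`, `!`, `?` and ignores abbreviations and decimals):

```python
import re

class SentenceBuffer:
    def __init__(self):
        self.buffer = ""
        self.sequence = 0

    def feed(self, token):
        """Append a token; return any completed (sequence, sentence) pairs."""
        self.buffer += token
        out = []
        while True:
            # Pad with a space so a sentence ending at the buffer end matches.
            match = re.search(r"[.!?]\s", self.buffer + " ")
            if not match:
                break
            self.sequence += 1
            out.append((self.sequence, self.buffer[:match.end() - 1].strip()))
            self.buffer = self.buffer[match.end() - 1:].lstrip()
        return out

    def flush(self):
        """Emit whatever remains (mirrors the TTS server's /flush endpoint)."""
        if not self.buffer.strip():
            return None
        self.sequence += 1
        leftover, self.buffer = self.buffer.strip(), ""
        return (self.sequence, leftover)
```

Each emitted pair maps onto a `/synthesize-sentence` request with `text` and `sequence`, and the desktop reorders incoming audio chunks by that sequence number.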
SARVIK now supports dual LLM providers with seamless runtime switching between local GPU inference and cloud-based API calls.
| Provider | Model | Speed | Context | Privacy |
|---|---|---|---|---|
| Server-Hosted | Qwen3-4B (local GPU) | 35 tok/sec | 256K tokens (depends on hosted resources) | 100% Local |
| Groq API | Llama 3.3 70B (cloud) | 250 tok/sec | 128K tokens | Cloud-based |
User Journey:
- User opens Settings β LLM Provider
- Default: Server-Hosted (uses local GPU)
- To switch to Groq:
- Enter Groq API key (from console.groq.com)
- Select model (Llama 3.3 70B, GPT OSS 120B, etc.)
- Click "Save" β Backend encrypts key
- Click "Groq API" card β Provider switched
- Next query automatically uses selected provider (no restart!)
Backend Implementation:
Query arrives → Backend loads user from database
      ↓
Check user.llm_provider column:
      │
      ├─ "server_hosted" → ServerLLMProvider
      │        ↓
      │     POST localhost:8001/generate-stream
      │        ↓
      │     Local Qwen3-4B (GPU)
      │
      └─ "groq_api" → GroqLLMProvider
               ↓
            Decrypt API key from database
               ↓
            Convert Qwen3 format → OpenAI format
               ↓
            Groq SDK: client.chat.completions.create()
               ↓
            Cloud Llama 3.3 70B
- `llama-3.3-70b-versatile` (Recommended - 70B params)
- `openai/gpt-oss-120b` (Largest - 120B params)
- `openai/gpt-oss-20b` (Fast - 20B params)
- `moonshotai/kimi-k2-instruct-0905` (Multilingual)
- `qwen/qwen3-32b` (Balanced - 32B params)
- API Key Encryption: Fernet (AES-128-CBC)
- Auto-Expiry: Keys expire after 30 days
- Secure Storage: Encrypted in PostgreSQL
- No Plaintext: Keys only decrypted at request time
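The key-lifecycle rules above can be sketched with plain `datetime` arithmetic. Encryption itself (Fernet) is elided here; the field names mirror the `groq_api_key_encrypted` and `groq_api_key_expires_at` columns, and the dict stands in for a user row:

```python
from datetime import datetime, timedelta, timezone

KEY_LIFETIME_DAYS = 30  # per the 30-day auto-expiry above

def save_groq_key(user, encrypted_key, now=None):
    now = now or datetime.now(timezone.utc)
    user["groq_api_key_encrypted"] = encrypted_key
    user["groq_api_key_expires_at"] = now + timedelta(days=KEY_LIFETIME_DAYS)

def get_groq_key(user, now=None):
    """Return the stored key only while it is still inside its lifetime."""
    now = now or datetime.now(timezone.utc)
    expires_at = user.get("groq_api_key_expires_at")
    if expires_at is None or now >= expires_at:
        return None  # never saved, or past the 30-day window
    return user["groq_api_key_encrypted"]
```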
-- users table (new columns)
llm_provider VARCHAR(50)           -- 'server_hosted' or 'groq_api'
groq_api_key_encrypted TEXT        -- Fernet-encrypted API key
groq_api_key_expires_at TIMESTAMP  -- 30-day auto-expiry
groq_model VARCHAR(100)            -- Selected Groq model

Qwen3 Format (backend sends):
<|im_start|>system
You are SARVIK<|im_end|>
<|im_start|>user
Hello<|im_end|>
OpenAI Format (Groq expects):
[
{"role": "system", "content": "You are SARVIK"},
{"role": "user", "content": "Hello"}
]

Conversion happens automatically in GroqLLMProvider._parse_qwen_prompt_to_messages().
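A hedged sketch of what that conversion could look like (the actual `_parse_qwen_prompt_to_messages` implementation may differ): split the Qwen3 chat-template prompt on its `<|im_start|>`/`<|im_end|>` delimiters and emit OpenAI-style message dicts.

```python
import re

# Each block looks like: <|im_start|>role\ncontent<|im_end|>
BLOCK_RE = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def parse_qwen_prompt_to_messages(prompt):
    """Convert a Qwen3-formatted prompt into an OpenAI messages list."""
    return [{"role": role, "content": content.strip()}
            for role, content in BLOCK_RE.findall(prompt)]
```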
| Metric | Server-Hosted | Groq API |
|---|---|---|
| Speed | 35 tok/sec | 250 tok/sec (7x faster) |
| First Token | ~200ms | ~180ms |
| Context | 256K tokens | 128K tokens |
| Max Output | 16,385 tokens (max) | 8,192 tokens |
| Cost | Free (GPU) | Free (14,400 req/day) |
| Privacy | 100% Local | Cloud-based |
For detailed implementation: See 3_LLM_SERVER.md - LLM Provider Integration
1. 1_MYAI_BACKEND.md - Backend Server Documentation
   - Architecture layers and components
   - All API endpoints with request/response formats
   - Services detailed explanation
   - Database architecture
   - Security and authentication
   - File locations: `myai-backend/app/`
2. 2_MYAI_DESKTOP.md - Desktop Application Documentation
   - Electron + React architecture
   - Component structure and responsibilities
   - Services (API, Audio Manager, Device Manager)
   - State management (Auth and App contexts)
   - Audio system pipeline
   - File locations: `myai-desktop/src/`
3. 3_LLM_SERVER.md - LLM Inference Server Documentation
   - Qwen3-4B model configuration
   - GPU offloading strategies
   - API endpoints for generation
   - Prompt engineering (Voice vs Text modes)
   - Streaming implementation
   - Performance optimization
   - File locations: `llm-server/app/`
4. 4_TTS_SERVER.md - TTS Synthesis Server Documentation
   - Piper TTS voice models (4 voices)
   - API endpoints for synthesis
   - WebSocket protocol for audio streaming
   - Parallel synthesis architecture
   - Audio format specifications
   - File locations: `tts-server/app/`
Last Updated: November 2025