# Local AI Transcript App

An AI-powered voice transcription application with a React frontend and FastAPI backend. Records audio in the browser, transcribes with Whisper, optionally cleans text with an LLM, and lets you chat about the latest transcript.

## Features
- Browser recording and file upload
- Local Whisper speech-to-text
- Optional LLM cleaning (OpenAI API-compatible providers)
- RAG-powered chat with semantic search (sqlite-vec + Ollama embeddings)
- Streaming chat with transcript context
- Export to Markdown, TXT, or PDF
- Persistent transcript history (SQLite)
- Keyboard shortcuts (press `?` to see all)
- Dark/light theme
## Quick Start

The easiest way to run the app is with Docker Compose:
```bash
# Clone the repository
git clone <repo-url>
cd local-ai-transcript-app

# Copy environment template and configure
cp backend/.env.example backend/.env
# Edit backend/.env with your LLM configuration

# Start services (frontend + backend)
docker compose up -d

# Wait for services to start (Whisper model downloads on first run)
# This can take a few minutes depending on your internet speed

# Open the app
open http://localhost:3000
```

| Service | Port | Description |
|---|---|---|
| Frontend | 3000 | React app served via Nginx |
| Backend | 8000 | FastAPI with Whisper |
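
Once the containers are up, you can sanity-check the backend with its health endpoint (documented in the API table further down):

```bash
# Should return service health info once the backend is ready
curl http://localhost:8000/api/status
```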
Note: Ollama is not included by default. To use a local LLM, either run Ollama separately (`ollama serve`) or uncomment the Ollama service in `docker-compose.yml`.
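
If you go the separate-Ollama route, a minimal sequence looks like this (assumes Ollama is already installed on the host):

```bash
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# In another terminal, pull the default model used by the app
ollama pull llama2
```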
## Configuration

Create a `.env` file in the `backend/` directory to customize:
```bash
# LLM Model (default: llama2)
LLM_MODEL=llama2

# Whisper Model (default: base.en)
# Options: tiny, tiny.en, base, base.en, small, small.en, medium, large-v3
WHISPER_MODEL=base.en

# Embeddings for RAG (optional, enables semantic search)
EMBEDDING_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text

# Optional: Fallback to OpenAI if Ollama fails
LLM_FALLBACK_BASE_URL=https://api.openai.com/v1
LLM_FALLBACK_API_KEY=sk-your-key
LLM_FALLBACK_MODEL=gpt-3.5-turbo
```
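
One gotcha when the backend runs in Docker but Ollama runs on the host: `localhost` inside the container refers to the container itself, not your machine. A sketch of the override (Docker Desktop resolves `host.docker.internal` automatically; on Linux you may need `extra_hosts: ["host.docker.internal:host-gateway"]` on the backend service in `docker-compose.yml`):

```bash
# backend/.env -- reach host-side Ollama from inside the container
LLM_BASE_URL=http://host.docker.internal:11434/v1
EMBEDDING_BASE_URL=http://host.docker.internal:11434
```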
## Docker Commands

```bash
# Start services
docker compose up -d
# View logs
docker compose logs -f
# Stop services
docker compose down
# Rebuild after code changes
docker compose up -d --build
# Remove all data (reset)
docker compose down -v
```

## Local Development

### Prerequisites

- Python 3.12+
- Node.js 20+
- uv (Python package manager)
- An LLM server (Ollama or LM Studio) or an OpenAI API key
### Backend

```bash
cd backend
# Copy environment template
cp .env.example .env
# Edit .env with your LLM configuration
# For Ollama: LLM_BASE_URL=http://localhost:11434/v1
# Install dependencies
uv sync
# Start the server
uv run uvicorn app:app --reload --port 8000
```
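
With the server running, you can exercise the transcription endpoint directly. A sketch using `curl` — the multipart field name `file` is an assumption here; the interactive FastAPI docs at `http://localhost:8000/docs` (if enabled) show the exact request schema:

```bash
# Upload an audio file for transcription (field name "file" is assumed)
curl -X POST http://localhost:8000/api/transcribe \
  -F "file=@recording.wav"
```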
### Frontend

```bash
cd frontend

# Install dependencies
npm install
# Start dev server
npm run dev
# Open http://localhost:5173
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Frontend │
│ React 19 + Vite + TypeScript + Tailwind │
│ └── Nginx (production) or Vite dev server │
└───────────────────────┬─────────────────────────────────────┘
│ HTTP
┌───────────────────────┴─────────────────────────────────────┐
│ Backend │
│ FastAPI + SQLAlchemy + SQLite │
│ ├── Whisper (speech-to-text) │
│ └── OpenAI-compatible LLM client │
└───────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────┴─────────────────────────────────────┐
│ Ollama │
│ Local LLM server (llama2, mistral, etc.) │
└─────────────────────────────────────────────────────────────┘
```

## API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/api/status` | Service health check |
| GET | `/api/system-prompt` | Get default LLM cleaning prompt |
| GET | `/api/transcripts` | List all transcripts |
| POST | `/api/transcripts` | Create transcript |
| GET | `/api/transcripts/:id` | Get transcript |
| PUT | `/api/transcripts/:id` | Update transcript |
| DELETE | `/api/transcripts/:id` | Delete transcript |
| GET | `/api/transcripts/:id/messages` | Get chat messages for transcript |
| POST | `/api/transcripts/:id/messages` | Add chat message to transcript |
| GET | `/api/transcripts/:id/export?format=md\|txt\|pdf` | Export transcript |
| POST | `/api/transcribe` | Transcribe audio file |
| POST | `/api/clean` | Clean text with LLM |
| POST | `/api/generate-title` | Generate AI title |
| POST | `/api/chat` | Chat (non-streaming) |
| POST | `/api/chat/stream` | Chat (SSE streaming) |
| GET | `/api/transcripts/:id/chunks` | Get transcript chunks |
| POST | `/api/transcripts/:id/reindex` | Reindex transcript for RAG |
| GET | `/api/embeddings/status` | Check embedding service status |
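
Two example calls against these endpoints. The export query parameter comes straight from the table; the JSON body for streaming chat is an assumption — check `http://localhost:8000/docs` for the actual schema:

```bash
# Export transcript 1 as PDF
curl -o transcript.pdf "http://localhost:8000/api/transcripts/1/export?format=pdf"

# Stream a chat response about transcript 1 (-N disables curl buffering
# so SSE chunks print as they arrive; the body shape is assumed)
curl -N -X POST http://localhost:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"transcript_id": 1, "message": "Summarize this transcript"}'
```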
## Rate Limiting

The following endpoints have rate limiting to prevent abuse:
| Endpoint | Limit | Reason |
|---|---|---|
| `/api/transcribe` | 5/minute | CPU-intensive Whisper processing |
| `/api/clean` | 20/minute | LLM API call |
| `/api/generate-title` | 30/minute | LLM API call (fast) |
| `/api/chat` | 20/minute | LLM API call |
| `/api/chat/stream` | 20/minute | LLM streaming |
| `/api/transcripts/{id}/export` | 30/minute | PDF generation |
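
You can watch a limit trip by hammering one of these endpoints; after the quota is exhausted you should start seeing `429 Too Many Requests` (the exact response body depends on the limiter in use):

```bash
# Fire 6 requests at the 5/minute transcribe endpoint and print status codes
# (multipart field name "file" is assumed, as above)
for i in $(seq 1 6); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/api/transcribe -F "file=@recording.wav"
done
```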
## Keyboard Shortcuts

| Key | Action |
|---|---|
| `V` | Hold to record, release to stop |
| `Ctrl/⌘ + N` | New transcript |
| `Ctrl/⌘ + Enter` | Submit text input |
| `?` | Show all shortcuts |
| `Escape` | Close dialogs |
## LLM Providers

The app works with any OpenAI API-compatible provider:
- Ollama (default, local): `http://localhost:11434/v1`
- OpenAI: `https://api.openai.com/v1`
- LM Studio: `http://localhost:1234/v1`
- Groq: `https://api.groq.com/openai/v1`
- Together AI: `https://api.together.xyz/v1`
Configure via environment variables in `.env` or `docker-compose.yml`.
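
For example, switching the primary LLM from Ollama to OpenAI is a matter of repointing the base URL and model in `backend/.env` (the API-key variable name here is an assumption — check `backend/.env.example` for the exact name):

```bash
# backend/.env -- use OpenAI as the primary LLM
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-3.5-turbo
LLM_API_KEY=sk-your-key   # variable name assumed; see backend/.env.example
```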
## Troubleshooting

### Microphone not working

- Allow microphone access in browser settings
- Use HTTPS in production (required for `getUserMedia`)
### Transcription is slow

- Use a smaller Whisper model (`tiny.en` or `base.en`)
- Ensure GPU acceleration is available
### Ollama connection errors

- Check that Ollama is running: `curl http://localhost:11434/api/tags`
- Pull a model: `ollama pull llama2`
- Check logs: `docker compose logs ollama`
### Semantic search not working

- Pull the embedding model: `ollama pull nomic-embed-text`
- Check embedding status: `curl http://localhost:8000/api/embeddings/status`
- RAG gracefully falls back to full-transcript context if embeddings are unavailable; once they are available, existing transcripts can be reindexed (see the sketch below)
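
A sketch of reindexing a single transcript through the endpoint from the API table (transcript ID `1` is just an example):

```bash
# Rebuild the RAG index for transcript 1
curl -X POST http://localhost:8000/api/transcripts/1/reindex
```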
### Docker build fails

- Ensure Docker has enough memory (at least 4GB)
- Try rebuilding: `docker compose build --no-cache`