TTS Playground is a full-stack text-to-speech app built using a microservices architecture. It lets users enter text, choose from 25 voices, and get back an audio file, all handled asynchronously in the background so the app stays fast and responsive.
Behind the scenes, the system is split into multiple services, each with a clear role. A Fastify-based API handles incoming requests, while a Python (FastAPI) service takes care of generating the audio. Data is stored in PostgreSQL, Redis is used for caching and rate limiting, and RabbitMQ manages background jobs.
The goal of this project was to experiment with real-world backend patterns like service separation, async processing, and scalable architecture, while also building something practical and usable.
📸 Screenshots / screen recording coming soon
```
root/
│
├── apps/
│   └── web/                  # Frontend app (Next.js - UI, client-side logic)
│
├── packages/
│   └── db/                   # Database layer (Drizzle ORM schemas, migrations)
│
├── services/
│   ├── auth/                 # Authentication service (JWT, sessions, user auth)
│   │   ├── src/
│   │   │   ├── modules/      # Feature-based structure (auth logic, user module, etc.)
│   │   │   ├── plugins/      # Fastify plugins (JWT, cookies, hooks)
│   │   │   ├── utils/        # Service-specific helpers
│   │   │   └── server.ts     # Entry point (Fastify app setup)
│   │   ├── package.json      # Service dependencies & scripts
│   │   └── tsconfig.json     # TypeScript config (extends root config)
│   │
│   ├── gateway/              # API Gateway (entry point, routing, aggregation)
│   │   └── ...
│   ├── tts/                  # Text-to-Speech engine (Python, audio generation)
│   │   └── ...
│   ├── voice/                # Voice management (voices, configs, metadata)
│   │   └── ...
│   └── worker/               # Background worker (async jobs, queues, TTS processing)
│       └── ...
│
├── .env                      # Environment variables (secrets, config)
├── pnpm-workspace.yaml       # Defines monorepo structure (apps, packages, services)
├── pnpm-lock.yaml            # Lockfile (ensures consistent dependency versions)
├── tsconfig.json             # Base TypeScript config (shared across all projects)
├── docker-compose.yaml       # Runs all services together (dev / local orchestration)
└── package.json              # Root config (scripts, workspace settings)
```

| Tool | Description | Access |
|---|---|---|
| PostgreSQL | Primary relational database | — |
| MinIO | S3-compatible object storage (audio files) | http://localhost:9000 |
| Drizzle ORM | Type-safe ORM & migrations | https://local.drizzle.studio (pnpm db:studio) |
| Tool | Description | Access |
|---|---|---|
| Redis | In-memory caching (sessions, rate limiting) | — |
| RabbitMQ | Async job queues & message broker | http://localhost:15672 |
| Tool | Description | Access |
|---|---|---|
| Traefik | API Gateway, reverse proxy, load balancer | http://localhost:8080 |
| Docker | Containerization & orchestration | — |
| Tool | Description | Access |
|---|---|---|
| Fastify | Backend framework (auth, gateway, voice, worker) | — |
| FastAPI | Python framework (TTS service) | — |
| Tool | Description | Access |
|---|---|---|
| Next.js | Web app (UI & client-side logic) | http://localhost:3000 |
For more details, see the API reference doc.
Runs on http://localhost:3001. Handles user registration, login, session management, and API key issuance.
Supports two authentication methods:
- JWT Auth: short-lived tokens (50-minute expiry) issued on login. Suitable for interactive, session-based usage.
- API Key Auth: long-lived keys for programmatic or machine-to-machine access. Keys are generated on demand; the full key is returned once, and only its hash is persisted in the database.
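The store-only-a-hash pattern described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the `ttsp_` key prefix, SHA-256, and the helper names are assumptions.

```typescript
import { randomBytes, createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: generate a key, return it once, persist only the hash.
function generateApiKey(): { key: string; hash: string } {
  const key = `ttsp_${randomBytes(24).toString("hex")}`; // shown to the user exactly once
  const hash = createHash("sha256").update(key).digest("hex"); // stored in PostgreSQL
  return { key, hash };
}

// Hypothetical helper: verify a presented key against the stored hash.
// timingSafeEqual avoids leaking information through comparison timing.
function verifyApiKey(presented: string, storedHash: string): boolean {
  const presentedHash = createHash("sha256").update(presented).digest("hex");
  return timingSafeEqual(Buffer.from(presentedHash), Buffer.from(storedHash));
}
```

If the database is ever leaked, only hashes are exposed, so the keys themselves remain unusable.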
Endpoints overview:
| Method | Route | Description |
|---|---|---|
| POST | /auth/register | Register a new user account |
| POST | /auth/login | Authenticate and receive a JWT token |
| POST | /auth/logout | Invalidate the current session |
| POST | /auth/refresh | Refresh an expired JWT token |
| GET | /auth/me | Retrieve the currently authenticated user's profile |
| POST | /auth/api-keys | Generate a new long-lived API key |
| GET | /auth/api-keys | List all API keys for the current user |
| DELETE | /auth/api-keys/:id | Revoke an existing API key |
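A rough client-side sketch of the login flow against the table above. The endpoint path comes from the table; the response field name (`token`) and credentials payload shape are assumptions. The fetch function is injected so the helper stays testable.

```typescript
// Minimal fetch-like signature so the helper works with the global fetch or a mock.
type FetchLike = (
  url: string,
  init?: object
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Hypothetical login helper: POST credentials, return the short-lived JWT.
async function login(
  baseUrl: string,
  email: string,
  password: string,
  fetchFn: FetchLike
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }), // payload shape is an assumption
  });
  if (!res.ok) throw new Error(`login failed with status ${res.status}`);
  const body = await res.json();
  return body.token; // attach as "Authorization: Bearer <token>" on later requests
}
```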
Runs on http://localhost:3002. Manages the catalog of available TTS voices, their configurations, and associated metadata. Provides endpoints to browse, filter, and retrieve voice details used when submitting TTS jobs.
Endpoints overview:
| Method | Route | Description |
|---|---|---|
| GET | /voices | List all available voices (with optional filters: language, gender, engine) |
| GET | /voices/:id | Retrieve details and configuration for a specific voice |
| POST | /voices | Register a new custom voice (admin) |
| PUT | /voices/:id | Update voice metadata or configuration (admin) |
| DELETE | /voices/:id | Remove a voice from the catalog (admin) |
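The list endpoint accepts the optional filters shown above. A small hypothetical client-side helper for building the query string (the filter names match the table; the helper itself is illustrative):

```typescript
// Build the /voices path with only the filters that are actually set.
function buildVoicesQuery(filters: {
  language?: string;
  gender?: string;
  engine?: string;
}): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    if (value) params.set(key, value); // skip undefined/empty filters
  }
  const qs = params.toString();
  return qs ? `/voices?${qs}` : "/voices";
}
```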
Runs on http://localhost:3003. Manages the lifecycle of TTS generation jobs, from submission to completion. Because audio generation can be time-intensive, all processing is handled asynchronously via a RabbitMQ queue.
Job lifecycle:
- A user submits text via the API: a job record is created in PostgreSQL with status `pending` and pushed onto the RabbitMQ queue.
- The worker consumer picks up the job and forwards it to the TTS engine service for audio generation.
- The generated audio file is uploaded to MinIO (S3-compatible storage).
- The job record is updated in the database with status `completed` and a reference to the stored audio file.
- The client can poll the job status endpoint or subscribe to notifications to retrieve the result.
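The polling side of this lifecycle can be sketched as a small client helper. This is an illustrative assumption about client behavior (the `Job` shape, status names beyond `pending`/`completed`, and the helper itself are not taken from the service's code):

```typescript
type JobStatus = "pending" | "processing" | "completed" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  audioUrl?: string; // set once the job completes (assumed field name)
}

// Poll a job-fetching function until the job reaches a terminal state.
async function pollJob(
  fetchJob: (id: string) => Promise<Job>,
  id: string,
  intervalMs = 500,
  maxAttempts = 20
): Promise<Job> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(id);
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // back off between polls
  }
  throw new Error(`Job ${id} did not finish after ${maxAttempts} attempts`);
}
```

In practice the `fetchJob` argument would wrap a GET to the job status endpoint; injecting it keeps the helper easy to test.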
Endpoints overview:
| Method | Route | Description |
|---|---|---|
| POST | /jobs | Submit a new TTS job (text, voice ID, output format) |
| GET | /jobs | List all jobs for the authenticated user |
| GET | /jobs/:id | Get the status and result of a specific job |
| DELETE | /jobs/:id | Cancel a pending job or delete a completed one |
| GET | /jobs/:id/audio | Download or stream the generated audio file |
Runs on http://localhost:8000 (Python / FastAPI). The core audio generation engine, responsible for converting text to speech using the configured TTS model. This service is consumed internally by the worker and is not exposed directly to end users.
- Type: Text-to-speech
- Model: Kokoro-82M, a lightweight open-source TTS model with only 82M parameters
- Hardware: CPU-compatible (no GPU required)
- Voices: 25 built-in voices across American and British English
- Backend: PyTorch
Endpoints overview:
| Method | Route | Description |
|---|---|---|
| POST | /tts/generate | Generate audio from text using a specified voice and engine |
| GET | /tts/voices | List voices supported by the active TTS engine |
| GET | /tts/health | Health check and model readiness status |
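The worker calls POST /tts/generate internally. As a rough sketch of the request the worker might build (the payload field names are assumptions, not the service's actual schema), input validation before dispatch could look like this:

```typescript
type AudioFormat = "wav" | "mp3";

// Hypothetical request shape for the internal /tts/generate call.
interface GenerateRequest {
  text: string;
  voice: string; // e.g. one of the 25 built-in Kokoro voices
  format: AudioFormat;
}

// Validate and normalize inputs before handing the job to the TTS engine.
function buildGenerateRequest(
  text: string,
  voice: string,
  format: AudioFormat = "wav"
): GenerateRequest {
  const trimmed = text.trim();
  if (!trimmed) throw new Error("text must be non-empty");
  if (!voice) throw new Error("voice is required");
  return { text: trimmed, voice, format };
}
```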
- Will be used for voice cloning: generating speech that mimics a target voice from an audio sample.
This project uses pnpm as the package manager. Make sure Node.js and npm are installed, then install pnpm globally:
```
npm install -g pnpm
```

For more info about scripts, see package.json.
Build the TTS service image first:
```
pnpm tts:build   # docker compose up --build -d tts
```

Start all background services (PostgreSQL, Redis, RabbitMQ, MinIO, Traefik, TTS):

```
pnpm infra       # Start all background services
pnpm infra:down  # Stop all services
pnpm infra:ps    # Check service status
```

- Auth service (`port 3001`)

```
pnpm dev:auth
```

- Voice service (`port 3002`)

```
pnpm dev:voice
```

- Worker service (`port 3003`)

```
pnpm dev:worker
```

- TTS engine service (`port 8000`)

After running pnpm infra, build and start the TTS Docker container:

```
docker compose build tts
docker compose up tts
```

- Frontend (`port 3000`)

```
pnpm dev:web
```

This project is licensed under the MIT License.


