🎙️ TTS Playground — Text-to-Speech Microservice App

TTS Playground is a full-stack text-to-speech app built using a microservices architecture. It lets users enter text, pick from over 25 different voices, and get back an audio file — all handled asynchronously in the background so the app stays fast and responsive.

Behind the scenes, the system is split into services, each with a clear role. Fastify-based services (auth, gateway, voice, worker) handle incoming requests and job orchestration, while a Python (FastAPI) service generates the audio. Data is stored in PostgreSQL, generated audio files go to MinIO, Redis handles caching and rate limiting, and RabbitMQ queues background jobs.

The goal of this project was to experiment with real-world backend patterns like service separation, async processing, and scalable architecture — while also building something practical and usable.

🎬 Demo :

📸 Screenshots / screen recording coming soon

📂 Project Architecture :

root/
│
├── apps/
│   └── web/                     # Frontend app (Next.js - UI, client-side logic)
│
├── packages/
│   └── db/                      # Database layer (Drizzle ORM schemas, migrations)
│
├── services/
│   ├── auth/                    # Authentication service (JWT, sessions, user auth)
│   │   ├── src/
│   │   │   ├── modules/         # Feature-based structure (auth logic, user module, etc.)
│   │   │   ├── plugins/         # Fastify plugins (JWT, cookies, hooks)
│   │   │   ├── utils/           # Service-specific helpers
│   │   │   └── server.ts        # Entry point (Fastify app setup)
│   │   ├── package.json         # Service dependencies & scripts
│   │   └── tsconfig.json        # TypeScript config (extends root config)
│   │
│   ├── gateway/                 # API Gateway (entry point, routing, aggregation)
│   │   └── ...
│   ├── tts/                     # Text-to-Speech engine (Python, audio generation)
│   │   └── ...
│   ├── voice/                   # Voice management (voices, configs, metadata)
│   │   └── ...
│   └── worker/                  # Background worker (async jobs, queues, TTS processing)
│       └── ...
│
├── .env                         # Environment variables (secrets, config)
├── pnpm-workspace.yaml          # Defines monorepo structure (apps, packages, services)
├── pnpm-lock.yaml               # Lockfile (ensures consistent dependency versions)
├── tsconfig.json                # Base TypeScript config (shared across all projects)
├── docker-compose.yaml          # Runs all services together (dev / local orchestration)
└── package.json                 # Root config (scripts, workspace settings)

🛠️ Technical Architecture :

Storage & ORM

| Tool | Description | Access |
| --- | --- | --- |
| PostgreSQL | Primary relational database | - |
| MinIO | S3-compatible object storage (audio files) | http://localhost:9000 |
| Drizzle ORM | Type-safe ORM & migrations | https://local.drizzle.studio (`pnpm db:studio`) |

Messaging & Caching

| Tool | Description | Access |
| --- | --- | --- |
| Redis | In-memory caching (sessions, rate limiting) | - |
| RabbitMQ | Async job queues & message broker | http://localhost:15672 |
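
To make the rate-limiting role of Redis concrete, here is a minimal sketch of a fixed-window limiter. It is not the project's implementation: a plain dict stands in for Redis, and the class name, the `limit` and `window_seconds` parameters, and the choice of a fixed window are all assumptions made for illustration. With Redis, the same idea is commonly expressed with `INCR` on a per-client key plus `EXPIRE` to end the window.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """In-memory stand-in for a Redis-backed fixed-window limiter (hypothetical)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client id -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if the request fits in the client's current window."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._windows[client_id] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
    # Third request inside the same window is rejected.
    print([limiter.allow("user-1") for _ in range(3)])
```

A fixed window is the simplest variant to reason about; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.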

Infrastructure

| Tool | Description | Access |
| --- | --- | --- |
| Traefik | API Gateway, reverse proxy, load balancer | http://localhost:8080 |
| Docker | Containerization & orchestration | - |

Backend

| Tool | Description | Access |
| --- | --- | --- |
| Fastify | Backend framework (auth, gateway, voice, worker) | - |
| FastAPI | Python framework (TTS service) | - |

Frontend

| Tool | Description | Access |
| --- | --- | --- |
| Next.js | Web app (UI & client-side logic) | http://localhost:3000 |

API Reference

For more details: API reference doc

1. Auth Service

Runs on http://localhost:3001. Handles user registration, login, session management, and API key issuance.

Supports two authentication methods:

- JWT Auth: short-lived tokens (50-minute expiry) issued on login. Suitable for interactive, session-based usage.
- API Key Auth: long-lived keys for programmatic or machine-to-machine access. Keys are generated on demand; the full key is returned once, and only its hash is persisted in the database.
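
The "store only the hash" approach for API keys can be sketched as follows. This is an illustration under assumptions, not the auth service's code: the README does not name the hash function, so SHA-256 is assumed here, and the `tts_` prefix and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def generate_api_key(prefix: str = "tts") -> Tuple[str, str]:
    """Return (full_key, key_hash).

    The full key is shown to the user once; only key_hash would be persisted.
    The prefix and the use of SHA-256 are assumptions for this sketch.
    """
    full_key = f"{prefix}_{secrets.token_urlsafe(32)}"
    key_hash = hashlib.sha256(full_key.encode("utf-8")).hexdigest()
    return full_key, key_hash


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    candidate = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


if __name__ == "__main__":
    key, stored = generate_api_key()
    print(verify_api_key(key, stored))            # True
    print(verify_api_key(key + "x", stored))      # False
```

Because the key is generated from a high-entropy random source, a fast hash is a reasonable choice here; a slow password hash is mainly needed for low-entropy secrets such as user passwords.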

Endpoints overview:

| Method | Route | Description |
| --- | --- | --- |
| `POST` | `/auth/register` | Register a new user account |
| `POST` | `/auth/login` | Authenticate and receive a JWT token |
| `POST` | `/auth/logout` | Invalidate the current session |
| `POST` | `/auth/refresh` | Refresh an expired JWT token |
| `GET` | `/auth/me` | Retrieve the currently authenticated user's profile |
| `POST` | `/auth/api-keys` | Generate a new long-lived API key |
| `GET` | `/auth/api-keys` | List all API keys for the current user |
| `DELETE` | `/auth/api-keys/:id` | Revoke an existing API key |

2. Voice Service

Runs on http://localhost:3002. Manages the catalog of available TTS voices, their configurations, and associated metadata. Provides endpoints to browse, filter, and retrieve voice details used when submitting TTS jobs.

Endpoints overview:

| Method | Route | Description |
| --- | --- | --- |
| `GET` | `/voices` | List all available voices (with optional filters: language, gender, engine) |
| `GET` | `/voices/:id` | Retrieve details and configuration for a specific voice |
| `POST` | `/voices` | Register a new custom voice (admin) |
| `PUT` | `/voices/:id` | Update voice metadata or configuration (admin) |
| `DELETE` | `/voices/:id` | Remove a voice from the catalog (admin) |
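
As a rough illustration of the optional filters on `GET /voices`, the helper below builds a request URL. It assumes the filters are passed as query-string parameters named `language`, `gender`, and `engine`; the README lists the filters but not how they are encoded, and the example values are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

VOICE_SERVICE_URL = "http://localhost:3002"  # port taken from this section


def build_voices_url(language: Optional[str] = None,
                     gender: Optional[str] = None,
                     engine: Optional[str] = None) -> str:
    """Build a /voices URL, including only the filters that were provided.

    Passing the filters as query parameters is an assumption of this sketch.
    """
    params = {"language": language, "gender": gender, "engine": engine}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{VOICE_SERVICE_URL}/voices" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(build_voices_url())
    print(build_voices_url(language="en-GB", engine="kokoro"))
```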

3. Worker Service (Async TTS Processing)

Runs on http://localhost:3003. Manages the lifecycle of TTS generation jobs — from submission to completion. Because audio generation can be time-intensive, all processing is handled asynchronously via a RabbitMQ queue.

Job lifecycle:

1. A user submits text via the API → a job record is created in PostgreSQL with status pending and pushed onto the RabbitMQ queue.
2. The worker consumer picks up the job and forwards it to the TTS engine service for audio generation.
3. The generated audio file is uploaded to MinIO (S3-compatible storage).
4. The job record is updated in the database with status completed and a reference to the stored audio file.
5. The client can poll the job status endpoint or subscribe to notifications to retrieve the result.
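
The lifecycle above can be sketched in a few lines with in-memory stand-ins. Nothing here is the worker's real code: a dict replaces PostgreSQL, `queue.Queue` replaces RabbitMQ, another dict replaces MinIO, and the `synthesize` callable replaces the TTS engine call. The `failed` status and the `audio/<id>.wav` key layout are assumptions; the README only names `pending` and `completed`.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    id: str
    text: str
    voice_id: str
    status: str = "pending"
    audio_key: Optional[str] = None


class JobPipeline:
    """Toy model of the submit -> queue -> synthesize -> store -> update flow."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self.synthesize = synthesize           # stand-in for the TTS engine
        self.jobs: Dict[str, Job] = {}         # stand-in for PostgreSQL
        self.broker: queue.Queue = queue.Queue()  # stand-in for RabbitMQ
        self.storage: Dict[str, bytes] = {}    # stand-in for MinIO

    def submit(self, text: str, voice_id: str) -> str:
        """Step 1: create a pending job record and enqueue its id."""
        job = Job(id=uuid.uuid4().hex, text=text, voice_id=voice_id)
        self.jobs[job.id] = job
        self.broker.put(job.id)
        return job.id

    def process_next(self) -> None:
        """Steps 2-4: consume one job, generate audio, store it, update the record."""
        job = self.jobs[self.broker.get_nowait()]
        try:
            audio = self.synthesize(job.text, job.voice_id)
            key = f"audio/{job.id}.wav"        # hypothetical object key layout
            self.storage[key] = audio
            job.audio_key = key
            job.status = "completed"
        except Exception:
            job.status = "failed"              # assumed status, not in the README


if __name__ == "__main__":
    pipeline = JobPipeline(lambda text, voice: text.encode("utf-8"))
    job_id = pipeline.submit("Hello there", "voice-1")
    pipeline.process_next()
    print(pipeline.jobs[job_id].status)
```

Keeping the job record as the single source of truth means the client never needs to talk to the queue or the object store directly; it only reads the job's status and audio reference.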

Endpoints overview:

| Method | Route | Description |
| --- | --- | --- |
| `POST` | `/jobs` | Submit a new TTS job (text, voice ID, output format) |
| `GET` | `/jobs` | List all jobs for the authenticated user |
| `GET` | `/jobs/:id` | Get the status and result of a specific job |
| `DELETE` | `/jobs/:id` | Cancel a pending job or delete a completed one |
| `GET` | `/jobs/:id/audio` | Download or stream the generated audio file |
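
Since jobs complete asynchronously, a client typically polls `GET /jobs/:id` until the job reaches a terminal state. The helper below is a sketch of that loop with exponential backoff. It assumes the response is a JSON object with a `status` field and that `failed` exists as a terminal status; neither is confirmed by this README. The HTTP call itself is injected as `fetch_status` so the loop stays independent of any particular client library.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict[str, Any]],
                 timeout_seconds: float = 60.0,
                 initial_delay: float = 0.5,
                 max_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job is completed or failed, doubling the delay each round.

    fetch_status is expected to perform GET /jobs/:id and return the parsed body.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    delay = initial_delay
    while True:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Simulated responses instead of real HTTP calls.
    responses = iter([{"status": "pending"}, {"status": "completed"}])
    print(wait_for_job(lambda: next(responses), sleep=lambda s: None))
```

Capping the delay with `max_delay` keeps long-running jobs from being polled too rarely, while the doubling keeps short jobs from hammering the endpoint.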

4. TTS Engine Service

Runs on http://localhost:8000 (Python / FastAPI). The core audio generation engine, responsible for converting text to speech using the configured TTS model. This service is consumed internally by the worker and is not exposed directly to end users.

Kokoro (active)

- Type: Text-to-speech
- Model: Kokoro-82M, a lightweight open-source TTS model with only 82M parameters
- Hardware: CPU-compatible (no GPU required)
- Voices: 25 built-in voices across American and British English
- Backend: PyTorch
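
A quick back-of-the-envelope calculation shows why a model of this size is practical on CPU. The sketch below estimates the memory needed for the weights alone, taking the 82M parameter count from the list above. The numeric precision the service actually loads the model in is not stated in this README, so both float32 and float16 are shown; activations and runtime overhead are not included.

```python
def weight_memory_mib(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory for the model weights alone, in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)


KOKORO_PARAMETERS = 82_000_000  # "82M parameters" from the model description

fp32_mib = weight_memory_mib(KOKORO_PARAMETERS, 4)  # 4 bytes per float32 value
fp16_mib = weight_memory_mib(KOKORO_PARAMETERS, 2)  # 2 bytes per float16 value

print(f"float32 weights: ~{fp32_mib:.0f} MiB")  # ~313 MiB
print(f"float16 weights: ~{fp16_mib:.0f} MiB")  # ~156 MiB
```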

Endpoints overview:

| Method | Route | Description |
| --- | --- | --- |
| `POST` | `/tts/generate` | Generate audio from text using a specified voice and engine |
| `GET` | `/tts/voices` | List voices supported by the active TTS engine |
| `GET` | `/tts/health` | Health check and model readiness status |

XTTS (planned)

XTTS will be used for voice cloning: generating speech that mimics a target voice from an audio sample.

⚙️ Setup :

This project uses pnpm as the package manager. Make sure Node.js and npm are installed, then install pnpm globally:

npm install -g pnpm

🚀 Running the App :

For more info about scripts, see package.json.

Backend infrastructure

Build the TTS service image first:

pnpm tts:build   # docker compose up --build -d tts

Start all background services (PostgreSQL, Redis, RabbitMQ, MinIO, Traefik, TTS):

pnpm infra        # Start all background services
pnpm infra:down   # Stop all services
pnpm infra:ps     # Check service status
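
After `pnpm infra`, it can be handy to confirm that the infrastructure ports are actually reachable. The script below is a small optional sketch, not part of the repository: it only checks the ports that appear in the Access columns of this README (MinIO, RabbitMQ, Traefik), and the function names are hypothetical. PostgreSQL and Redis are left out because their ports are not listed here.

```python
import socket
from typing import Dict

# Ports taken from the Access columns in this README.
INFRA_PORTS: Dict[str, int] = {
    "MinIO": 9000,
    "RabbitMQ": 15672,
    "Traefik": 8080,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_infra(host: str = "localhost") -> Dict[str, bool]:
    """Map each known service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in INFRA_PORTS.items()}


if __name__ == "__main__":
    for name, reachable in check_infra().items():
        print(f"{name}: {'up' if reachable else 'down'}")
```

An open port only shows that something is listening; `pnpm infra:ps` remains the better source for container health.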

Microservices

Auth service (port 3001)

pnpm dev:auth

Voice service (port 3002)

pnpm dev:voice

Worker service (port 3003)

pnpm dev:worker

TTS engine service (port 8000)

After running pnpm infra, build and start the TTS Docker container:

docker compose build tts
docker compose up tts

Frontend (port 3000)

pnpm dev:web

License :

This project is licensed under the MIT License.
