bensbehChaimae/TTS_playground


🎙️ TTS Playground: Text-to-Speech Microservice App

TTS Playground is a full-stack text-to-speech app built on a microservices architecture. Users enter text, pick from over 25 voices, and get back an audio file. Generation runs asynchronously in the background, so the app stays fast and responsive.

Behind the scenes, the system is split into multiple services, each with a clear role. A Fastify-based API handles incoming requests, while a Python (FastAPI) service takes care of generating the audio. Data is stored in PostgreSQL, Redis is used for caching and rate limiting, and RabbitMQ manages background jobs.

The goal of this project was to experiment with real-world backend patterns such as service separation, async processing, and scalable architecture, while also building something practical and usable.

🎬 Demo :

📸 Screenshots / screen recording coming soon

📂 Project Architecture :

root/
│
├── apps/
│   └── web/                     # Frontend app (Next.js - UI, client-side logic)
│
├── packages/
│   └── db/                      # Database layer (Drizzle ORM schemas, migrations)
│
├── services/
│   ├── auth/                    # Authentication service (JWT, sessions, user auth)
│   │   ├── src/
│   │   │   ├── modules/         # Feature-based structure (auth logic, user module, etc.)
│   │   │   ├── plugins/         # Fastify plugins (JWT, cookies, hooks)
│   │   │   ├── utils/           # Service-specific helpers
│   │   │   └── server.ts        # Entry point (Fastify app setup)
│   │   ├── package.json         # Service dependencies & scripts
│   │   └── tsconfig.json        # TypeScript config (extends root config)
│   │
│   ├── gateway/                 # API Gateway (entry point, routing, aggregation)
│   │    └── ...
│   ├── tts/                     # Text-to-Speech engine (Python, audio generation)
│   │    └── ...
│   ├── voice/                   # Voice management (voices, configs, metadata)
│   │    └── ...
│   └── worker/                  # Background worker (async jobs, queues, TTS processing)
│        └── ...
│
├── .env                         # Environment variables (secrets, config)
├── pnpm-workspace.yaml          # Defines monorepo structure (apps, packages, services)
├── pnpm-lock.yaml               # Lockfile (ensures consistent dependency versions)
├── tsconfig.json                # Base TypeScript config (shared across all projects)
├── docker-compose.yaml          # Runs all services together (dev / local orchestration)
└── package.json                 # Root config (scripts, workspace settings)

🛠️ Technical Architecture :

Storage & ORM

| Tool | Description | Access |
|------|-------------|--------|
| PostgreSQL | Primary relational database | - |
| MinIO | S3-compatible object storage (audio files) | http://localhost:9000 |
| Drizzle ORM | Type-safe ORM & migrations | https://local.drizzle.studio (pnpm db:studio) |

Messaging & Caching

| Tool | Description | Access |
|------|-------------|--------|
| Redis | In-memory caching (sessions, rate limiting) | - |
| RabbitMQ | Async job queues & message broker | http://localhost:15672 |
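
The README says Redis backs rate limiting but does not say which algorithm is used. As a rough illustration only, here is a fixed-window limiter in Python. The in-memory dict stands in for Redis, and the class name, limits, and client IDs are all hypothetical. With a real Redis instance this pattern usually maps to an INCR on a per-window key plus an EXPIRE so old windows clean themselves up.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window.

    Illustrative sketch: the dict below stands in for Redis and is not
    the project's actual limiter.
    """

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # Maps (client_id, window_index) -> number of requests seen.
        # Old windows are never evicted here; Redis EXPIRE would do that.
        self.counters: Dict[Tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        count = self.counters.get(key, 0) + 1  # equivalent of INCR
        self.counters[key] = count
        return count <= self.limit
```

A fixed window is the simplest variant; it can admit up to twice the limit across a window boundary, which is why sliding-window or token-bucket schemes are often preferred when that matters.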

Infrastructure

| Tool | Description | Access |
|------|-------------|--------|
| Traefik | API Gateway, reverse proxy, load balancer | http://localhost:8080 |
| Docker | Containerization & orchestration | - |

Backend

| Tool | Description | Access |
|------|-------------|--------|
| Fastify | Backend framework (auth, gateway, voice, worker) | - |
| FastAPI | Python framework (TTS service) | - |

Frontend

| Tool | Description | Access |
|------|-------------|--------|
| Next.js | Web app (UI & client-side logic) | http://localhost:3000 |

API Reference

For more details: API reference doc

1. Auth Service

Runs on http://localhost:3001. Handles user registration, login, session management, and API key issuance.

Supports two authentication methods:

  • JWT Auth: short-lived tokens (50-minute expiry) issued on login. Suitable for interactive, session-based usage.
  • API Key Auth: long-lived keys for programmatic or machine-to-machine access. Keys are generated on demand. The full key is returned once, and only its hash is persisted in the database.
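
Storing API keys as hashes can be sketched as follows. This is a minimal Python illustration, not the auth service's code: the SHA-256 choice, the `tts_` prefix, and both function names are assumptions made for the example.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def generate_api_key() -> Tuple[str, str]:
    """Return (plaintext_key, key_hash).

    The plaintext is shown to the user once; only the hash would be stored.
    """
    plaintext = "tts_" + secrets.token_urlsafe(32)  # "tts_" prefix is hypothetical
    key_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, key_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Because the key is high-entropy random data rather than a user-chosen password, a fast hash is generally considered adequate here; a leaked database row still does not reveal a usable key.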

Endpoints overview:

| Method | Route | Description |
|--------|-------|-------------|
| POST | /auth/register | Register a new user account |
| POST | /auth/login | Authenticate and receive a JWT token |
| POST | /auth/logout | Invalidate the current session |
| POST | /auth/refresh | Refresh an expired JWT token |
| GET | /auth/me | Retrieve the currently authenticated user's profile |
| POST | /auth/api-keys | Generate a new long-lived API key |
| GET | /auth/api-keys | List all API keys for the current user |
| DELETE | /auth/api-keys/:id | Revoke an existing API key |
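
To make the 50-minute token lifetime concrete, here is a small standard-library sketch of issuing and verifying an HS256-signed JWT. The signing algorithm, the claim names beyond the standard `sub`/`iat`/`exp`, and the function names are assumptions; the actual service presumably relies on a JWT library via its Fastify plugins rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_TTL_SECONDS = 50 * 60  # 50-minute expiry, as stated in this README


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, now: Optional[float] = None) -> str:
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued_at, "exp": issued_at + TOKEN_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and the token has not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = _b64url_decode(signature_b64)
        payload = json.loads(_b64url_decode(payload_b64))
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
    except (ValueError, TypeError):
        return None  # malformed token
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        return None  # expired; the client would call /auth/refresh
    return payload
```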

2. Voice Service

Runs on http://localhost:3002. Manages the catalog of available TTS voices, their configurations, and associated metadata. Provides endpoints to browse, filter, and retrieve voice details used when submitting TTS jobs.

Endpoints overview:

| Method | Route | Description |
|--------|-------|-------------|
| GET | /voices | List all available voices (with optional filters: language, gender, engine) |
| GET | /voices/:id | Retrieve details and configuration for a specific voice |
| POST | /voices | Register a new custom voice (admin) |
| PUT | /voices/:id | Update voice metadata or configuration (admin) |
| DELETE | /voices/:id | Remove a voice from the catalog (admin) |
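
As an example of the optional filters on GET /voices, the helper below builds a filtered URL. It assumes the filters are passed as query parameters named exactly `language`, `gender`, and `engine`; the README lists the filter names but not how they are encoded, and the sample values in the usage note are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

VOICE_SERVICE_URL = "http://localhost:3002"  # port taken from this README


def build_voices_url(language: Optional[str] = None,
                     gender: Optional[str] = None,
                     engine: Optional[str] = None) -> str:
    """Build the GET /voices URL, including only the filters that were set."""
    # Assumption: query parameter names match the filter names in the table.
    filters = {"language": language, "gender": gender, "engine": engine}
    query = urlencode({name: value for name, value in filters.items() if value is not None})
    return f"{VOICE_SERVICE_URL}/voices" + (f"?{query}" if query else "")
```

For instance, `build_voices_url(language="en-GB", engine="kokoro")` yields `http://localhost:3002/voices?language=en-GB&engine=kokoro`.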

3. Worker Service (Async TTS Processing)

Runs on http://localhost:3003. Manages the lifecycle of TTS generation jobs, from submission to completion. Because audio generation can be time-intensive, all processing is handled asynchronously via a RabbitMQ queue.

Job lifecycle:

  1. A user submits text via the API. A job record is created in PostgreSQL with status pending and pushed onto the RabbitMQ queue.
  2. The worker consumer picks up the job and forwards it to the TTS engine service for audio generation.
  3. The generated audio file is uploaded to MinIO (S3-compatible storage).
  4. The job record is updated in the database with status completed and a reference to the stored audio file.
  5. The client can poll the job status endpoint or subscribe to notifications to retrieve the result.
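
The lifecycle above can be sketched as an in-memory simulation. Everything here is a stand-in chosen for illustration: a dict plays PostgreSQL, `queue.Queue` plays RabbitMQ, another dict plays MinIO, and the `Job` fields, function names, and the `audio/<id>.wav` key are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    id: str
    text: str
    voice_id: str
    status: str = "pending"
    audio_key: Optional[str] = None


def submit_job(job: Job, db: Dict[str, Job], jobs_queue: "queue.Queue[str]") -> None:
    """Step 1: persist the job as pending, then enqueue its ID."""
    db[job.id] = job
    jobs_queue.put(job.id)


def process_next(db: Dict[str, Job],
                 jobs_queue: "queue.Queue[str]",
                 synthesize: Callable[[str, str], bytes],
                 storage: Dict[str, bytes]) -> Job:
    """Steps 2 to 4: consume one job, generate audio, store it, mark completed."""
    job = db[jobs_queue.get_nowait()]           # step 2: worker picks up the job
    audio = synthesize(job.text, job.voice_id)  # step 2: call the TTS engine
    key = f"audio/{job.id}.wav"                 # hypothetical object key and format
    storage[key] = audio                        # step 3: upload to object storage
    job.status = "completed"                    # step 4: update the job record
    job.audio_key = key
    return job
```

The useful property of this shape is that `submit_job` returns immediately; the slow synthesis happens only when a worker calls `process_next`, which is what keeps the request path responsive.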

Endpoints overview:

| Method | Route | Description |
|--------|-------|-------------|
| POST | /jobs | Submit a new TTS job (text, voice ID, output format) |
| GET | /jobs | List all jobs for the authenticated user |
| GET | /jobs/:id | Get the status and result of a specific job |
| DELETE | /jobs/:id | Cancel a pending job or delete a completed one |
| GET | /jobs/:id/audio | Download or stream the generated audio file |
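
A client that polls GET /jobs/:id might look roughly like this. The sketch takes the HTTP call as an injected `fetch_status` function so it stays independent of any client library. It assumes the response is a JSON object with a `status` field, and it treats anything other than pending as final, since this README only names the pending and completed states.

```python
import time
from typing import Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict[str, str]],
                 timeout_seconds: float = 60.0,
                 initial_delay: float = 0.5,
                 max_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until the job leaves the "pending" state or the time budget runs out.

    The delay doubles after each attempt, capped at `max_delay`.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") != "pending":
            return job
        if waited + delay > timeout_seconds:
            raise TimeoutError("job still pending after timeout")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

Once the returned status is completed, the audio can be fetched from GET /jobs/:id/audio.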

4. TTS Engine Service

Runs on http://localhost:8000 (Python / FastAPI). The core audio generation engine, responsible for converting text to speech using the configured TTS model. This service is consumed internally by the worker and is not exposed directly to end users.

Kokoro (active)

  • Type: Text-to-speech
  • Model: Kokoro-82M, a lightweight open-source TTS model with only 82M parameters
  • Hardware: CPU-compatible (no GPU required)
  • Voices: 25 built-in voices across American and British English
  • Backend: PyTorch

Endpoints overview:

| Method | Route | Description |
|--------|-------|-------------|
| POST | /tts/generate | Generate audio from text using a specified voice and engine |
| GET | /tts/voices | List voices supported by the active TTS engine |
| GET | /tts/health | Health check and model readiness status |
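
For orientation, this is roughly how an internal caller such as the worker could prepare a request to POST /tts/generate using only the Python standard library. The JSON field names `text` and `voice` are assumptions, as the README does not document the request body, and the function name is hypothetical.

```python
import json
import urllib.request

TTS_ENGINE_URL = "http://localhost:8000"  # port taken from this README


def build_generate_request(text: str, voice: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /tts/generate request."""
    # Assumption: the body is JSON with "text" and "voice" fields.
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TTS_ENGINE_URL}/tts/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; checking GET /tts/health first is a reasonable guard, since the model may still be loading.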

XTTS (planned)

  • Will be used for voice cloning: generating speech that mimics a target voice from an audio sample.

⚙️ Setup :

This project uses pnpm as the package manager. Make sure Node.js and npm are installed, then install pnpm globally:

npm install -g pnpm

🚀 Running the App :

For more info about scripts, see package.json.

Backend infrastructure

Build the TTS service image first:

pnpm tts:build   # docker compose up --build -d tts

Start all background services (PostgreSQL, Redis, RabbitMQ, MinIO, Traefik, TTS):

pnpm infra        # Start all background services
pnpm infra:down   # Stop all services
pnpm infra:ps     # Check service status
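
After `pnpm infra`, one quick way to confirm that the containers are accepting connections is a small port probe. The sketch below only covers the ports this README lists under "Access"; the function names are hypothetical, and an open port shows only that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict

# Ports listed under "Access" in this README.
INFRA_PORTS: Dict[str, int] = {
    "MinIO": 9000,
    "RabbitMQ": 15672,
    "Traefik": 8080,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_infra(host: str = "localhost") -> Dict[str, bool]:
    """Probe every known infrastructure port and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in INFRA_PORTS.items()}
```

Running `print(check_infra())` prints a mapping from service name to a boolean, which complements the container-level view from `pnpm infra:ps`.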

Microservices

  • Auth service (port 3001)
    pnpm dev:auth
  • Voice service (port 3002)
    pnpm dev:voice
  • Worker service (port 3003)
    pnpm dev:worker
  • TTS engine service (port 8000)
    After running pnpm infra, build and start the TTS Docker container:
    docker compose build tts
    docker compose up tts
  • Frontend (port 3000)
    pnpm dev:web

License :

This project is licensed under the MIT License.

About

Open-source microservice platform for async text-to-speech generation with voice cloning support.
