🧠 Podvoice

Podvoice is a local-first AI podcast generator that converts simple Markdown scripts into multi-speaker audio.

Originally built as a CLI tool, Podvoice now includes PodVoice Studio — a modern web-based GUI for creating, previewing, and generating AI audio visually.

No cloud APIs. No subscriptions. Fully offline.

Runs on Linux, Windows, macOS, and FreeBSD.


Why Podvoice?

Most AI audio tools:

  • Require paid APIs
  • Depend on cloud services

Podvoice is:

  • Local-first
  • Fully offline
  • Developer-friendly
  • Now with a visual GUI (PodVoice Studio)

Features

  • Markdown-based scripts
  • Multiple logical speakers
  • Deterministic voice assignment
  • Single stitched output file
  • WAV or MP3 export
  • Local-only inference
  • CPU-first (GPU optional)
  • Cross-platform support
  • 🎙️ Studio Web UI — Modern single-page interface for voice selection, preview, and generation
  • 🔊 Built-in multi-speaker models — VCTK VITS and others, with cached voice demos
  • ⚡ AJAX-based generation — No page reloads, instant audio playback
  • 🎨 Modern dark theme — Clean sidebar layout with zero scrolling
  • 📁 Profile management — YAML-based speaker profiles with reference audio support
  • 🔄 Multi-reference audio — Concatenate multiple clips for better voice conditioning

Supported platforms

| Platform | Status | Notes |
| --- | --- | --- |
| Linux | ✅ Fully supported | Primary dev platform |
| macOS | ✅ Fully supported | Intel + Apple Silicon |
| Windows | ✅ Fully supported | PowerShell |
| FreeBSD | ✅ Supported | Requires ffmpeg |
| WSL2 | ✅ Supported | Recommended on Windows |

Input format

Podvoice consumes Markdown files with speaker blocks:

[Host | calm]
Welcome to the show.

[Guest | warm]
If this sounds useful, try writing your own script
and see how easily Markdown becomes audio.

Rules:

  • Speaker name is required
  • Emotion tag is optional
  • Text continues until the next speaker block
  • Blank lines are allowed
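The rules above can be sketched as a minimal parser. This is an illustrative sketch, not the project's actual `parser.py` implementation; the segment structure it returns is an assumption.

```python
import re

# Matches speaker blocks like "[Host | calm]" or "[Guest]" (emotion optional).
BLOCK_RE = re.compile(r"^\[(?P<name>[^|\]]+?)\s*(?:\|\s*(?P<emotion>[^\]]+?)\s*)?\]$")

def parse_script(markdown: str):
    """Split a Podvoice script into (speaker, emotion, text) segments."""
    segments = []
    current = None
    for line in markdown.splitlines():
        m = BLOCK_RE.match(line.strip())
        if m:
            # A new speaker block starts; the emotion tag may be absent.
            current = {"speaker": m.group("name").strip(),
                       "emotion": (m.group("emotion") or "").strip() or None,
                       "text": []}
            segments.append(current)
        elif current is not None and line.strip():
            # Text continues until the next speaker block; blank lines are skipped.
            current["text"].append(line.strip())
    return [(s["speaker"], s["emotion"], " ".join(s["text"])) for s in segments]
```

Feeding the example script above through this sketch yields one segment per speaker block, with the emotion tag set to `None` when omitted.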

▶️ Demo Video of Podvoice Studio (GUI usage)

gui_demo.mp4

🎧 Demo Audio

Demo-Audio.mp4

▶️ Demo Video of Podvoice (CLI usage)

Demo-Video.mp4

---

Quick start (ALL operating systems)

1️⃣ System requirements (common)

Required everywhere:

  • Python 3.10.x
  • ffmpeg
  • espeak or espeak-ng (required for Studio with built-in multi-speaker models)
  • Internet access for the first run only (model download)
  • ~5–8 GB free disk space (model cache)

2️⃣ Install system dependencies

🐧 Linux (Ubuntu / Debian)

sudo apt update
sudo apt install -y python3.10 python3.10-venv ffmpeg git espeak

🍎 macOS (Homebrew)

brew install python@3.10 ffmpeg git

🪟 Windows (PowerShell)

winget install Python.Python.3.10
winget install ffmpeg
winget install Git.Git

Restart the terminal after installing Python.


🐡 FreeBSD

pkg install python310 ffmpeg git

3️⃣ Clone the repository

git clone https://github.com/aman179102/podvoice.git
cd podvoice

Setup (recommended path)

🐧 Linux / 🍎 macOS / 🐡 FreeBSD

chmod +x bootstrap.sh
./bootstrap.sh

This script will:

  • Verify Python 3.10
  • Create a local .venv
  • Install fully pinned dependencies from requirements.lock
  • Install podvoice in editable mode

🪟 Windows (PowerShell)

One-time: allow local scripts

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

Run bootstrap

.\bootstrap.ps1

Activate the environment

Linux / macOS / FreeBSD

source .venv/bin/activate

Windows

.venv\Scripts\Activate.ps1

Run the demo

podvoice examples/demo.md --out demo.wav

Or export MP3:

podvoice examples/demo.md --out demo.mp3

On first run, Coqui XTTS v2 model weights will be downloaded and cached locally. Subsequent runs reuse the cache.


πŸŽ™οΈ Studio Web UI

Podvoice includes a modern, single-page web interface for interactive voice generation.

Launch Studio

podvoice studio --host 127.0.0.1 --port 8000

Then open: http://127.0.0.1:8000

Studio Features

| Feature | Description |
| --- | --- |
| Sidebar Voice Gallery | All built-in speakers displayed with human-friendly labels |
| Instant Preview | Click any voice to hear a demo instantly (cached after first play) |
| Single TTS | Type text, select voice, generate audio — no page reloads |
| Multi TTS (Podcast) | Paste Markdown scripts with speaker mapping |
| AJAX Generation | Audio generates and plays without leaving the page |
| Modern Dark Theme | Clean aesthetic with CSS variables, no scrolling |

Studio Endpoints

  • / or /single — Single TTS page
  • /multi — Multi-speaker podcast page
  • /demo_wav?voice=p240 — Get cached demo audio for a voice
  • /health — Health check endpoint
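The demo endpoint can be called programmatically with the standard library alone. A small sketch, assuming Studio is serving on the default 127.0.0.1:8000 (the actual fetch is commented out so the snippet runs without a live server):

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen  # only needed if you actually fetch

BASE = "http://127.0.0.1:8000"

def demo_url(voice: str) -> str:
    """Build the cached-demo endpoint URL for a built-in voice ID."""
    return urljoin(BASE, "/demo_wav") + "?" + urlencode({"voice": voice})

# With Studio running, the health check would be reachable like this:
#   status = urlopen(urljoin(BASE, "/health")).status

print(demo_url("p240"))  # http://127.0.0.1:8000/demo_wav?voice=p240
```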

Using a Different Model

Studio defaults to tts_models/en/vctk/vits (built-in multi-speaker). To use XTTS v2 instead:

podvoice studio --model-name tts_models/multilingual/multi-dataset/xtts_v2

CLI usage

podvoice SCRIPT.md --out OUTPUT

Examples:

podvoice examples/demo.md --out output.wav
podvoice examples/demo.md --out podcast.mp3 --language en --device cpu

Options

| Option | Description |
| --- | --- |
| SCRIPT | Input Markdown file |
| --out, -o | Output .wav or .mp3 |
| --language, -l | XTTS language code |
| --device, -d | cpu (default) or cuda |
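The option table above maps onto a conventional argparse surface. This is a hypothetical mirror for illustration, not the real entrypoint in `podvoice/cli.py`; the `en` default and the required `--out` are assumptions.

```python
import argparse

# Illustrative mirror of the documented CLI flags (assumed, not the real cli.py).
parser = argparse.ArgumentParser(prog="podvoice")
parser.add_argument("script", help="Input Markdown file")
parser.add_argument("--out", "-o", required=True, help="Output .wav or .mp3")
parser.add_argument("--language", "-l", default="en", help="XTTS language code")
parser.add_argument("--device", "-d", default="cpu", choices=["cpu", "cuda"])

# Parsing the MP3 example from the Examples section:
args = parser.parse_args(["examples/demo.md", "--out", "podcast.mp3", "-d", "cpu"])
```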

GPU usage (optional)

If you have a compatible NVIDIA GPU:

podvoice examples/demo.md --device cuda

If CUDA is unavailable, Podvoice safely falls back to CPU.


πŸ“ Profile Management

Podvoice supports YAML-based speaker profiles for advanced use cases.

Profile Directory

Default: ./podvoice_profiles/profiles.yaml

Profile Format

profiles:
  my_custom_voice:
    builtin_speaker: p240
  cloned_voice:
    reference_audio: ./samples/voice.wav
  multi_sample_voice:
    reference_audios:
      - ./samples/clip1.wav
      - ./samples/clip2.wav
      - ./samples/clip3.wav

Using Profiles

Profiles are automatically loaded and can be referenced in your Markdown scripts by speaker name.
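The three profile shapes above (built-in speaker, single reference clip, multiple reference clips) can be normalized with a few lines of Python. A sketch under assumed field names from the YAML example; the actual resolution logic lives in `podvoice/profiles.py` and may differ:

```python
# Profiles as they would look after loading profiles.yaml (YAML parsing omitted).
profiles = {
    "my_custom_voice": {"builtin_speaker": "p240"},
    "cloned_voice": {"reference_audio": "./samples/voice.wav"},
    "multi_sample_voice": {"reference_audios": ["./samples/clip1.wav",
                                                "./samples/clip2.wav"]},
}

def resolve(profile: dict):
    """Normalize a profile into (builtin_speaker, list_of_reference_clips)."""
    if "builtin_speaker" in profile:
        return profile["builtin_speaker"], []
    # A single reference_audio is treated as a one-clip list; multiple
    # reference_audios are concatenated for better voice conditioning.
    clips = profile.get("reference_audios") or [profile["reference_audio"]]
    return None, clips
```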


Performance notes

You may see warnings like:

Could not initialize NNPACK! Reason: Unsupported hardware.

βœ”οΈ These are harmless βœ”οΈ Audio generation will still complete ❌ No action required


How voices are assigned

Podvoice does not train voices.

Instead:

  • Uses built-in XTTS v2 speakers
  • Hashes speaker names deterministically
  • Maps each logical speaker to a stable voice

Implications:

  • Same speaker name → same voice
  • Rename speaker → possibly different voice
  • XTTS update → mapping may change

Fallback: default XTTS voice.
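The deterministic mapping described above can be sketched as follows. The speaker pool and hash choice here are hypothetical illustrations, not Podvoice's actual code:

```python
import hashlib

# Hypothetical pool of built-in speaker IDs; the real list comes from the model.
SPEAKERS = ["p225", "p226", "p227", "p240", "p243"]

def assign_voice(speaker_name: str) -> str:
    """Map a logical speaker name to a stable built-in voice.

    A cryptographic hash (rather than Python's randomized hash()) keeps
    the mapping identical across runs and machines.
    """
    digest = hashlib.sha256(speaker_name.encode("utf-8")).hexdigest()
    return SPEAKERS[int(digest, 16) % len(SPEAKERS)]

# Same name always yields the same voice; a renamed speaker may land elsewhere.
assert assign_voice("Host") == assign_voice("Host")
```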


Project structure

podvoice/
├── podvoice/
│   ├── cli.py            # CLI entrypoint
│   ├── parser.py         # Markdown parser
│   ├── tts.py            # XTTS inference
│   ├── audio.py          # Audio stitching
│   ├── studio.py         # FastAPI web UI
│   ├── profiles.py       # YAML profile management
│   ├── preprocessing.py  # Audio preprocessing
│   └── utils.py
│
├── examples/
│   └── demo.md
│
├── podvoice_profiles/    # Voice profiles directory
│
├── bootstrap.sh
├── bootstrap.ps1
├── pyproject.toml
└── README.md

Responsible use

Podvoice generates natural-sounding speech.

Do not:

  • Impersonate real people without consent
  • Use generated audio for fraud or deception

Always disclose synthesized content where appropriate.

You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.


Contributing

Podvoice is intentionally simple.

Good contributions:

  • Bug reports with minimal reproduction scripts
  • CLI UX improvements
  • Documentation clarity
  • Cross-platform fixes

Non-goals:

  • Cloud dependencies
  • Training pipelines
  • Over-engineering

Goal: local, boring, reliable software.