🧠 Podvoice

Podvoice is a local-first AI podcast generator that converts simple Markdown scripts into multi-speaker audio.

Originally built as a CLI tool, Podvoice now includes PodVoice Studio — a modern web-based GUI for creating, previewing, and generating AI audio visually.

No cloud APIs. No subscriptions. Fully offline.

Runs on Linux, Windows, macOS, and FreeBSD.


Why Podvoice?

Most AI audio tools:

  • Require paid APIs
  • Depend on cloud services

Podvoice is:

  • Local-first
  • Fully offline
  • Developer-friendly
  • Now with a visual GUI (PodVoice Studio)

Features

  • Markdown-based scripts
  • Multiple logical speakers
  • Deterministic voice assignment
  • Single stitched output file
  • WAV or MP3 export
  • Local-only inference
  • CPU-first (GPU optional)
  • Cross-platform support
  • 🎙️ Studio Web UI — Modern single-page interface for voice selection, preview, and generation
  • 🔊 Built-in multi-speaker models — VCTK VITS and others, with cached voice demos
  • ⚡ AJAX-based generation — No page reloads, instant audio playback
  • 🎨 Modern dark theme — Clean sidebar layout with zero scrolling
  • 📁 Profile management — YAML-based speaker profiles with reference audio support
  • 🔄 Multi-reference audio — Concatenate multiple clips for better voice conditioning

Supported platforms

| Platform | Status | Notes |
| --- | --- | --- |
| Linux | ✅ Fully supported | Primary dev platform |
| macOS | ✅ Fully supported | Intel + Apple Silicon |
| Windows | ✅ Fully supported | PowerShell |
| FreeBSD | ✅ Supported | Requires ffmpeg |
| WSL2 | ✅ Supported | Recommended on Windows |

Input format

Podvoice consumes Markdown files with speaker blocks:

[Host | calm]
Welcome to the show.

[Guest | warm]
If this sounds useful, try writing your own script
and see how easily Markdown becomes audio.

Rules:

  • Speaker name is required
  • Emotion tag is optional
  • Text continues until the next speaker block
  • Blank lines are allowed
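The rules above can be sketched as a minimal parser. This is an illustrative sketch, not the project's actual `parser.py` implementation; the segment structure it returns is an assumption.

```python
import re

# Matches speaker blocks like "[Host | calm]" or "[Guest]" (emotion optional).
BLOCK_RE = re.compile(r"^\[(?P<name>[^|\]]+?)\s*(?:\|\s*(?P<emotion>[^\]]+?)\s*)?\]$")

def parse_script(markdown: str):
    """Split a Podvoice script into (speaker, emotion, text) segments."""
    segments = []
    current = None
    for line in markdown.splitlines():
        m = BLOCK_RE.match(line.strip())
        if m:
            # A new speaker block starts; the emotion tag may be absent.
            current = {"speaker": m.group("name").strip(),
                       "emotion": (m.group("emotion") or "").strip() or None,
                       "text": []}
            segments.append(current)
        elif current is not None and line.strip():
            # Text continues until the next speaker block; blank lines are skipped.
            current["text"].append(line.strip())
    return [(s["speaker"], s["emotion"], " ".join(s["text"])) for s in segments]
```

Feeding the example script above through this sketch yields one segment per speaker block, with the emotion tag set to `None` when omitted.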

▶️ Demo Video of Podvoice Studio (GUI usage)

gui_demo.mp4

🎧 Demo Audio

Demo-Audio.mp4

▶️ Demo Video of Podvoice (CLI usage)

Demo-Video.mp4

---

Quick start (ALL operating systems)

1️⃣ System requirements (common)

Required everywhere:

  • Python 3.10.x
  • ffmpeg
  • espeak or espeak-ng (required for Studio with built-in multi-speaker models)
  • Internet access for the first run only (model download)
  • ~5–8 GB free disk space (model cache)

2️⃣ Install system dependencies

🐧 Linux (Ubuntu / Debian)

sudo apt update
sudo apt install -y python3.10 python3.10-venv ffmpeg git espeak

🍎 macOS (Homebrew)

brew install python@3.10 ffmpeg git

🪟 Windows (PowerShell)

winget install Python.Python.3.10
winget install ffmpeg
winget install Git.Git

Restart the terminal after installing Python.


🐡 FreeBSD

pkg install python310 ffmpeg git

3️⃣ Clone the repository

git clone https://github.com/aman179102/podvoice.git
cd podvoice

Setup (recommended path)

🐧 Linux / 🍎 macOS / 🐡 FreeBSD

chmod +x bootstrap.sh
./bootstrap.sh

This script will:

  • Verify Python 3.10
  • Create a local .venv
  • Install fully pinned dependencies from requirements.lock
  • Install podvoice in editable mode

🪟 Windows (PowerShell)

One-time: allow local scripts

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

Run bootstrap

.\bootstrap.ps1

Activate the environment

Linux / macOS / FreeBSD

source .venv/bin/activate

Windows

.venv\Scripts\Activate.ps1

Run the demo

podvoice examples/demo.md --out demo.wav

Or export MP3:

podvoice examples/demo.md --out demo.mp3

On first run, Coqui XTTS v2 model weights will be downloaded and cached locally. Subsequent runs reuse the cache.


πŸŽ™οΈ Studio Web UI

Podvoice includes a modern, single-page web interface for interactive voice generation.

Launch Studio

podvoice studio --host 127.0.0.1 --port 8000

Then open: http://127.0.0.1:8000

Studio Features

| Feature | Description |
| --- | --- |
| Sidebar Voice Gallery | All built-in speakers displayed with human-friendly labels |
| Instant Preview | Click any voice to hear a demo instantly (cached after first play) |
| Single TTS | Type text, select voice, generate audio — no page reloads |
| Multi TTS (Podcast) | Paste Markdown scripts with speaker mapping |
| AJAX Generation | Audio generates and plays without leaving the page |
| Modern Dark Theme | Clean aesthetic with CSS variables, no scrolling |

Studio Endpoints

  • / or /single — Single TTS page
  • /multi — Multi-speaker podcast page
  • /demo_wav?voice=p240 — Get cached demo audio for a voice
  • /health — Health check endpoint
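The demo endpoint can be called programmatically with the standard library alone. A small sketch, assuming Studio is serving on the default 127.0.0.1:8000 (the actual fetch is commented out so the snippet runs without a live server):

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen  # only needed if you actually fetch

BASE = "http://127.0.0.1:8000"

def demo_url(voice: str) -> str:
    """Build the cached-demo endpoint URL for a built-in voice ID."""
    return urljoin(BASE, "/demo_wav") + "?" + urlencode({"voice": voice})

# With Studio running, the health check would be reachable like this:
#   status = urlopen(urljoin(BASE, "/health")).status

print(demo_url("p240"))  # http://127.0.0.1:8000/demo_wav?voice=p240
```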

Using a Different Model

Studio defaults to tts_models/en/vctk/vits (built-in multi-speaker). To use XTTS v2 instead:

podvoice studio --model-name tts_models/multilingual/multi-dataset/xtts_v2

CLI usage

podvoice SCRIPT.md --out OUTPUT

Examples:

podvoice examples/demo.md --out output.wav
podvoice examples/demo.md --out podcast.mp3 --language en --device cpu

Options

| Option | Description |
| --- | --- |
| SCRIPT | Input Markdown file |
| --out, -o | Output .wav or .mp3 |
| --language, -l | XTTS language code |
| --device, -d | cpu (default) or cuda |
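The option table above maps onto a conventional argparse surface. This is a hypothetical mirror for illustration, not the real entrypoint in `podvoice/cli.py`; the `en` default and the required `--out` are assumptions.

```python
import argparse

# Illustrative mirror of the documented CLI flags (assumed, not the real cli.py).
parser = argparse.ArgumentParser(prog="podvoice")
parser.add_argument("script", help="Input Markdown file")
parser.add_argument("--out", "-o", required=True, help="Output .wav or .mp3")
parser.add_argument("--language", "-l", default="en", help="XTTS language code")
parser.add_argument("--device", "-d", default="cpu", choices=["cpu", "cuda"])

# Parsing the MP3 example from the Examples section:
args = parser.parse_args(["examples/demo.md", "--out", "podcast.mp3", "-d", "cpu"])
```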

GPU usage (optional)

If you have a compatible NVIDIA GPU:

podvoice examples/demo.md --device cuda

If CUDA is unavailable, Podvoice safely falls back to CPU.


πŸ“ Profile Management

Podvoice supports YAML-based speaker profiles for advanced use cases.

Profile Directory

Default: ./podvoice_profiles/profiles.yaml

Profile Format

profiles:
  my_custom_voice:
    builtin_speaker: p240
  cloned_voice:
    reference_audio: ./samples/voice.wav
  multi_sample_voice:
    reference_audios:
      - ./samples/clip1.wav
      - ./samples/clip2.wav
      - ./samples/clip3.wav

Using Profiles

Profiles are automatically loaded and can be referenced in your Markdown scripts by speaker name.
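The three profile shapes above (built-in speaker, single reference clip, multiple reference clips) can be normalized with a few lines of Python. A sketch under assumed field names from the YAML example; the actual resolution logic lives in `podvoice/profiles.py` and may differ:

```python
# Profiles as they would look after loading profiles.yaml (YAML parsing omitted).
profiles = {
    "my_custom_voice": {"builtin_speaker": "p240"},
    "cloned_voice": {"reference_audio": "./samples/voice.wav"},
    "multi_sample_voice": {"reference_audios": ["./samples/clip1.wav",
                                                "./samples/clip2.wav"]},
}

def resolve(profile: dict):
    """Normalize a profile into (builtin_speaker, list_of_reference_clips)."""
    if "builtin_speaker" in profile:
        return profile["builtin_speaker"], []
    # A single reference_audio is treated as a one-clip list; multiple
    # reference_audios are concatenated for better voice conditioning.
    clips = profile.get("reference_audios") or [profile["reference_audio"]]
    return None, clips
```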


Performance notes

You may see warnings like:

Could not initialize NNPACK! Reason: Unsupported hardware.

βœ”οΈ These are harmless βœ”οΈ Audio generation will still complete ❌ No action required


How voices are assigned

Podvoice does not train voices.

Instead:

  • Uses built-in XTTS v2 speakers
  • Hashes speaker names deterministically
  • Maps each logical speaker to a stable voice

Implications:

  • Same speaker name → same voice
  • Rename speaker → possibly different voice
  • XTTS update → mapping may change

Fallback: default XTTS voice.
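The deterministic mapping described above can be sketched as follows. The speaker pool and hash choice here are hypothetical illustrations, not Podvoice's actual code:

```python
import hashlib

# Hypothetical pool of built-in speaker IDs; the real list comes from the model.
SPEAKERS = ["p225", "p226", "p227", "p240", "p243"]

def assign_voice(speaker_name: str) -> str:
    """Map a logical speaker name to a stable built-in voice.

    A cryptographic hash (rather than Python's randomized hash()) keeps
    the mapping identical across runs and machines.
    """
    digest = hashlib.sha256(speaker_name.encode("utf-8")).hexdigest()
    return SPEAKERS[int(digest, 16) % len(SPEAKERS)]

# Same name always yields the same voice; a renamed speaker may land elsewhere.
assert assign_voice("Host") == assign_voice("Host")
```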


Project structure

podvoice/
├── podvoice/
│   ├── cli.py            # CLI entrypoint
│   ├── parser.py         # Markdown parser
│   ├── tts.py            # XTTS inference
│   ├── audio.py          # Audio stitching
│   ├── studio.py         # FastAPI web UI
│   ├── profiles.py       # YAML profile management
│   ├── preprocessing.py  # Audio preprocessing
│   └── utils.py
│
├── examples/
│   └── demo.md
│
├── podvoice_profiles/    # Voice profiles directory
│
├── bootstrap.sh
├── bootstrap.ps1
├── pyproject.toml
└── README.md

Responsible use

Podvoice generates natural-sounding speech.

Do not:

  • Impersonate real people without consent
  • Use generated audio for fraud or deception

Always disclose synthesized content where appropriate.

You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.


Contributing

Podvoice is intentionally simple.

Good contributions:

  • Bug reports with minimal reproduction scripts
  • CLI UX improvements
  • Documentation clarity
  • Cross-platform fixes

Non-goals:

  • Cloud dependencies
  • Training pipelines
  • Over-engineering

Goal: local, boring, reliable software.