Podvoice is a local-first AI podcast generator that converts simple Markdown scripts into multi-speaker audio.
Originally built as a CLI tool, Podvoice now includes PodVoice Studio, a modern web-based GUI for creating, previewing, and generating AI audio visually.
No cloud APIs. No subscriptions. Fully offline.
Runs on Linux, Windows, macOS, and FreeBSD.
Most AI audio tools:
- Require paid APIs
- Depend on cloud services
Podvoice is:
- Local-first
- Fully offline
- Developer-friendly
- Now with a visual GUI (PodVoice Studio)
- Markdown-based scripts
- Multiple logical speakers
- Deterministic voice assignment
- Single stitched output file
- WAV or MP3 export
- Local-only inference
- CPU-first (GPU optional)
- Cross-platform support
- Studio Web UI: modern single-page interface for voice selection, preview, and generation
- Built-in multi-speaker models: VCTK VITS and others, with cached voice demos
- AJAX-based generation: no page reloads, instant audio playback
- Modern dark theme: clean sidebar layout with zero scrolling
- Profile management: YAML-based speaker profiles with reference audio support
- Multi-reference audio: concatenate multiple clips for better voice conditioning
| Platform | Status | Notes |
|---|---|---|
| Linux | ✅ Fully supported | Primary dev platform |
| macOS | ✅ Fully supported | Intel + Apple Silicon |
| Windows | ✅ Fully supported | PowerShell |
| FreeBSD | ✅ Supported | Requires ffmpeg |
| WSL2 | ✅ Supported | Recommended on Windows |
Podvoice consumes Markdown files with speaker blocks:
[Host | calm]
Welcome to the show.
[Guest | warm]
If this sounds useful, try writing your own script
and see how easily Markdown becomes audio.

Rules:
- Speaker name is required
- Emotion tag is optional
- Text continues until the next speaker block
- Blank lines are allowed
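As a rough illustration of these rules, a script could be split into segments along the following lines. The real parser lives in `podvoice/parser.py`; this sketch and its helper name `parse_script` are hypothetical.

```python
import re

# Matches a speaker block header like "[Host | calm]" or "[Guest]".
HEADER = re.compile(r"^\[(?P<name>[^\]|]+?)\s*(?:\|\s*(?P<emotion>[^\]]+?)\s*)?\]$")

def parse_script(text: str):
    """Split a Markdown script into (speaker, emotion, text) segments."""
    segments = []
    current = None
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            # A new speaker block ends the previous one.
            if current:
                segments.append(current)
            current = {
                "speaker": m.group("name").strip(),
                "emotion": (m.group("emotion") or "").strip() or None,
                "lines": [],
            }
        elif current and line.strip():
            # Text continues until the next speaker block; blank lines are skipped.
            current["lines"].append(line.strip())
    if current:
        segments.append(current)
    return [(s["speaker"], s["emotion"], " ".join(s["lines"])) for s in segments]
```

Note that the emotion tag is optional: `[Guest]` parses with `emotion` set to `None`.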
Required everywhere:
- Python 3.10.x
- ffmpeg
- espeak or espeak-ng (required for Studio with built-in multi-speaker models)
- Internet access only for first run
- ~5–8 GB free disk space (model cache)
Linux (Debian/Ubuntu):

sudo apt update
sudo apt install -y python3.10 python3.10-venv ffmpeg git espeak

macOS (Homebrew):

brew install python@3.10 ffmpeg git

Windows (winget):

winget install Python.Python.3.10
winget install ffmpeg
winget install Git.Git

Restart the terminal after installing Python.

FreeBSD:

pkg install python310 ffmpeg git

Clone the repository and run the bootstrap script:

git clone https://github.com/aman179102/podvoice.git
cd podvoice
chmod +x bootstrap.sh
./bootstrap.sh

This script will:
- Verify Python 3.10
- Create a local `.venv`
- Install fully pinned dependencies from `requirements.lock`
- Install `podvoice` in editable mode

On Windows (PowerShell):

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
.\bootstrap.ps1

Activate the virtual environment:

source .venv/bin/activate    (Linux/macOS/FreeBSD)
.venv\Scripts\Activate.ps1   (Windows)

Then generate your first audio:

podvoice examples/demo.md --out demo.wav

Or export MP3:
podvoice examples/demo.md --out demo.mp3

On first run, Coqui XTTS v2 model weights are downloaded and cached locally. Subsequent runs reuse the cache.
Podvoice includes a modern, single-page web interface for interactive voice generation.
podvoice studio --host 127.0.0.1 --port 8000

Then open: http://127.0.0.1:8000
| Feature | Description |
|---|---|
| Sidebar Voice Gallery | All built-in speakers displayed with human-friendly labels |
| Instant Preview | Click any voice to hear a demo instantly (cached after first play) |
| Single TTS | Type text, select voice, generate audio β no page reloads |
| Multi TTS (Podcast) | Paste Markdown scripts with speaker mapping |
| AJAX Generation | Audio generates and plays without leaving the page |
| Modern Dark Theme | Clean aesthetic with CSS variables, no scrolling |
- `/` or `/single`: Single TTS page
- `/multi`: Multi-speaker podcast page
- `/demo_wav?voice=p240`: Get cached demo audio for a voice
- `/health`: Health check endpoint
Studio defaults to tts_models/en/vctk/vits (built-in multi-speaker). To use XTTS v2 instead:
podvoice studio --model-name tts_models/multilingual/multi-dataset/xtts_v2

podvoice SCRIPT.md --out OUTPUT

Examples:
podvoice examples/demo.md --out output.wav
podvoice examples/demo.md --out podcast.mp3 --language en --device cpu

| Option | Description |
|---|---|
| `SCRIPT` | Input Markdown file |
| `--out`, `-o` | Output `.wav` or `.mp3` |
| `--language`, `-l` | XTTS language code |
| `--device`, `-d` | `cpu` (default) or `cuda` |
If you have a compatible NVIDIA GPU:
podvoice examples/demo.md --device cuda

If CUDA is unavailable, Podvoice safely falls back to CPU.
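The fallback can be sketched as follows; `resolve_device` is a hypothetical helper, not Podvoice's actual internals, and `torch` is imported only to probe GPU availability.

```python
def resolve_device(requested: str = "cpu") -> str:
    """Return "cuda" only when it was requested and is actually usable."""
    if requested != "cuda":
        return "cpu"
    try:
        # Lazy import: torch is only needed here to probe for a GPU.
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"  # safe fallback when CUDA is missing or unusable
```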
Podvoice supports YAML-based speaker profiles for advanced use cases.
Default: ./podvoice_profiles/profiles.yaml
profiles:
my_custom_voice:
builtin_speaker: p240
cloned_voice:
reference_audio: ./samples/voice.wav
multi_sample_voice:
reference_audios:
- ./samples/clip1.wav
- ./samples/clip2.wav
- ./samples/clip3.wav

Profiles are automatically loaded and can be referenced in your Markdown scripts by speaker name.
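Resolution of the three profile shapes above can be sketched like this. The `profiles` dict mirrors the YAML after parsing (e.g. with PyYAML), and `resolve_profile` is a hypothetical helper, not Podvoice's actual API.

```python
# Mirrors the profiles.yaml example after YAML parsing.
profiles = {
    "my_custom_voice": {"builtin_speaker": "p240"},
    "cloned_voice": {"reference_audio": "./samples/voice.wav"},
    "multi_sample_voice": {"reference_audios": [
        "./samples/clip1.wav", "./samples/clip2.wav", "./samples/clip3.wav",
    ]},
}

def resolve_profile(name: str, profiles: dict):
    """Return (kind, payload) for a named speaker profile."""
    p = profiles.get(name)
    if p is None:
        return ("default", None)  # unknown speaker: fall back to default voice
    if "builtin_speaker" in p:
        return ("builtin", p["builtin_speaker"])
    if "reference_audio" in p:
        return ("clone", [p["reference_audio"]])  # single clip, normalized to a list
    if "reference_audios" in p:
        return ("clone", list(p["reference_audios"]))  # multiple clips, concatenated later
    return ("default", None)
```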
You may see warnings like:
Could not initialize NNPACK! Reason: Unsupported hardware.
These warnings are harmless: audio generation will still complete, and no action is required.
Podvoice does not train voices.
Instead:
- Uses built-in XTTS v2 speakers
- Hashes speaker names deterministically
- Maps each logical speaker to a stable voice
Implications:
- Same speaker name → same voice
- Rename speaker → possibly different voice
- XTTS update → mapping may change
Fallback: default XTTS voice.
podvoice/
├── podvoice/
│   ├── cli.py            # CLI entrypoint
│   ├── parser.py         # Markdown parser
│   ├── tts.py            # XTTS inference
│   ├── audio.py          # Audio stitching
│   ├── studio.py         # FastAPI web UI
│   ├── profiles.py       # YAML profile management
│   ├── preprocessing.py  # Audio preprocessing
│   └── utils.py
│
├── examples/
│   └── demo.md
│
├── podvoice_profiles/    # Voice profiles directory
│
├── bootstrap.sh
├── bootstrap.ps1
├── pyproject.toml
└── README.md
Podvoice generates natural-sounding speech.
Do not:
- Impersonate real people without consent
- Use generated audio for fraud or deception
Always disclose synthesized content where appropriate.
You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.
Podvoice is intentionally simple.
Good contributions:
- Bug reports with minimal reproduction scripts
- CLI UX improvements
- Documentation clarity
- Cross-platform fixes
Non-goals:
- Cloud dependencies
- Training pipelines
- Over-engineering
Goal: local, boring, reliable software.