PDF to Speech (PTS)

This project converts PDF documents into speech audio files using a Text-to-Speech (TTS) model. It extracts text from a PDF, splits it into manageable chunks, and synthesizes each chunk into an audio file using the Coqui TTS library.

Features

Extracts and cleans text from PDF files (page by page)
Supports chunking of text for long documents
Synthesizes audio using customizable TTS models (including language and speaker options)
Outputs audio files (WAV) for each chunk/page
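The cleaning and chunking steps listed above can be sketched roughly as follows. This is an illustration only, not the code in process.py: the function names clean_text and chunk_text, the 500-character limit, and the sentence-based splitting rule are all assumptions made for the example.

```python
import re
from typing import List


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF page (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines, tabs, and runs of spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Group sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries rather than at a fixed character offset keeps each synthesized chunk from starting or ending mid-sentence, which tends to sound more natural.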

Requirements

Python 3.8+
Coqui TTS
pdfminer.six
torch

Installation

Clone this repository:

git clone https://github.com/mtb0x1/pdf_to_speech
cd pdf_to_speech

Install dependencies:
```
pip install TTS==0.13.0 pdfminer.six==20221105 torch==2.0.1
```
You may need to install additional dependencies for your TTS model (see Coqui TTS documentation). For GPU acceleration, install the CUDA-enabled version of torch.
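To check whether the installed torch build can see a GPU, something like the sketch below may help. The helper name pick_device is hypothetical and is not taken from process.py; the only torch call used is torch.cuda.is_available().

```python
def pick_device() -> str:
    """Return "cuda" if torch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints cpu on a machine with a GPU, the installed torch wheel is probably a CPU-only build.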

Usage

Run the script from the command line:

python process.py --pdf <path_to_pdf> [--model <tts_model>] [--num-pages <N>] [--language <lang>] [--speaker <name>] [--speaker-wav <wav_path>] [--log-level <level>]

Arguments

--pdf (required): Path to the PDF file to convert.
--model: TTS model to use (default: tts_models/fr/css10/vits).
--num-pages: Number of pages to process (default: all pages).
--language: Language code for TTS synthesis (e.g., fr-fr).
--speaker: Speaker name (if supported by the model).
--speaker-wav: Path to a reference speaker WAV file (if supported).
--log-level: Logging level (debug, info, warning, error).
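For reference, the arguments above could be declared with argparse roughly as shown here. This is a sketch and may differ from the parser actually used in process.py; in particular, the info default for --log-level is an assumption, since no default is documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented command-line options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF document to speech audio files."
    )
    parser.add_argument("--pdf", required=True, help="Path to the PDF file to convert.")
    parser.add_argument(
        "--model", default="tts_models/fr/css10/vits", help="TTS model to use."
    )
    # None stands for "process all pages".
    parser.add_argument(
        "--num-pages", type=int, default=None, help="Number of pages to process."
    )
    parser.add_argument("--language", default=None, help="Language code, e.g. fr-fr.")
    parser.add_argument("--speaker", default=None, help="Speaker name.")
    parser.add_argument(
        "--speaker-wav", default=None, help="Path to a reference speaker WAV file."
    )
    # Assumed default: the README does not state one.
    parser.add_argument(
        "--log-level",
        default="info",
        choices=["debug", "info", "warning", "error"],
        help="Logging level.",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```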

Example

Convert the first 5 pages of a French PDF to audio using the tts_models/fr/css10/vits model:

python process.py --pdf file.pdf --model tts_models/fr/css10/vits --num-pages 5

The resulting audio files will be saved in the output/ directory as page_001.wav, page_002.wav, etc.
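The zero-padded naming scheme can be reproduced with a small helper like the one below, for example to locate the audio for a given page from another script. The function name page_wav_path is hypothetical, and the sketch assumes page numbers start at 1 and are padded to three digits, as the file names above suggest.

```python
from pathlib import Path


def page_wav_path(page_number: int, output_dir: str = "output") -> Path:
    """Build the WAV path for a 1-based page number, e.g. output/page_001.wav."""
    return Path(output_dir) / f"page_{page_number:03d}.wav"


print(page_wav_path(1).as_posix())
```

Zero-padding keeps the files in page order when they are sorted by name.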

Notes

For best results, use a TTS model that matches the language of your PDF.
Some models support multiple speakers or require additional arguments (see Coqui TTS documentation).
If you encounter CUDA errors, check that your system has a compatible GPU and up-to-date CUDA drivers. Without a usable GPU, the script falls back to CPU.
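Because not every model accepts a language, speaker, or reference WAV, one cautious approach is to pass only the options that were actually set. The sketch below assumes the TTS class from TTS.api and its tts_to_file method as found in Coqui TTS. The helper names build_tts_kwargs and synthesize_page are hypothetical, and the exact keyword arguments a model accepts may vary between Coqui TTS versions.

```python
from typing import Dict, Optional


def build_tts_kwargs(
    text: str,
    file_path: str,
    language: Optional[str] = None,
    speaker: Optional[str] = None,
    speaker_wav: Optional[str] = None,
) -> Dict[str, str]:
    """Collect synthesis arguments, leaving out options that were not given."""
    kwargs = {"text": text, "file_path": file_path}
    optional = {"language": language, "speaker": speaker, "speaker_wav": speaker_wav}
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


def synthesize_page(model_name: str, **options: Optional[str]) -> None:
    """Load a model and write one WAV file (requires Coqui TTS to be installed)."""
    from TTS.api import TTS  # imported lazily: loading the library is slow

    tts = TTS(model_name=model_name)
    tts.tts_to_file(**build_tts_kwargs(**options))
```

For example, synthesize_page("tts_models/fr/css10/vits", text="Bonjour.", file_path="output/page_001.wav") would synthesize a single sentence. In a real run the model should be loaded once and reused for every chunk rather than reloaded per call.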

License

MIT License
