Audio Summarizer

The Audio Summarizer package converts the text of a PDF into podcast-style audio. It extracts text from PDF files, rewrites the content with language models, and synthesizes speech with Text-to-Speech (TTS) models.

Features

PDF Text Extraction: Extracts text from PDF files using the pdfplumber library.
Text Cleaning and Processing: Cleans and processes extracted text to make it more suitable for podcast narration.
Podcast Script Generation: Uses Hugging Face and Ollama models to rewrite the extracted text into a conversational podcast script.
Audio Generation: Converts the podcast script into audio using the SpeechT5 TTS model.
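
As a rough illustration of the first two features, the sketch below extracts text with pdfplumber and applies a light cleanup pass. The function names extract_pdf_text and basic_clean are hypothetical and are not taken from this repository; the actual FileManager code may work differently, and the LLM-based rewriting step is not shown.

```python
import re


def basic_clean(raw: str) -> str:
    """Light cleanup (hypothetical helper, not the repository's own code)."""
    # Rejoin words that were hyphenated across a line break: "exam-\nple" -> "example"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse every run of whitespace (including newlines) into one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def extract_pdf_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF (hypothetical helper)."""
    # Third-party dependency, imported lazily so basic_clean works without it
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages)
```

Calling basic_clean(extract_pdf_text("my_document.pdf")) would then yield a single-line string that is easier to hand to a language model or TTS model.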

Installation

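To use the Audio Summarizer package, make sure Python 3.7+ and pip are installed, then install the required dependencies. The repository's file listing spells the dependency file requirments.txt, so use that name if the command below cannot find requirements.txt: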

pip install -r requirements.txt

Usage

To generate an audio podcast from a PDF document, run the following command:

python completed.py <pdf_path> [options]

Arguments

pdf_path: The path to the PDF file you want to convert to audio.
--modelOllama: Specify the Ollama model to use for text processing (default: llama3.2).
--modelHugging: Specify the Hugging Face model for text generation (default: facebook/opt-125m).
--max_tokens: Set the maximum number of tokens for the generation (default: 8126).
--temperature: Set the sampling temperature for text generation (default: 1.0).
--pdf_text_file: The filename to save extracted PDF text (default: pdf_text.txt).
--cleaned_text_file: The filename to save cleaned text (default: cleaned.txt).
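--audiomodelname: The TTS model to use for generating the audio (default: suno/bark-small). Other sections of this README name SpeechT5 as the TTS model, so check the code to confirm which model is actually loaded.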
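
The arguments above could be declared with argparse roughly as follows. This is a sketch built only from the flags and defaults listed in this README, not a copy of completed.py; the function name build_parser and the int/float types for --max_tokens and --temperature are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF into podcast-style audio."
    )
    parser.add_argument("pdf_path", help="Path to the PDF file to convert")
    parser.add_argument("--modelOllama", default="llama3.2",
                        help="Ollama model used for text processing")
    parser.add_argument("--modelHugging", default="facebook/opt-125m",
                        help="Hugging Face model used for text generation")
    # Types below are assumed; the README only gives the default values
    parser.add_argument("--max_tokens", type=int, default=8126,
                        help="Maximum number of tokens to generate")
    parser.add_argument("--temperature", type=float, default=1.0,
                        help="Sampling temperature for text generation")
    parser.add_argument("--pdf_text_file", default="pdf_text.txt",
                        help="File in which to save the extracted PDF text")
    parser.add_argument("--cleaned_text_file", default="cleaned.txt",
                        help="File in which to save the cleaned text")
    parser.add_argument("--audiomodelname", default="suno/bark-small",
                        help="TTS model used to generate the audio")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["my_document.pdf", "--temperature", "0.7"])
print(args.pdf_path, args.modelOllama, args.temperature)
```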

Example

python completed.py my_document.pdf --modelOllama llama3.2 --modelHugging facebook/opt-125m --audiomodelname suno/bark-small

This processes my_document.pdf, cleans and rewrites the text, and writes the generated audio to the output_audio directory.

Directory Structure

./
├── AudioGenerator.py      # Handles audio generation using TTS models
├── completed.py           # Main script for PDF processing and audio generation
├── FileManager.py         # Manages file reading, cleaning, and GPU memory management
└── QueryHandler.py        # Handles text processing, chunking, and interaction with Hugging Face and Ollama models

Detailed Workflow

PDF Pre-Processing: Extract text from the PDF using the FileManager.read_pdf method.
Text Cleaning and Script Generation: The TextProcessor.clean_text method cleans the extracted text and generates a conversational podcast script using Hugging Face and Ollama models.
Audio Generation: The AudioGenerator.generate_audio_from_text method splits the text into chunks and generates audio using the SpeechT5 TTS model.
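
The chunking step in the audio stage can be pictured with the sketch below, which groups whole sentences into chunks under a character limit so that each TTS call receives a short input. The function name split_into_chunks and the 400-character default are assumptions chosen for illustration; the splitting logic in AudioGenerator.py may differ.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the TTS model in turn, with the resulting audio segments concatenated or saved as separate files.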

Dependencies

torch: Required for deep learning models and TTS pipeline.
transformers: Hugging Face library for using pre-trained models.
datasets: Hugging Face dataset library for loading speaker embeddings.
soundfile: For saving the generated audio to files.
pydub: Used for audio manipulation (if necessary).
pdfplumber: Used to extract text from PDF files.

License

This repository is licensed under the MIT License. See the LICENSE file for more details.
