Evolve raw video audio into perfect WAV samples for TTS. This project provides the tools to download, visually segment, and clean audio, transforming source material into high-quality, normalized samples for your Text-to-Speech projects.

Nicolas-Prevot/AudioSampleForge

🎵 AudioSampleForge

AudioSampleForge is a Python-based audio processing toolchain that allows you to download, extract, clean, and edit audio samples through a simple command-line interface and an intuitive waveform editor.


🚀 Features

  • Download audio from YouTube
  • Extract and normalize audio segments
  • Clean raw samples using deep learning models
  • Visually edit waveform segments with a modern UI
  • Export curated samples ready for use

🖥️ Editor Preview

The waveform editor lets you select, move, or delete segments and insert silences between them. It provides real-time playback and duration control for fine-tuning your samples.

Below is a snapshot of the editor, launched via the audiosampleforge-serve command:

[Screenshot: Waveform Editor UI]


📦 Installation

To install the project locally using uv:

uv sync
uv tool install . -e

⚙️ Usage

Here's the basic workflow:

1. Download Audio

uv run audiosampleforge-download --url "https://www.youtube.com/watch?v=6VAF1YThcbc" --out data/0_raw_audio

2. Extract Audio Segment

uv run audiosampleforge-extract --input data/0_raw_audio/segment.wav --start 0.0 --dur 10000.0 --out data/1_normalized_audio/segment.wav

3. Launch Waveform Editor

uv run audiosampleforge-serve --input data/1_normalized_audio/segment.wav

4. Clean Sample

uv run audiosampleforge-clean --input data/2_raw_sample/segment.wav --out data/3_clean_sample/segment.wav
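
After cleaning, it can be useful to confirm that the output is a readable WAV file with the sample rate and duration you expect. The sketch below is not part of AudioSampleForge; `wav_info` is a hypothetical helper built on Python's standard `wave` module, which only reads PCM-encoded WAV files and raises `wave.Error` on other encodings.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate,
        }
```

For example, `wav_info("data/3_clean_sample/segment.wav")` would report the properties of the file written by the clean step, assuming it was saved as PCM.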

5. Super-Resolution (Optional)

uv run audiosampleforge-cleansr --model speech --input data/output/clean.wav --out data/output/clean2.wav
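
The non-interactive steps above can also be driven from a small script. This is a minimal sketch, not something shipped with the project: `build_commands` and `run_commands` are hypothetical names, and the flags and paths are copied verbatim from the commands in this section. The waveform editor step is left out because it is interactive, and the optional super-resolution step is omitted for brevity.

```python
import shlex
import subprocess


def build_commands(url, start=0.0, dur=10000.0):
    """Build argument lists for the download, extract, and clean steps.

    Paths mirror the README examples; adjust them to your own files.
    """
    return [
        ["uv", "run", "audiosampleforge-download",
         "--url", url, "--out", "data/0_raw_audio"],
        ["uv", "run", "audiosampleforge-extract",
         "--input", "data/0_raw_audio/segment.wav",
         "--start", str(start), "--dur", str(dur),
         "--out", "data/1_normalized_audio/segment.wav"],
        # Run this one only after exporting a sample from the editor.
        ["uv", "run", "audiosampleforge-clean",
         "--input", "data/2_raw_sample/segment.wav",
         "--out", "data/3_clean_sample/segment.wav"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the script at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_commands(build_commands(url))` prints the commands without running anything. Passing the URL as a list element, rather than through a shell string, also sidesteps shell globbing on the `?` character.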

📁 Project Structure

data/
├── 0_raw_audio/
├── 1_normalized_audio/
├── 2_raw_sample/
├── 3_clean_sample/
└── output/
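
If these folders do not exist yet, they can be created up front. The snippet below is a small sketch using only the standard library. `create_layout` is a hypothetical helper, and whether the CLI commands create missing output folders themselves is not stated here, so treat this as a precaution.

```python
from pathlib import Path

# Folder names taken from the project structure shown above.
STAGE_DIRS = [
    "0_raw_audio",
    "1_normalized_audio",
    "2_raw_sample",
    "3_clean_sample",
    "output",
]


def create_layout(root="data"):
    """Create the stage folders under root and return their paths."""
    root_path = Path(root)
    created = []
    for name in STAGE_DIRS:
        stage = root_path / name
        # exist_ok=True makes the call safe to repeat.
        stage.mkdir(parents=True, exist_ok=True)
        created.append(stage)
    return created
```

Running `create_layout()` from the repository root would produce the `data/` tree shown above.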

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change or improve.


📄 License

MIT
