Advanced Audio Visualizer

A GPU-accelerated Python desktop application for creating dynamic audio visualizations with custom styles and automatic subtitles. The application leverages CUDA acceleration through CuPy for real-time audio processing and PyTorch for GPU-accelerated speech recognition.

Features

Multiple Visualization Types: Select from bars, wave, and spectrum styles.
Frame Rate Control: Adjustable FPS between 1-60.
Customizable Colors: Set distinct visualization and background colors.
Aspect Ratio and Orientation Options: Supports 16:9, 4:3, and 1:1 aspect ratios with horizontal or vertical orientations.
Subtitle Integration: Automatically transcribes the audio with Whisper and adds subtitles synchronized with the video.
Memory-efficient Processing: Utilizes memory mapping to handle large audio files.
Supported Formats: MP3, WAV, M4A, OGG, and FLAC.
GPU Acceleration: CUDA-powered audio processing for faster rendering.
Parallel Processing: Utilizes all CPU cores for enhanced performance.
Real-time Speech Recognition: GPU-accelerated Whisper model for subtitle generation.
Smart Memory Management: Memory mapping and batch processing for large files.
Frame Interpolation: Smooth transitions between frames using GPU acceleration.
Adaptive Processing: Falls back to CPU when GPU is unavailable.

Installation

Prerequisites

Python 3.7 or higher
FFmpeg (for video encoding)
Qt5 libraries (included in PyQt5)
libsndfile (for audio processing)
CUDA Toolkit (optional; enables CuPy audio processing and faster Whisper transcription on NVIDIA GPUs)

Clone the repository:

```
git clone https://github.com/Gontary101/audio-visualizer.git
cd audio-visualizer
```

Set up a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install required packages:
```
pip install -r requirements.txt
```

Install system dependencies (Linux users):

```
# Ubuntu/Debian
sudo apt-get install python3-qt5 libsndfile1 ffmpeg

# Fedora
sudo dnf install python3-qt5 libsndfile ffmpeg
```

Additional Notes for Windows Users

Ensure that ImageMagick is installed (download it from ImageMagick's official website). If MoviePy cannot find it, set IMAGEMAGICK_BINARY in moviepy/config_defaults.py (MoviePy 1.x) to the full path of the ImageMagick executable.
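MoviePy 1.x reads this setting from the `IMAGEMAGICK_BINARY` environment variable (falling back to auto-detection), so instead of editing the installed package you can set it from Python before MoviePy is first imported. A minimal sketch, assuming ImageMagick 7 (whose command-line tool is `magick`); the helper name `configure_imagemagick` is hypothetical:

```python
import os
import shutil


def configure_imagemagick(search_path=None):
    """Point MoviePy 1.x at ImageMagick via the IMAGEMAGICK_BINARY env var.

    Hypothetical helper: call it before the first `import moviepy`.
    Only `magick` (ImageMagick 7) is searched, because on Windows the
    name `convert` also belongs to a built-in system utility.
    Returns the path that was set, or None if nothing was found.
    """
    path = shutil.which("magick", path=search_path)
    if path is not None:
        os.environ["IMAGEMAGICK_BINARY"] = path
    return path


if __name__ == "__main__":
    found = configure_imagemagick()
    print(found or "ImageMagick not found on PATH; set IMAGEMAGICK_BINARY manually")
```

Setting the variable at runtime keeps the fix inside the application, so it survives reinstalling or upgrading MoviePy.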

1. CUDA Setup (for GPU acceleration)

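First, install the CUDA Toolkit from NVIDIA's website. Then:

```
# Verify that the NVIDIA driver can see your GPU
nvidia-smi

# Verify the CUDA Toolkit (compiler) installation
nvcc --version

# Install the CuPy wheel that matches your CUDA major version
# For CUDA 11.x (11.2 or later):
pip install cupy-cuda11x
# For CUDA 12.x:
pip install cupy-cuda12x
```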
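To choose the right CuPy wheel, the "CUDA Version" shown in the `nvidia-smi` header can serve as a hint. The sketch below parses it; note that this figure is the newest CUDA version the driver supports, not necessarily the installed toolkit version. The helper names are hypothetical, and the mapping only covers the `cupy-cuda11x` (CUDA 11.2+) and `cupy-cuda12x` wheels, returning `None` otherwise so you can consult CuPy's installation guide.

```python
import re
import shutil
import subprocess


def cupy_package_for(cuda_version):
    """Map a CUDA version string "X.Y" (e.g. "12.2") to a CuPy wheel name."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    return None  # not covered by this sketch


def driver_cuda_version():
    """Read "CUDA Version: X.Y" from the nvidia-smi header, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    version = driver_cuda_version()
    print(version, cupy_package_for(version) if version else None)
```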

2. Install Dependencies

```
pip install -r requirements.txt
```

3. Additional System Dependencies

For Ubuntu/Debian:

```
sudo apt-get update
sudo apt-get install python3-qt5 libsndfile1 ffmpeg nvidia-cuda-toolkit
```

For Windows:

Install the CUDA Toolkit from NVIDIA's website
Install ImageMagick and add it to PATH
Install Visual C++ Build Tools

Usage

Run the application:
```
python audio_visualizer.py
```
Select an Audio File: Click "Select Audio" to load your audio file.
Configure Visualization Settings:
- Choose visualization type: bars, wave, or spectrum.
- Adjust FPS (1-60).
- Set aspect ratio (16:9, 4:3, or 1:1) and orientation (horizontal or vertical).
- Customize visualization and background colors.
- Adjust amplitude scale using the slider.
- Select the subsampling factor to control signal density.
Generate Video: Click "Generate Video," choose a save location, and wait for processing. The application will automatically generate and add subtitles to the video, synchronized with the audio content.

Troubleshooting

Common Issues

Qt Platform Plugin Error on Linux:

```
# Install Wayland support:
sudo apt-get install qtwayland5    # Ubuntu/Debian
sudo dnf install qt5-qtwayland     # Fedora

# Or force the X11 (xcb) platform plugin:
export QT_QPA_PLATFORM=xcb
```

Audio File Errors:
- Check file format compatibility and ensure the file isn’t corrupted.
- Confirm audio codecs are installed correctly.
Memory Issues:
- Reduce FPS or try shorter audio files.
- Close other memory-intensive applications during processing.

Performance Tips

Use lower FPS for quicker processing.
“Bars” visualization processes faster than “spectrum.”
Keeping temporary files on an SSD can improve speed.

Dependencies

Check requirements.txt for the full list of Python dependencies.

System Requirements

Python: 3.7 or higher.
FFmpeg: Required for video encoding.
RAM: 4GB minimum (8GB recommended for longer files).
Whisper: Runs on the CPU; a compatible CUDA GPU is optional and makes transcription faster.

Contributing

Fork this repository.
Create a new branch (git checkout -b feature/AmazingFeature).
Commit your changes (git commit -m 'Add some AmazingFeature').
Push the branch (git push origin feature/AmazingFeature).
Open a pull request.

Acknowledgments

NVIDIA for CUDA toolkit
CuPy team for GPU acceleration
OpenAI for the Whisper model
MoviePy contributors

Known Issues and Solutions

GPU-Related Issues

CUDA Out of Memory
- Reduce batch size in video generation (default is 50, try 25 or lower)
- Lower the frame rate
- Use a smaller window length for audio processing
- Close other GPU-intensive applications
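Rather than lowering the batch size by hand, rendering code can halve it automatically whenever an out-of-memory error occurs. This is a minimal sketch, not the application's actual renderer: `render_with_backoff` and the `render_batch` callable are hypothetical, and the default exception tuple uses Python's built-in `MemoryError` so the sketch runs without a GPU.

```python
def render_with_backoff(frames, render_batch, batch_size=50, min_batch_size=1,
                        oom_errors=(MemoryError,)):
    """Render `frames` in batches, halving the batch size on out-of-memory errors.

    `render_batch` is a hypothetical callable that renders a list of frames
    and returns a list of results.
    """
    results = []
    start = 0
    while start < len(frames):
        batch = frames[start:start + batch_size]
        try:
            results.extend(render_batch(batch))
        except oom_errors:
            if batch_size <= min_batch_size:
                raise  # cannot shrink any further; surface the error
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same frames with a smaller batch
        start += len(batch)
    return results
```

With CuPy you would pass `oom_errors=(cupy.cuda.memory.OutOfMemoryError,)`; with recent PyTorch releases, `torch.cuda.OutOfMemoryError`. The smaller batch size is kept for the rest of the run, which avoids repeatedly hitting the same limit.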

CuPy Import Errors

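```
# Fallback solution in code: use NumPy when CuPy or a usable CUDA device is missing
try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises a CUDA runtime error if no GPU/driver is usable
    GPU_AVAILABLE = True
except Exception:  # ImportError, or a CUDA error raised by CuPy
    import numpy as cp
    GPU_AVAILABLE = False
    print("GPU acceleration not available, using CPU")
```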

Whisper Model Loading Issues
- Ensure enough VRAM for model loading (at least 4GB recommended)
- If memory error occurs, try:
```
import torch
torch.cuda.empty_cache()  # Clear GPU memory
```
- Consider using smaller Whisper model variants (tiny, base, or small)
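Choosing the model size can be automated from the free VRAM. The thresholds below are the approximate requirements listed in the openai/whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium and 10 GB for large); `pick_whisper_model` and its 0.5 GB headroom for the application's own GPU buffers are assumptions for illustration.

```python
# Approximate VRAM needs (GB) as listed in the openai/whisper README, largest first.
WHISPER_VRAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]


def pick_whisper_model(free_vram_gb, headroom_gb=0.5):
    """Return the largest Whisper model that fits in free VRAM, or None (use CPU).

    `headroom_gb` is an assumed safety margin, not a value from Whisper itself.
    """
    for name, needed_gb in WHISPER_VRAM_GB:
        if free_vram_gb - headroom_gb >= needed_gb:
            return name
    return None
```

On a CUDA machine, `free, _ = torch.cuda.mem_get_info()` returns free memory in bytes (divide by `1024**3`), and `whisper.load_model(name)` loads the chosen model; when the function returns `None`, loading a small model with `device="cpu"` is one possible fallback.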
GPU Memory Leaks
- Clear matplotlib figures after each frame generation
- Use context managers for GPU operations
- Monitor GPU memory usage with nvidia-smi

Audio Processing Issues

Long Audio Files

Files longer than 10 minutes may cause memory issues

Solution: Split audio into chunks:

```
# Using pydub
from pydub import AudioSegment

file_path = "input.mp3"  # hypothetical path to your audio file
audio = AudioSegment.from_file(file_path)
chunk_length = 10 * 60 * 1000  # 10 minutes in milliseconds (pydub indexes by ms)
chunks = [audio[i:i+chunk_length] for i in range(0, len(audio), chunk_length)]
```
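pydub decodes the entire file into memory before slicing, so chunking with it limits per-chunk processing rather than peak memory. For uncompressed PCM WAV files, Python's standard `wave` module can stream fixed-length chunks instead. A sketch with the hypothetical helper `iter_wav_chunks`; compressed formats such as MP3 or M4A still need a decoder like FFmpeg.

```python
import wave


def iter_wav_chunks(path, chunk_seconds=600):
    """Yield raw PCM byte chunks of at most `chunk_seconds` from a WAV file.

    Only one chunk is held in memory at a time.
    """
    with wave.open(path, "rb") as wav:
        frames_per_chunk = int(wav.getframerate() * chunk_seconds)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break  # end of file
            yield data
```

Each chunk is raw interleaved PCM; its sample width and channel count come from `getsampwidth()` and `getnchannels()` on the same file.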

High Sample Rate Audio
- High sample rates (>48kHz) may cause processing delays
- Solution: Downsample before processing:
```
import librosa
y, sr = librosa.load(audio_path, sr=44100)  # Resample to 44.1kHz while loading
```
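A quick estimate shows why downsampling helps: decoded audio occupies duration × sample rate × channels × bytes per sample. `librosa.load` returns float32 (4 bytes per sample) and mixes down to mono by default, so a 10-minute 96 kHz stereo signal at float32 takes 460.8 MB, while the same file loaded as mono at 44.1 kHz takes 105.84 MB. The helper below is a hypothetical convenience for that arithmetic.

```python
def audio_memory_mb(duration_s, sample_rate, channels=1, bytes_per_sample=4):
    """Decoded size of an audio signal in megabytes (1 MB = 10**6 bytes).

    bytes_per_sample=4 matches float32, librosa's default output dtype.
    """
    return duration_s * sample_rate * channels * bytes_per_sample / 1_000_000


print(audio_memory_mb(600, 96000, channels=2))  # 10 min, 96 kHz stereo
print(audio_memory_mb(600, 44100))              # 10 min, 44.1 kHz mono
```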
Spectrogram Memory Usage
- Large spectrograms can consume excessive memory
- Solutions:
  - Increase subsample_factor (default is 4)
  - Reduce frame_length (default is 2048)
  - Use lower frequency resolution
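The size of a spectrogram follows from its shape. Assuming the spectrogram is computed like `librosa.stft` with its defaults (`center=True`, `hop_length = n_fft // 4`, complex64 output at 8 bytes per bin), the sketch below estimates it; the function names are hypothetical, and the application's own `frame_length`/`subsample_factor` handling may differ.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=None):
    """(frequency bins, frames) of an STFT with center=True, like librosa.stft."""
    hop = hop_length or n_fft // 4  # librosa's default hop is n_fft // 4
    return n_fft // 2 + 1, 1 + n_samples // hop


def stft_memory_mb(n_samples, n_fft=2048, hop_length=None, bytes_per_bin=8):
    """Approximate spectrogram size in MB; 8 bytes per bin matches complex64."""
    freqs, frames = stft_shape(n_samples, n_fft, hop_length)
    return freqs * frames * bytes_per_bin / 1_000_000


n = 3 * 60 * 44100  # three minutes of mono audio at 44.1 kHz
print(stft_shape(n), round(stft_memory_mb(n), 2))
```

Under these assumptions, halving `n_fft` with the default hop halves the frequency bins but doubles the frame count, so the three-minute example stays at roughly 127 MB either way. Reducing the frame length saves memory mainly when the hop length is held fixed or increased.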

GUI Issues

Progress Bar Freezing
- Progress bar may appear frozen during heavy processing
- Solution: Reduce update frequency or use smaller batch sizes
- Alternative: Implement background worker for progress updates
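In PyQt5 the background worker is usually a `QThread` emitting a `pyqtSignal`. The same pattern is sketched here with the standard library: a worker thread does the rendering and pushes progress into a `queue.Queue`, which the GUI thread drains (for example from a `QTimer` slot that calls `setValue` on the progress bar). It also throttles updates by reporting only when the integer percentage changes. All names are hypothetical.

```python
import queue
import threading


def run_with_progress(total_steps, do_step, progress_queue):
    """Run `do_step(i)` for every step, reporting progress off the GUI thread.

    Progress is reported only when the integer percentage changes, so the
    GUI receives at most 101 updates regardless of the frame count.
    """
    last_percent = -1
    for i in range(total_steps):
        do_step(i)
        percent = (i + 1) * 100 // total_steps
        if percent != last_percent:
            progress_queue.put(percent)
            last_percent = percent
    progress_queue.put(None)  # sentinel: work finished


def start_worker(total_steps, do_step):
    """Start the worker thread and return the queue the GUI should poll."""
    progress = queue.Queue()
    threading.Thread(target=run_with_progress,
                     args=(total_steps, do_step, progress), daemon=True).start()
    return progress
```

Qt widgets must only be touched from the GUI thread, which is why the worker communicates through the queue instead of updating the progress bar directly.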
Color Picker Dialog
- May crash on some Linux distributions
- Workaround: Use hex color codes directly:
```
self.viz_color = "#000000"  # Black
self.bg_color = "#FFFFFF"   # White
```
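If the hex codes come from a text field or a config file instead of the dialog, a small check catches typos before rendering starts. A sketch; `parse_hex_color` is a hypothetical helper that accepts only the six-digit `#RRGGBB` form (Qt's `QColor(value).isValid()` is an alternative check that accepts more notations).

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_hex_color(value):
    """Validate a '#RRGGBB' string and return it upper-cased, or raise ValueError."""
    if not HEX_COLOR.fullmatch(value):
        raise ValueError(f"expected a color like '#1A2B3C', got {value!r}")
    return value.upper()


print(parse_hex_color("#ff8800"))
```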

Window Scaling Issues

High DPI displays may show incorrect scaling

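Solution: Add the following at the start of the script, before the QApplication instance is created:

```
from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import QApplication

if hasattr(Qt, 'AA_EnableHighDpiScaling'):
    QApplication.setAttribute(Qt.AA_EnableHighDpiScaling, True)
if hasattr(Qt, 'AA_UseHighDpiPixmaps'):
    QApplication.setAttribute(Qt.AA_UseHighDpiPixmaps, True)
```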

File Handling Issues

Temporary File Cleanup

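Temporary files might not be deleted if the process is interrupted

Solution: Implement cleanup on exit:

```
import atexit
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()  # assumed: the application's temporary working directory

def cleanup_temp_files():
    # Remove the temporary directory if it still exists
    if os.path.exists(tmp_dir):
        shutil.rmtree(tmp_dir, ignore_errors=True)

atexit.register(cleanup_temp_files)
```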
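`atexit` handlers run on a normal interpreter exit (including `sys.exit()` and unhandled exceptions), but not when the process is killed by a signal Python does not handle, such as the default `SIGTERM`, nor after `os._exit()`. One way to cover `SIGTERM` is to turn it into a regular exit, as sketched below (`exit_on_sigterm` is a hypothetical name). `SIGKILL` cannot be caught at all, handlers can only be installed from the main thread, and while Qt's event loop is running the handler may be delayed until Python code runs again (a periodic `QTimer` is a common workaround).

```python
import signal
import sys


def exit_on_sigterm(signum, frame):
    """Turn SIGTERM into a normal interpreter exit so atexit handlers still run."""
    sys.exit(128 + signum)  # conventional exit status for "terminated by signal"


signal.signal(signal.SIGTERM, exit_on_sigterm)
```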

FFmpeg Codec Issues

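Some FFmpeg builds do not include the libx264 (H.264) encoder

Solution: Provide fallback codec options:

```
try:
    # Try H.264 (libx264) first
    final_video.write_videofile(output_path, codec='libx264')
except OSError:
    # MoviePy 1.x raises IOError (an alias of OSError) when FFmpeg fails
    # Fall back to MPEG-4 Part 2
    final_video.write_videofile(output_path, codec='mpeg4')
```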
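Instead of discovering a missing encoder through a failed render, the encoder list can be checked up front with `ffmpeg -encoders`. A sketch with hypothetical helper names; note that MoviePy may use a bundled FFmpeg (via imageio-ffmpeg) rather than the one on PATH, so treat the result as a hint.

```python
import shutil
import subprocess


def available_encoders():
    """Return the text of `ffmpeg -encoders`, or "" if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        return ""
    result = subprocess.run(["ffmpeg", "-hide_banner", "-encoders"],
                            capture_output=True, text=True)
    return result.stdout


def pick_codec(encoders_text, preferred=("libx264", "mpeg4")):
    """Return the first preferred encoder listed in `ffmpeg -encoders` output."""
    # Encoder lines look like " V....D libx264   libx264 H.264 / AVC ...":
    # a flags column followed by the encoder name.
    names = set()
    for line in encoders_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    for codec in preferred:
        if codec in names:
            return codec
    return None
```

A caller could then use `codec = pick_codec(available_encoders()) or "mpeg4"` and pass it to `write_videofile`.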

Large Output Files
- High FPS and resolution can create very large files
- Solutions:
  - Add video compression options
  - Implement bitrate control:
```
final_video.write_videofile(
    output_path,
    bitrate="2000k",
    audio_bitrate="192k"
)
```
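Because file size is roughly total bitrate × duration, the bitrate settings above bound the output size: 2000 kbit/s of video plus 192 kbit/s of audio for 10 minutes comes to about 164.4 MB. A hypothetical helper for that estimate, ignoring container overhead and any deviation of the encoder from the target bitrate:

```python
def estimated_size_mb(duration_s, video_kbps=2000, audio_kbps=192):
    """Approximate output size in MB (10**6 bytes) for the given bitrates."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8_000_000  # bits -> bytes -> megabytes


print(estimated_size_mb(600))  # 10-minute video
```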

Performance Optimization Tips

Batch Processing

Scale the batch size with the number of frames, keeping it between 1 and 50:

```
batch_size = max(1, min(50, total_frames // 100))  # Dynamic batch size; never 0 for short videos
```
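A batch size tied to frame count does not account for memory. An alternative is to divide a memory budget by the size of one raw frame: a 1920×1080 RGB frame at 8 bits per channel is 6,220,800 bytes, so a 200 MB budget holds 32 frames. A sketch; `batch_size_for` and the budget value are assumptions, and real usage also includes intermediate buffers.

```python
def batch_size_for(width, height, memory_budget_bytes, channels=3, cap=50):
    """Frames per batch that fit an assumed memory budget for raw 8-bit frames."""
    bytes_per_frame = width * height * channels  # one uint8 RGB frame
    return max(1, min(cap, memory_budget_bytes // bytes_per_frame))


print(batch_size_for(1920, 1080, 200_000_000))
```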

Memory Management

Implement periodic garbage collection:

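```
import gc
import torch

gc.collect()  # Free unreachable Python objects such as old frame buffers
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # Release cached, unused GPU memory held by PyTorch
```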

Multi-threading

Be cautious with thread pool size:

```
import multiprocessing as mp

max_workers = min(mp.cpu_count(), 8)  # Limit maximum threads
```
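A capped pool size plugs into `concurrent.futures`. The sketch below renders frames concurrently while keeping them in order, since `Executor.map` returns results in input order regardless of completion order. `render_frames` and `render_frame` are hypothetical. Threads help mainly when the per-frame work releases the GIL (NumPy, CuPy, I/O); for pure-Python CPU-bound work, `ProcessPoolExecutor` with a top-level function is the usual alternative, and matplotlib's pyplot state is not thread-safe, so each worker should build its own `Figure`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def render_frames(frame_indices, render_frame, max_workers=None):
    """Render frames concurrently, returning results in input order."""
    if max_workers is None:
        max_workers = min(os.cpu_count() or 1, 8)  # cap the pool size
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_frame, frame_indices))


print(render_frames(range(5), lambda i: i * i))
```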

Frame Generation
- Cache frequently used matplotlib objects
- Use vectorized operations where possible
- Consider using OpenGL for real-time visualization
