A modern web application that uses deep learning to classify environmental sounds and visualize how Convolutional Neural Networks (CNNs) process audio data.
- Frontend: http://localhost:3000 (when running locally)
- Backend: Modal serverless inference with a trained CNN model
- 50 Environmental Sound Classes: Dog barking, rain, car horns, fire crackling, and more
- 83.50% Accuracy: Trained on the ESC-50 dataset
- Real-time Processing: Upload and get instant predictions
- Top 3 Predictions: Confidence scores for best matches
- Industry-Standard Design: Professional gradient themes and typography
- Drag & Drop Upload: Intuitive file upload interface
- Responsive Layout: Works on desktop, tablet, and mobile
- Smooth Animations: Loading states and transitions
- Feature Maps: See what the CNN "sees" at each layer
- Spectrogram Analysis: Mel-frequency representation
- Waveform Display: Time-domain audio visualization
- Layer-by-Layer Breakdown: Understand model decision-making
```
src/
├── app/
│   ├── page.tsx          # Main UI with modern design
│   └── api/proxy/        # CORS-free API proxy
├── components/
│   ├── FeatureMap.tsx    # CNN layer visualization
│   ├── Waveform.tsx      # Audio waveform display
│   └── ColorScale.tsx    # Visualization color scale
└── components/ui/        # Reusable UI components
```
```
main.py            # Modal serverless inference
model.py           # CNN architecture definition
train.py           # Model training script
AudioProcessor.py  # Audio preprocessing pipeline
```
- Next.js 15 - React framework with App Router
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- Chart.js - Data visualization
- Modal - Serverless GPU inference platform
- PyTorch - Deep learning framework
- FastAPI - REST API endpoints
- Librosa - Audio processing library
- CNN Architecture: 3 convolutional layers + 2 fully connected
- Input: Mel spectrograms (128×431)
- Output: 50-class environmental sound classification
- Training: ESC-50 dataset with data augmentation
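A sketch of the architecture described above. The channel widths (32/64/128) follow the feature-map shapes the API returns; kernel sizes and the pooling head are assumptions, not the trained model's exact hyperparameters:

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Sketch: 3 convolutional blocks + 2 fully connected layers.

    Channel widths match the conv1/conv2/conv3 feature-map shapes in the
    API response; kernel sizes and pooling are assumptions.
    """

    def __init__(self, n_classes: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            # Input: 1 x 128 x 431 mel spectrogram
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 32 x 64 x 215
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64 x 32 x 107
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 128 x 16 x 53
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

The top-3 predictions shown in the UI would then be `model(spec).softmax(dim=-1).topk(3)` for a `(1, 1, 128, 431)` input.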
- Node.js 18+ and npm
- Python 3.9+
- Modal account and CLI
```bash
git clone https://github.com/tsj2003/Audio-Intelligence-.git
cd Audio-Intelligence-

# Install dependencies
npm install

# Start development server
npm run dev
```

```bash
# Install Python dependencies
pip install -r requirements.txt

# Set up Modal
modal setup

# Start inference server
modal serve main.py
```

- Frontend: http://localhost:3000
- Upload: WAV files up to 5MB
- Results: Real-time AI predictions and visualizations
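Besides the drag-and-drop UI, the running endpoint can be called programmatically. A stdlib-only sketch, assuming the JSON request/response format documented in the API section below; the endpoint URL is a placeholder for the one `modal serve main.py` prints:

```python
# Sketch: call the inference endpoint directly from Python (stdlib only).
# The URL is a placeholder; use the one `modal serve main.py` prints.
import base64
import json
import urllib.request

def encode_wav(path: str) -> str:
    """Read a WAV file and return it base64-encoded for the JSON body."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def classify(path: str, url: str) -> list:
    """POST the encoded audio and return the predictions list."""
    body = json.dumps({"audio_data": encode_wav(path)}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["predictions"]
```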
| Metric | Value |
|---|---|
| Accuracy | 83.50% |
| Dataset | ESC-50 (2,000 audio files) |
| Classes | 50 environmental sounds |
| Training Time | ~30 minutes on GPU |
| Model Size | ~2MB |
Dog, Cat, Pig, Cow, Sheep, Rooster, Hen, Frog, Bird, Cricket
Rain, Sea waves, Wind, Thunderstorm, Fire crackling, Water drops
Car horn, Train, …
Door knock, Washing machine, Vacuum cleaner, Pouring water, Brushing teeth
Processes an audio file and returns predictions with visualizations.
Request:
```json
{
  "audio_data": "base64_encoded_wav_file"
}
```

Response:

```json
{
  "predictions": [
    {"class": "dog", "confidence": 0.642},
    {"class": "cat", "confidence": 0.234},
    {"class": "pig", "confidence": 0.124}
  ],
  "visualization": {
    "conv1": {"shape": [32, 64, 215], "values": [...]},
    "conv2": {"shape": [64, 32, 107], "values": [...]},
    "conv3": {"shape": [128, 16, 53], "values": [...]}
  },
  "input_spectrogram": {"shape": [128, 431], "values": [...]},
  "waveform": {"values": [...], "sample_rate": 44100, "duration": 2.5}
}
```

- Color Palette: Professional blue/indigo gradients
- Typography: Clean font hierarchy with proper contrast
- Spacing: Consistent 8px grid system
- Shadows: Subtle depth with Material Design principles
- Upload Zone: Large drag & drop area with hover states
- Loading States: Animated spinners with descriptive text
- Progress Bars: Gradient-styled confidence indicators
- Error Handling: Clear feedback for file limits and errors
- Mobile-First: Optimized for all screen sizes
- Breakpoints: sm, lg, xl for optimal viewing
- Flexible Layouts: Cards stack properly on smaller screens
```bash
# Deploy to Vercel
vercel --prod
```

```bash
# Deploy inference endpoint
modal deploy main.py
```

- Real-time audio recording
- Batch file processing
- Audio preprocessing options
- Model performance metrics
- Audio segmentation
- Export results functionality
This is a personal project showcasing AI audio classification and neural network visualization. The repository is maintained by tsj2003.
This project is for educational and demonstration purposes.
- ESC-50 Dataset - Environmental Sound Classification
- Modal - Serverless GPU inference platform
- Next.js - React framework
- PyTorch - Deep learning framework
Built with ❤️ using modern web technologies and deep learning