tsj2003/Audio-Intelligence-


🎡 Audio CNN Visualizer

A modern web application that uses deep learning to classify environmental sounds and visualize how Convolutional Neural Networks (CNNs) process audio data.


🚀 Live Demo

  • Frontend: http://localhost:3000 (when running locally)
  • Backend: Modal serverless inference with the trained CNN model

✨ Features

🧠 AI-Powered Audio Classification

  • 50 Environmental Sound Classes: Dog barking, rain, car horns, fire crackling, and more
  • 83.50% Accuracy: Trained on ESC-50 dataset
  • Real-time Processing: Upload and get instant predictions
  • Top 3 Predictions: Confidence scores for best matches
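
As a rough illustration of how top-3 predictions with confidence scores can be derived from raw model outputs, the sketch below applies a softmax and keeps the three highest-scoring classes. The function name `top_k_predictions` and the sample logits are hypothetical; the repository's own inference code may do this differently (for example, with PyTorch tensor operations).

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Return the k most likely classes with softmax confidences.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Pair each class with its probability and sort from most to least likely.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return [{"class": name, "confidence": round(p, 3)} for name, p in ranked[:k]]


# Made-up logits for four classes, purely to exercise the function.
example = top_k_predictions([2.0, 1.0, 0.1, -1.0], ["dog", "cat", "pig", "cow"])
```

The result uses the same `class` and `confidence` keys as the response documented under API Endpoints.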

🎨 Modern UI/UX

  • Industry-Standard Design: Professional gradient themes and typography
  • Drag & Drop Upload: Intuitive file upload interface
  • Responsive Layout: Works on desktop, tablet, and mobile
  • Smooth Animations: Loading states and transitions

🔍 Neural Network Visualization

  • Feature Maps: See what the CNN "sees" at each layer
  • Spectrogram Analysis: Mel-frequency representation
  • Waveform Display: Time-domain audio visualization
  • Layer-by-Layer Breakdown: Understand model decision-making

🏗️ Architecture

Frontend (Next.js)

src/
├── app/
│   ├── page.tsx          # Main UI with modern design
│   └── api/proxy/        # CORS-free API proxy
├── components/
│   ├── FeatureMap.tsx    # CNN layer visualization
│   ├── Waveform.tsx      # Audio waveform display
│   └── ColorScale.tsx    # Visualization color scale
└── components/ui/        # Reusable UI components

Backend (Modal + PyTorch)

main.py                   # Modal serverless inference
model.py                  # CNN architecture definition
train.py                  # Model training script
AudioProcessor.py         # Audio preprocessing pipeline

🛠️ Tech Stack

Frontend

  • Next.js 15 - React framework with App Router
  • TypeScript - Type-safe development
  • Tailwind CSS - Utility-first styling
  • Chart.js - Data visualization

Backend

  • Modal - Serverless GPU inference platform
  • PyTorch - Deep learning framework
  • FastAPI - REST API endpoints
  • Librosa - Audio processing library

Model

  • CNN Architecture: 3 convolutional layers + 2 fully connected
  • Input: Mel spectrograms (128×431)
  • Output: 50-class environmental sound classification
  • Training: ESC-50 dataset with data augmentation
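
The 128×431 input size and the feature-map shapes listed in the API response can be reproduced with simple arithmetic. The sketch below is a sanity check under stated assumptions, not the repository's model code: it assumes 5-second clips at 44.1 kHz (the ESC-50 clip format), a hop length of 512 samples with a centered STFT (the librosa default), and that each convolutional block preserves spatial size and is followed by 2×2 max pooling. The function names are hypothetical.

```python
def mel_frame_count(duration_s, sample_rate, hop_length):
    """Frames produced by a centered STFT: 1 + floor(samples / hop_length)."""
    return 1 + int(duration_s * sample_rate) // hop_length


def feature_map_shapes(height, width, channels=(32, 64, 128)):
    """Shape after each conv block, assuming size-preserving convolutions
    followed by 2x2 max pooling (an assumption, not confirmed by the source)."""
    shapes = []
    for out_channels in channels:
        # Pooling with a 2x2 window and stride 2 halves each dimension (floor).
        height, width = height // 2, width // 2
        shapes.append((out_channels, height, width))
    return shapes


frames = mel_frame_count(5.0, 44100, 512)   # 431
shapes = feature_map_shapes(128, frames)    # [(32, 64, 215), (64, 32, 107), (128, 16, 53)]
```

Under these assumptions the results match the `conv1`, `conv2`, and `conv3` shapes shown in the example response.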

🚀 Quick Start

Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Modal account and CLI

1. Clone the Repository

git clone https://github.com/tsj2003/Audio-Intelligence-.git
cd Audio-Intelligence-

2. Frontend Setup

# Install dependencies
npm install

# Start development server
npm run dev

3. Backend Setup

# Install Python dependencies
pip install -r requirements.txt

# Set up Modal
modal setup

# Start inference server
modal serve main.py

4. Access the Application

  • Frontend: http://localhost:3000
  • Upload: WAV files up to 5MB
  • Results: Real-time AI predictions and visualizations
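
The 5MB WAV limit can be checked before any data is sent to the backend. The following is a minimal sketch of such a check, assuming "5MB" means 5 × 1024 × 1024 bytes and that a RIFF/WAVE header check is sufficient; `validate_wav_upload` and `MAX_UPLOAD_BYTES` are hypothetical names, and the actual application performs its validation in the frontend.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is interpreted as 5 MiB


def validate_wav_upload(data: bytes) -> None:
    """Raise ValueError if the payload is too large or is not a RIFF/WAVE file."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {len(data)} bytes; limit is {MAX_UPLOAD_BYTES}")
    # A WAV file starts with b"RIFF", four size bytes, then b"WAVE".
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("file does not look like a WAV (missing RIFF/WAVE header)")
```

Checking the header bytes rather than the file extension catches renamed files of other formats early.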

📊 Model Performance

| Metric | Value |
|---|---|
| Accuracy | 83.50% |
| Dataset | ESC-50 (2,000 audio files) |
| Classes | 50 environmental sounds |
| Training Time | ~30 minutes on GPU |
| Model Size | ~2MB |

🎯 Supported Audio Classes

Animals

🐕 Dog, 🐱 Cat, 🐷 Pig, 🐄 Cow, 🐑 Sheep, 🐓 Rooster, 🐔 Hen, 🐸 Frog, 🐦 Bird, 🦗 Cricket

Nature

🌧️ Rain, 🌊 Sea waves, 💨 Wind, ⛈️ Thunderstorm, 🔥 Fire crackling, 🌊 Water drops

Transportation

🚗 Car horn, 🚂 Train, ✈️ Airplane, 🚁 Helicopter, 🚨 Siren

Household

🚪 Door knock, 🧺 Washing machine, 🧹 Vacuum cleaner, 🚰 Pouring water, 🪥 Brushing teeth

And many more...

🔧 API Endpoints

POST /api/proxy

Processes a base64-encoded WAV file and returns the top predictions together with the data needed for the visualizations.

Request:

{
  "audio_data": "base64_encoded_wav_file"
}
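
To show how a request body of this shape might be assembled, the sketch below synthesizes a short test tone with the standard library, base64-encodes it, and wraps it in the `audio_data` field. The helper names `make_test_tone` and `build_request_body` are hypothetical, and the tone is only a stand-in for a real recording.

```python
import base64
import io
import json
import math
import wave


def make_test_tone(seconds=1.0, sample_rate=44100, freq_hz=440.0) -> bytes:
    """Synthesize a mono 16-bit sine wave and return it as WAV file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        frames = bytearray()
        for n in range(int(seconds * sample_rate)):
            value = 0.5 * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            frames += int(32767 * value).to_bytes(2, "little", signed=True)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


def build_request_body(wav_bytes: bytes) -> str:
    """Wrap WAV bytes in the JSON shape expected by POST /api/proxy."""
    payload = {"audio_data": base64.b64encode(wav_bytes).decode("ascii")}
    return json.dumps(payload)


body = build_request_body(make_test_tone(0.1))
```

The resulting string could then be sent as the body of a `POST` to `/api/proxy` with a `Content-Type: application/json` header, using any HTTP client.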

Response:

{
  "predictions": [
    {"class": "dog", "confidence": 0.642},
    {"class": "cat", "confidence": 0.234},
    {"class": "pig", "confidence": 0.124}
  ],
  "visualization": {
    "conv1": {"shape": [32, 64, 215], "values": [...]},
    "conv2": {"shape": [64, 32, 107], "values": [...]},
    "conv3": {"shape": [128, 16, 53], "values": [...]}
  },
  "input_spectrogram": {"shape": [128, 431], "values": [...]},
  "waveform": {"values": [...], "sample_rate": 44100, "duration": 2.5}
}
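
A client consuming this response might reduce it to the top prediction and a per-layer summary before rendering. The sketch below assumes the field names shown above and uses empty `values` lists as placeholders; `summarize_response` is a hypothetical helper, not part of the repository.

```python
import json
from math import prod


def summarize_response(raw_json: str) -> dict:
    """Extract the best prediction and the size of each feature map."""
    data = json.loads(raw_json)
    best = max(data["predictions"], key=lambda p: p["confidence"])
    layers = {
        name: {"shape": tuple(layer["shape"]), "elements": prod(layer["shape"])}
        for name, layer in data["visualization"].items()
    }
    return {
        "top_class": best["class"],
        "top_confidence": best["confidence"],
        "layers": layers,
    }


# Trimmed-down sample response; "values" arrays are left empty for brevity.
sample = json.dumps({
    "predictions": [
        {"class": "dog", "confidence": 0.642},
        {"class": "cat", "confidence": 0.234},
        {"class": "pig", "confidence": 0.124},
    ],
    "visualization": {
        "conv1": {"shape": [32, 64, 215], "values": []},
        "conv2": {"shape": [64, 32, 107], "values": []},
        "conv3": {"shape": [128, 16, 53], "values": []},
    },
})
summary = summarize_response(sample)
```

The `elements` count gives a quick sense of how much data each layer contributes to the payload, which matters because the feature maps dominate the response size.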

🎨 UI/UX Features

Modern Design System

  • Color Palette: Professional blue/indigo gradients
  • Typography: Clean font hierarchy with proper contrast
  • Spacing: Consistent 8px grid system
  • Shadows: Subtle depth with Material Design principles

Interactive Elements

  • Upload Zone: Large drag & drop area with hover states
  • Loading States: Animated spinners with descriptive text
  • Progress Bars: Gradient-styled confidence indicators
  • Error Handling: Clear feedback for file limits and errors

Responsive Design

  • Mobile-First: Optimized for all screen sizes
  • Breakpoints: sm, lg, xl for optimal viewing
  • Flexible Layouts: Cards stack properly on smaller screens

🚀 Deployment

Frontend (Vercel)

# Deploy to Vercel
vercel --prod

Backend (Modal)

# Deploy inference endpoint
modal deploy main.py

📈 Future Enhancements

  • Real-time audio recording
  • Batch file processing
  • Audio preprocessing options
  • Model performance metrics
  • Audio segmentation
  • Export results functionality

🀝 Contributing

This is a personal project showcasing AI audio classification and neural network visualization. The repository is maintained by tsj2003.

📄 License

This project is for educational and demonstration purposes.

🙏 Acknowledgments

  • ESC-50 Dataset - Environmental Sound Classification
  • Modal - Serverless GPU inference platform
  • Next.js - React framework
  • PyTorch - Deep learning framework

Built with ❤️ using modern web technologies and deep learning
