A modern web application that uses deep learning to classify environmental sounds and visualize how Convolutional Neural Networks (CNNs) process audio data.
- Frontend: http://localhost:3000 (when running locally)
- Backend: Modal serverless inference with a trained CNN model
- 50 Environmental Sound Classes: Dog barking, rain, car horns, fire crackling, and more
- 83.50% Accuracy: Trained on the ESC-50 dataset
- Real-time Processing: Upload and get instant predictions
- Top 3 Predictions: Confidence scores for best matches
- Industry-Standard Design: Professional gradient themes and typography
- Drag & Drop Upload: Intuitive file upload interface
- Responsive Layout: Works on desktop, tablet, and mobile
- Smooth Animations: Loading states and transitions
- Feature Maps: See what the CNN "sees" at each layer
- Spectrogram Analysis: Mel-frequency representation
- Waveform Display: Time-domain audio visualization
- Layer-by-Layer Breakdown: Understand model decision-making
```
src/
├── app/
│   ├── page.tsx          # Main UI with modern design
│   └── api/proxy/        # CORS-free API proxy
├── components/
│   ├── FeatureMap.tsx    # CNN layer visualization
│   ├── Waveform.tsx      # Audio waveform display
│   └── ColorScale.tsx    # Visualization color scale
└── components/ui/        # Reusable UI components
```
```
main.py            # Modal serverless inference
model.py           # CNN architecture definition
train.py           # Model training script
AudioProcessor.py  # Audio preprocessing pipeline
```
- Next.js 15 - React framework with App Router
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- Chart.js - Data visualization
- Modal - Serverless GPU inference platform
- PyTorch - Deep learning framework
- FastAPI - REST API endpoints
- Librosa - Audio processing library
- CNN Architecture: 3 convolutional layers + 2 fully connected
- Input: Mel spectrograms (128×431)
- Output: 50-class environmental sound classification
- Training: ESC-50 dataset with data augmentation
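A sketch of the architecture described above. The channel widths (32/64/128) follow the feature-map shapes the API returns; kernel sizes and the pooling head are assumptions, not the trained model's exact hyperparameters:

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Sketch: 3 convolutional blocks + 2 fully connected layers.

    Channel widths match the conv1/conv2/conv3 feature-map shapes in the
    API response; kernel sizes and pooling are assumptions.
    """

    def __init__(self, n_classes: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            # Input: 1 x 128 x 431 mel spectrogram
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 32 x 64 x 215
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64 x 32 x 107
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 128 x 16 x 53
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

The top-3 predictions shown in the UI would then be `model(spec).softmax(dim=-1).topk(3)` for a `(1, 1, 128, 431)` input.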
- Node.js 18+ and npm
- Python 3.9+
- Modal account and CLI
```bash
git clone https://github.com/tsj2003/Audio-Intelligence-.git
cd Audio-Intelligence-

# Install dependencies
npm install

# Start development server
npm run dev
```

```bash
# Install Python dependencies
pip install -r requirements.txt

# Set up Modal
modal setup

# Start inference server
modal serve main.py
```

- Frontend: http://localhost:3000
- Upload: WAV files up to 5MB
- Results: Real-time AI predictions and visualizations
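Besides the drag-and-drop UI, the running endpoint can be called programmatically. A stdlib-only sketch, assuming the JSON request/response format documented in the API section below; the endpoint URL is a placeholder for the one `modal serve main.py` prints:

```python
# Sketch: call the inference endpoint directly from Python (stdlib only).
# The URL is a placeholder; use the one `modal serve main.py` prints.
import base64
import json
import urllib.request

def encode_wav(path: str) -> str:
    """Read a WAV file and return it base64-encoded for the JSON body."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def classify(path: str, url: str) -> list:
    """POST the encoded audio and return the predictions list."""
    body = json.dumps({"audio_data": encode_wav(path)}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["predictions"]
```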
| Metric | Value |
|---|---|
| Accuracy | 83.50% |
| Dataset | ESC-50 (2,000 audio files) |
| Classes | 50 environmental sounds |
| Training Time | ~30 minutes on GPU |
| Model Size | ~2MB |
Dog, Cat, Pig, Cow, Sheep, Rooster, Hen, Frog, Bird, Cricket
Rain, Sea waves, Wind, Thunderstorm, Fire crackling, Water drops
Car horn, Train, …
Door knock, Washing machine, Vacuum cleaner, Pouring water, Brushing teeth
Processes an audio file and returns predictions with visualizations.
Request:
```json
{
  "audio_data": "base64_encoded_wav_file"
}
```

Response:

```json
{
  "predictions": [
    {"class": "dog", "confidence": 0.642},
    {"class": "cat", "confidence": 0.234},
    {"class": "pig", "confidence": 0.124}
  ],
  "visualization": {
    "conv1": {"shape": [32, 64, 215], "values": [...]},
    "conv2": {"shape": [64, 32, 107], "values": [...]},
    "conv3": {"shape": [128, 16, 53], "values": [...]}
  },
  "input_spectrogram": {"shape": [128, 431], "values": [...]},
  "waveform": {"values": [...], "sample_rate": 44100, "duration": 2.5}
}
```

- Color Palette: Professional blue/indigo gradients
- Typography: Clean font hierarchy with proper contrast
- Spacing: Consistent 8px grid system
- Shadows: Subtle depth with Material Design principles
- Upload Zone: Large drag & drop area with hover states
- Loading States: Animated spinners with descriptive text
- Progress Bars: Gradient-styled confidence indicators
- Error Handling: Clear feedback for file limits and errors
- Mobile-First: Optimized for all screen sizes
- Breakpoints: sm, lg, xl for optimal viewing
- Flexible Layouts: Cards stack properly on smaller screens
```bash
# Deploy to Vercel
vercel --prod
```

```bash
# Deploy inference endpoint
modal deploy main.py
```

- Real-time audio recording
- Batch file processing
- Audio preprocessing options
- Model performance metrics
- Audio segmentation
- Export results functionality
This is a personal project showcasing AI audio classification and neural network visualization. The repository is maintained by tsj2003.
This project is for educational and demonstration purposes.
- ESC-50 Dataset - Environmental Sound Classification
- Modal - Serverless GPU inference platform
- Next.js - React framework
- PyTorch - Deep learning framework
Built with ❤️ using modern web technologies and deep learning