An age estimation system that predicts age groups from images and audio. It combines computer vision and audio processing techniques and accepts input through multiple modalities.
- Image-based Age Prediction: Upload images or use real-time camera feed for age estimation
- Audio-based Age Prediction: Upload audio files or record voice in real-time for age estimation
- Multi-modal Analysis: Combines both visual and audio cues for enhanced accuracy
- Real-time Processing: Live camera and microphone integration
- Web Interface: User-friendly dashboard with multiple input options
- Batch Processing: Support for multiple image/audio file uploads
- HTML/CSS/JavaScript web interface
- Real-time camera and microphone access
- File upload capabilities
- Interactive dashboard
- Flask-based REST API
- PyTorch models for image processing
- Scikit-learn models for audio processing
- CORS support for cross-origin requests
- Image Model: ResNet-18 based CNN trained on FairFace dataset
- Audio Model: MFCC feature extraction with machine learning classification
The system classifies ages into the following groups:
- 0-2 years
- 3-9 years
- 10-19 years
- 20-29 years
- 30-39 years
- 40-49 years
- 50-59 years
- 60-69 years
- 70+ years
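The nine groups above can be represented as a small lookup table. The following is a minimal sketch, assuming class indices follow the order listed here (the label order used by the trained models is not stated in this document). `AGE_GROUPS` and `age_to_group` are illustrative names, not part of the project code.

```python
# Age groups in the order listed above. The index order is an assumption;
# check it against the label order used when the models were trained.
AGE_GROUPS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]

# Lower bound (in years) of each group, aligned with AGE_GROUPS.
_LOWER_BOUNDS = [0, 3, 10, 20, 30, 40, 50, 60, 70]


def age_to_group(age: int) -> str:
    """Return the age-group label that contains a given age in years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    label = AGE_GROUPS[0]
    # Keep the last group whose lower bound does not exceed the age.
    for bound, name in zip(_LOWER_BOUNDS, AGE_GROUPS):
        if age >= bound:
            label = name
    return label
```

A helper like this can be handy when comparing predictions against datasets that record exact ages rather than groups.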
- Python 3.8 or higher
- Web browser with camera/microphone access
- CUDA-compatible GPU (optional, for faster processing)
- Clone the repository
    git clone <repository-url>
    cd Hackathon-main
- Install Python dependencies
    pip install -r requirements.txt
- Download pre-trained models
  - Place the image model (resnet_age_best.pth) in the backend/ directory
  - Place the audio models (final_model.sav, scaler.sav, feature_selector.sav, label_encoder.sav) in the backend/ directory
- Start the Backend Server
    cd backend
    python app.py
  The backend will start on http://localhost:5000
- Open the Frontend
  - Navigate to the frontend/ directory
  - Open dashboard.html in a web browser
  - Or use the provided batch file: start_backend.bat
- Image Analysis
  - Click "Upload Image" to select image files
  - Or click "Real-Time Face Detection" for live camera feed
  - The system will detect faces and predict age groups
- Audio Analysis
  - Click "Upload Voice" to select audio files
  - Or click "Real-Time Voice Detection" to record live audio
  - The system will analyze voice characteristics to predict age
- POST /predict
  - Content-Type: multipart/form-data
  - Parameters: file (image file)
  - Response: JSON with age predictions and processed image
- POST /predict_audio
  - Content-Type: multipart/form-data
  - Parameters: file (audio file)
  - Response: JSON with age group prediction
- GET /
  - Response: Backend status and model availability
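The endpoints above can be called from a script as well as from the dashboard. The following is a minimal sketch using only the Python standard library. It assumes the backend is running at the default address and that the upload field is named `file`, as listed above. The helper names `build_multipart` and `post_file` are illustrative, and the JSON field names in the response are not documented here, so the sketch returns the parsed response as-is.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:5000"  # default backend address from the setup steps


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_file(endpoint: str, path: str) -> dict:
    """POST a file to /predict or /predict_audio and return the parsed JSON."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running backend and a local image file):
# print(post_file("/predict", "face.jpg"))
```

Building the multipart body by hand avoids an extra dependency; a third-party HTTP client would work equally well if one is already installed.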
The image model is based on ResNet-18 architecture and can be trained using the FairFace dataset:
    cd Training/image\ prediction/
    python fairface_age_resnet.py \
      --data-root /path/to/FairFace \
      --train-csv annotations/fairface_label_train.csv \
      --val-csv annotations/fairface_label_val.csv \
      --epochs 12 --batch-size 64 --lr 1e-3 \
      --model-out output/resnet_age_best.pth
The audio model uses MFCC features and can be trained using the provided Jupyter notebook:
    cd Training/audio/
    jupyter notebook Audio_model_training.ipynb
Hackathon-main/
├── backend/
│ ├── app.py # Flask backend server
│ ├── *.sav # Pre-trained audio models
│ └── resnet_age_best.pth # Pre-trained image model
├── frontend/
│ ├── dashboard.html # Main dashboard
│ ├── dashboard.js # Frontend JavaScript
│ ├── style.css # Styling
│ ├── assets/ # UI assets
│ └── start_backend.bat # Windows startup script
├── Training/
│ ├── audio/
│ │ └── Audio_model_training.ipynb
│ └── image prediction/
│ ├── fairface_age_resnet.py
│ ├── live_age_prediction.py
│ ├── photo_age_prediction.py
│ └── run_*.py
└── requirements.txt
- Face detection using OpenCV Haar Cascades
- Face preprocessing and normalization
- ResNet-18 feature extraction
- Age classification into 9 categories
- Confidence scoring and visualization
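The last two steps of the image pipeline (classification into 9 categories and confidence scoring) can be sketched in plain Python. This assumes the confidence score is the softmax probability of the top class and that the network emits one logit per age group in the order listed earlier. Neither detail is confirmed by this document, and `softmax` and `top_prediction` are illustrative names.

```python
import math

# Assumed class order; verify against the trained model's label order.
AGE_GROUPS = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (age_group_label, confidence) for one face's 9 logits."""
    if len(logits) != len(AGE_GROUPS):
        raise ValueError("expected one logit per age group")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AGE_GROUPS[best], probs[best]
```

For example, `top_prediction([0, 0, 0, 5, 0, 0, 0, 0, 0])` selects the "20-29" group with a confidence of roughly 0.95.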
- Audio loading and preprocessing (5-second clips)
- MFCC feature extraction (20 coefficients)
- Spectral feature computation (centroid, bandwidth, rolloff)
- Feature scaling and selection
- Machine learning classification
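The feature extraction steps of the audio pipeline can be sketched with librosa as follows. This is an illustration under stated assumptions, not the project's confirmed implementation: the 22050 Hz sample rate, zero-padding of short clips, and averaging each feature over time frames are all assumptions, and `extract_features` is a hypothetical helper name. The resulting vector would then go through the scaling, selection, and classification steps listed above.

```python
SAMPLE_RATE = 22050   # assumed; this is librosa's default resampling rate
CLIP_SECONDS = 5      # clip length stated in the pipeline steps
N_MFCC = 20           # number of MFCC coefficients stated in the pipeline steps

# Names of the 23 values produced below: 20 MFCC means plus 3 spectral means.
FEATURE_NAMES = [f"mfcc_{i + 1}" for i in range(N_MFCC)] + [
    "spectral_centroid",
    "spectral_bandwidth",
    "spectral_rolloff",
]


def extract_features(path: str):
    """Return one feature vector (NumPy array of length 23) for an audio file."""
    # Imported lazily so the constants above can be used without the audio stack.
    import numpy as np
    import librosa

    # Load at most the first 5 seconds, resampled to SAMPLE_RATE.
    y, sr = librosa.load(path, sr=SAMPLE_RATE, duration=CLIP_SECONDS)
    # Zero-pad short recordings so every clip has the same length (assumption).
    y = librosa.util.fix_length(y, size=SAMPLE_RATE * CLIP_SECONDS)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)        # shape (20, frames)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)      # shape (1, frames)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)    # shape (1, frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)        # shape (1, frames)

    # Averaging over time frames is an assumption, not a confirmed detail.
    return np.concatenate([
        mfcc.mean(axis=1),
        centroid.mean(axis=1),
        bandwidth.mean(axis=1),
        rolloff.mean(axis=1),
    ])
```

Averaging over frames yields a fixed-length vector regardless of how the audio was framed, which is what a classical scikit-learn classifier expects as input.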
- GPU acceleration for image processing (when available)
- Efficient audio processing with librosa
- Optimized model loading and caching
- Real-time processing capabilities
- Backend Connection Failed
  - Ensure Python dependencies are installed
  - Check if port 5000 is available
  - Verify model files are in the correct location
- Camera/Microphone Access Denied
  - Enable camera/microphone permissions in browser
  - Use HTTPS for production deployment
  - Check browser compatibility
- Model Loading Errors
  - Verify model files are not corrupted
  - Check Python version compatibility
  - Ensure sufficient memory for model loading
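Two of the checks above (port availability and model file location) are easy to automate. The following is a minimal pre-flight sketch using only the standard library. The file names come from the setup steps. The helper names and the idea of running this as a separate script are illustrative, not part of the project.

```python
import socket
from pathlib import Path

# Model files named in the setup steps; all are expected in backend/.
MODEL_FILES = [
    "resnet_age_best.pth",
    "final_model.sav",
    "scaler.sav",
    "feature_selector.sav",
    "label_encoder.sav",
]


def missing_model_files(backend_dir: str) -> list:
    """Return the names of expected model files not found in backend_dir."""
    base = Path(backend_dir)
    return [name for name in MODEL_FILES if not (base / name).is_file()]


def port_is_free(port: int = 5000, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("Missing model files:", missing_model_files("backend") or "none")
    print("Port 5000 free:", port_is_free(5000))
```

Run it from the repository root before starting the backend. A non-empty list of missing files or an occupied port points to the first two causes listed under "Backend Connection Failed".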
- Chrome 60+
- Firefox 55+
- Safari 11+
- Edge 79+
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License; see the LICENSE file for details.
- FairFace dataset for image model training
- Mozilla Common Voice dataset for audio model training
- OpenCV for computer vision capabilities
- PyTorch and Scikit-learn for machine learning frameworks
