Classify bird vocalizations as song, call, or alarm
Works with BirdNET-Pi to add vocalization context to your bird detections.
| Species | Without classifier | With classifier |
|---|---|---|
| Eurasian Blackbird | "Merel detected" | "Merel - Zang (93%)" |
| European Robin | "Roodborst detected" | "Roodborst - Alarm (87%)" |
- Song: Bird is marking territory or attracting a mate
- Call: Contact calls, flock communication
- Alarm: Predator nearby! (cat, sparrowhawk, etc.)
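
A minimal sketch of how the labels in the table above could be built from a classification result. `format_detection` and `TYPE_LABELS_NL` are hypothetical names, not part of this repository, and "Roep" as the Dutch label for a call is an assumption; only "Zang" and "Alarm" appear in the examples.

```python
# Dutch display names for the three vocalization types.
# "Zang" and "Alarm" come from the README examples; "Roep" is an assumed label for "call".
TYPE_LABELS_NL = {"song": "Zang", "call": "Roep", "alarm": "Alarm"}


def format_detection(species, result=None):
    """Return a display label for a detection (hypothetical helper).

    `result` is a dict with 'type' and 'confidence' keys, or None when no
    classification is available for this detection.
    """
    if not result:
        return f"{species} detected"
    # Fall back to the raw type name if it has no Dutch label.
    label = TYPE_LABELS_NL.get(result["type"], result["type"])
    return f"{species} - {label} ({result['confidence']:.0%})"


print(format_detection("Merel", {"type": "song", "confidence": 0.93}))
print(format_detection("Roodborst"))
```

Falling back to the plain "detected" text keeps the output usable for species that have no trained model.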
197 Ultimate models, trained on a Google Colab A100 GPU, are available for download:
Download from Google Drive (~6.9 GB total)
Individual models are ~35 MB each. Download only the species you need, or get them all.
from src.classifiers.cnn_inference import VocalizationClassifier
classifier = VocalizationClassifier(models_dir="./models")
result = classifier.classify("Koolmees", "/path/to/audio.mp3")
if result:
    print(f"{result['type']} ({result['confidence']:.0%})")
    # Output: song (91%)

# Clone the repository
git clone https://github.com/RonnyCHL/emsn-vocalization.git
cd emsn-vocalization
# Install dependencies
pip install torch librosa numpy scikit-learn tqdm requests
# Train a model (downloads data from Xeno-canto automatically)
python train_existing.py --species "Koolmees"

Open notebooks/EMSN_Vocalization_Colab_Training.ipynb in Google Colab for free GPU training.
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   BirdNET-Pi    │     │   Vocalization   │     │     Result      │
│     "Merel"     │ ──▶ │    Classifier    │ ──▶ │ "Merel - Zang"  │
│                 │     │   (CNN model)    │     │      (93%)      │
└─────────────────┘     └──────────────────┘     └─────────────────┘
- BirdNET-Pi identifies the bird species from audio
- This classifier analyzes the same audio with a species-specific CNN
- Output includes vocalization type (song/call/alarm) with confidence
emsn-vocalization/
├── src/
│   ├── classifiers/     # CNN model & inference
│   ├── collectors/      # Xeno-canto data collection
│   └── processors/      # Audio → spectrogram processing
├── notebooks/           # Colab training notebooks
├── train_existing.py    # Main training script
├── full_pipeline.py     # Complete pipeline (download → train)
└── docker-compose.yml   # Docker training environment
Ultimate models:
- Architecture: 4-layer CNN with batch normalization (32→64→128→256 filters)
- Classifier: 512→256→num_classes with dropout
- Training: Google Colab A100, 50 epochs, data augmentation
- Size: ~35 MB per species model
- Accuracy: Improved over standard models
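
As a rough plausibility check, the parameter count of the 4-layer architecture can be estimated by hand. The sketch below is not the project's model code. It assumes a single input channel, 3x3 kernels, 2x2 pooling after each convolution block, a 128x128 input, a flattened feature map feeding the 512→256→num_classes classifier, and float32 weights. None of these details are stated here, so treat the result as an estimate only.

```python
def conv_block_params(c_in, c_out, kernel=3):
    """Conv weights and biases plus batch-norm scale and shift for one block."""
    conv = c_in * c_out * kernel * kernel + c_out
    batch_norm = 2 * c_out
    return conv + batch_norm


def linear_params(n_in, n_out):
    """Weights and biases of one fully connected layer."""
    return n_in * n_out + n_out


def ultimate_model_params(num_classes=3, input_size=128):
    # Assumed, not confirmed by the source: 1 input channel, 3x3 kernels,
    # 2x2 pooling after every block, flatten before the classifier.
    channels = [1, 32, 64, 128, 256]
    total = sum(conv_block_params(a, b) for a, b in zip(channels, channels[1:]))
    spatial = input_size // 2 ** (len(channels) - 1)  # 128 halved four times
    flat = channels[-1] * spatial * spatial
    for n_in, n_out in [(flat, 512), (512, 256), (256, num_classes)]:
        total += linear_params(n_in, n_out)
    return total


params = ultimate_model_params()
print(f"{params:,} parameters, about {params * 4 / 1e6:.1f} MB as float32")
```

Under these assumptions the estimate lands in the same range as the ~35 MB file size, with almost all of the weights sitting in the first fully connected layer.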
Standard models:
- Architecture: 3-layer CNN (32→64→128 filters)
- Classifier: 256→num_classes
- Size: ~2 MB per species model
Input and output:
- Input: Mel spectrograms (128x128, 3 seconds audio)
- Output: song / call / alarm + confidence
- Sample rate: 48 kHz, freq range: 500-8000 Hz
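
To make the input parameters concrete, the sketch below works out how many samples a 3-second clip contains at 48 kHz and where 128 mel bands would fall between 500 and 8000 Hz. It uses the HTK-style mel formula, which is an assumption; the project's spectrogram code may use a different mel variant, window size, or hop length.

```python
import math

SAMPLE_RATE = 48_000  # Hz
CLIP_SECONDS = 3
N_MELS = 128
F_MIN, F_MAX = 500.0, 8000.0


def hz_to_mel(hz):
    # HTK-style mel scale (assumed; other definitions exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels=N_MELS, f_min=F_MIN, f_max=F_MAX):
    """Return n_mels + 2 edge frequencies, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


samples_per_clip = SAMPLE_RATE * CLIP_SECONDS
edges = mel_band_edges()
print(f"{samples_per_clip} samples per clip")
print(f"{edges[0]:.0f} Hz to {edges[-1]:.0f} Hz in {len(edges) - 2} bands")
```

Because the bands are evenly spaced in mel rather than in Hz, they are narrower at the low end of the range and wider near 8000 Hz.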
Currently 197 trained models for Dutch bird species, including:
- Koolmees (Great Tit)
- Merel (Eurasian Blackbird)
- Roodborst (European Robin)
- Huismus (House Sparrow)
- Vink (Common Chaffinch)
- ... and 192 more
See the Google Drive folder for the complete list.
The classifier runs as a separate service and reads detections from BirdNET-Pi's birds.db.
It can also be integrated to show the vocalization type in the web interface.
See COMMUNITY_PITCH.md for integration discussion.
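
A minimal sketch of how a separate service might read recent detections from birds.db using Python's built-in `sqlite3` module. The `detections` table and its `Date`, `Time`, `Com_Name`, and `File_Name` columns are an assumption based on common BirdNET-Pi installs; check your own schema first. `latest_detections` is a hypothetical helper, not part of this repository.

```python
import sqlite3


def latest_detections(conn, limit=10):
    """Return the newest detections as (date, time, common_name, file_name).

    Assumes a `detections` table with Date, Time, Com_Name and File_Name
    columns; adjust the query if your birds.db differs.
    """
    query = (
        "SELECT Date, Time, Com_Name, File_Name FROM detections "
        "ORDER BY Date DESC, Time DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


# Demo against an in-memory database that mimics the assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detections (Date TEXT, Time TEXT, Com_Name TEXT, File_Name TEXT)"
)
conn.executemany(
    "INSERT INTO detections VALUES (?, ?, ?, ?)",
    [
        ("2025-01-01", "08:00:00", "Merel", "merel_0800.mp3"),
        ("2025-01-01", "08:05:00", "Roodborst", "roodborst_0805.mp3"),
    ],
)
for row in latest_detections(conn, limit=1):
    print(row)
```

Against a live install, opening the file read-only, for example with `sqlite3.connect("file:/path/to/birds.db?mode=ro", uri=True)`, avoids interfering with BirdNET-Pi's own writes.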
- Python 3.10+
- PyTorch 2.0+
- librosa
- numpy
- Raspberry Pi 4/5 (for inference) or any Linux system
Audio data is automatically downloaded from Xeno-canto:
- Quality A/B recordings preferred
- Balanced sampling across vocalization types
- Respects Xeno-canto API rate limits
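
The sampling idea in the list above can be sketched as follows: group recordings by vocalization type, prefer higher quality ratings, and take the same number from each group. The record shape (dicts with `type` and `quality` keys) and the `balanced_sample` helper are hypothetical, and the project's collector may implement this differently.

```python
import random
from collections import defaultdict

# Xeno-canto rates recordings from A (best) to E (worst).
QUALITY_ORDER = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}


def balanced_sample(recordings, per_type=None, seed=0):
    """Pick the same number of recordings for every vocalization type.

    Higher-quality recordings are chosen first; ties are broken randomly.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for rec in recordings:
        by_type[rec["type"]].append(rec)
    if not by_type:
        return []
    # Use the size of the rarest class so every type is equally represented.
    n = min(len(group) for group in by_type.values())
    if per_type is not None:
        n = min(n, per_type)
    selected = []
    for group in by_type.values():
        rng.shuffle(group)  # random order among equal-quality recordings
        group.sort(key=lambda rec: QUALITY_ORDER.get(rec["quality"], len(QUALITY_ORDER)))
        selected.extend(group[:n])
    return selected


recordings = (
    [{"type": "song", "quality": q} for q in "AABBC"]
    + [{"type": "call", "quality": q} for q in "ABC"]
    + [{"type": "alarm", "quality": q} for q in "BC"]
)
sample = balanced_sample(recordings)
print(sorted((rec["type"], rec["quality"]) for rec in sample))
```

Downsampling to the rarest class is the simplest way to balance; it discards data, so oversampling or class weights may be preferable when alarm recordings are scarce.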
- Test the classifier: Try it with your BirdNET-Pi setup
- Train more species: Use the Colab notebook to train new models
- Report issues: Open a GitHub issue
- Integration ideas: See the community pitch document (COMMUNITY_PITCH.md)
- BirdNET-Pi - Bird species identification
- Xeno-canto - Bird sound database
MIT License - free to use, modify, and distribute.
Ronny Hullegie - EMSN Project (Ecologisch Monitoring Systeem Nijverdal)
@software{hullegie2025vocalization,
author = {Hullegie, Ronny},
title = {BirdNET-Pi Vocalization Classifier},
year = {2025},
url = {https://github.com/RonnyCHL/emsn-vocalization}
}