Skip to content

RonnyCHL/emsn-vocalization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

BirdNET-Pi Vocalization Classifier

DOI License Python 3.10+ PyTorch

Classify bird vocalizations as song, call, or alarm

Works with BirdNET-Pi to add vocalization context to your bird detections.

What does it do?

Detection Without With Vocalization
Eurasian Blackbird "Merel detected" "Merel - Zang (93%)"
European Robin "Roodborst detected" "Roodborst - Alarm (87%)"

Why is this useful?

  • Song: Bird is marking territory or attracting mate
  • Call: Contact calls, flock communication
  • Alarm: Predator nearby! (cat, sparrowhawk, etc.)

Quick Start

Download Pre-trained Models

197 Ultimate models trained on Google Colab A100 are available for download:

πŸ“₯ Download from Google Drive (~6.9 GB total)

Individual models are ~35 MB each. Download only the species you need, or get them all.

Use the Classifier

from src.classifiers.cnn_inference import VocalizationClassifier

classifier = VocalizationClassifier(models_dir="./models")
result = classifier.classify("Koolmees", "/path/to/audio.mp3")

if result:
    print(f"{result['type']} ({result['confidence']:.0%})")
    # Output: song (91%)

Train Your Own Models

# Clone the repository
git clone https://github.com/RonnyCHL/emsn-vocalization.git
cd emsn-vocalization

# Install dependencies
pip install torch librosa numpy scikit-learn tqdm requests

# Train a model (downloads data from Xeno-canto automatically)
python train_existing.py --species "Koolmees"

Train on Google Colab (Free GPU)

Open notebooks/EMSN_Vocalization_Colab_Training.ipynb in Google Colab for free GPU training.

How it Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  BirdNET-Pi     β”‚     β”‚  Vocalization    β”‚     β”‚   Result        β”‚
β”‚  "Merel"        β”‚ ──▢ β”‚  Classifier      β”‚ ──▢ β”‚  "Merel - Zang" β”‚
β”‚                 β”‚     β”‚  (CNN model)     β”‚     β”‚  (93%)          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  1. BirdNET-Pi identifies the bird species from audio
  2. This classifier analyzes the same audio with a species-specific CNN
  3. Output includes vocalization type (song/call/alarm) with confidence

Project Structure

emsn-vocalization/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ classifiers/      # CNN model & inference
β”‚   β”œβ”€β”€ collectors/       # Xeno-canto data collection
β”‚   └── processors/       # Audio β†’ spectrogram processing
β”œβ”€β”€ notebooks/            # Colab training notebooks
β”œβ”€β”€ train_existing.py     # Main training script
β”œβ”€β”€ full_pipeline.py      # Complete pipeline (download β†’ train)
└── docker-compose.yml    # Docker training environment

Model Details

Ultimate Models (recommended)

  • Architecture: 4-layer CNN with batch normalization (32β†’64β†’128β†’256 filters)
  • Classifier: 512β†’256β†’num_classes with dropout
  • Training: Google Colab A100, 50 epochs, data augmentation
  • Size: ~35 MB per species model
  • Accuracy: Improved over standard models

Standard Models

  • Architecture: 3-layer CNN (32β†’64β†’128 filters)
  • Classifier: 256β†’num_classes
  • Size: ~2 MB per species model

Common specs

  • Input: Mel spectrograms (128x128, 3 seconds audio)
  • Output: song / call / alarm + confidence
  • Sample rate: 48 kHz, freq range: 500-8000 Hz

Available Models

Currently 197 trained models for Dutch bird species, including:

  • Koolmees (Great Tit)
  • Merel (Eurasian Blackbird)
  • Roodborst (European Robin)
  • Huismus (House Sparrow)
  • Vink (Common Chaffinch)
  • ... and 192 more

See the Google Drive folder for the complete list.

Integration Options

Standalone (recommended for testing)

Run as separate service, reads BirdNET-Pi's birds.db.

With BirdNET-Pi

Can be integrated to show vocalization in the web interface.

See COMMUNITY_PITCH.md for integration discussion.

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • librosa
  • numpy
  • Raspberry Pi 4/5 (for inference) or any Linux system

Training Data

Audio data is automatically downloaded from Xeno-canto:

  • Quality A/B recordings preferred
  • Balanced sampling across vocalization types
  • Respects Xeno-canto API rate limits

Contributing

  • Test the classifier: Try it with your BirdNET-Pi setup
  • Train more species: Use Colab notebook to train new models
  • Report issues: Open a GitHub issue
  • Integration ideas: See community pitch document

Related Projects

License

MIT License - free to use, modify, and distribute.

Author

Ronny Hullegie - EMSN Project (Ecologisch Monitoring Systeem Nijverdal)

Citation

@software{hullegie2025vocalization,
  author = {Hullegie, Ronny},
  title = {BirdNET-Pi Vocalization Classifier},
  year = {2025},
  url = {https://github.com/RonnyCHL/emsn-vocalization}
}

About

CNN-based bird vocalization classifier (song/call/alarm) for 232 Dutch species

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •