A lightweight sound event detection system for locating whale calls in marine audio recordings. This repository contains the implementation of our hybrid CNN-BiLSTM architecture with residual bottleneck and depthwise convolutions, designed for coherent per-frame whale call event detection.
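To make the architectural terms above concrete, here is a toy PyTorch model. It is a generic illustration of the named ingredients (a residual bottleneck built around a depthwise convolution, feeding a bidirectional LSTM), not the Whale-VAD network. The layer sizes, the block ordering, the spectrogram input shape `(batch, freq, frames)` and the class names `DepthwiseBottleneck` and `ToyCnnBiLstm` are all hypothetical.

```python
import torch
from torch import nn


class DepthwiseBottleneck(nn.Module):
    """Residual bottleneck: 1x1 reduce -> depthwise 3x3 -> 1x1 expand, plus a skip connection."""

    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(),
            # groups equal to the channel count makes this a depthwise convolution
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, groups=bottleneck),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(),
            nn.Conv2d(bottleneck, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add the input back (residual path) before the final non-linearity
        return torch.relu(x + self.body(x))


class ToyCnnBiLstm(nn.Module):
    """Spectrogram (batch, freq, frames) -> per-frame class logits (batch, frames, classes)."""

    def __init__(self, n_freq: int, n_classes: int = 3, channels: int = 16, hidden: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.block = DepthwiseBottleneck(channels, bottleneck=channels // 2)
        self.lstm = nn.LSTM(channels * n_freq, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        x = self.block(self.stem(spec.unsqueeze(1)))   # (B, C, F, T)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # one feature vector per frame
        x, _ = self.lstm(x)                              # (B, T, 2 * hidden)
        return self.head(x)                              # (B, T, n_classes)
```

The skip connection lets the block learn a residual correction, while the depthwise 3x3 convolution keeps the parameter count low because each channel is filtered separately; the BiLSTM then lets every frame's prediction draw on context from both earlier and later frames.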
Whale-VAD uses PyTorch Hub for easy model loading and inference. The model automatically handles feature extraction and produces frame-level probability outputs for three whale call types: bmabz, d, and bp.
PyTorch Hub automatically handles downloading the required code, so no installation beyond PyTorch is needed (the example below also uses torchaudio to load audio).
import torch
import torchaudio as ta
# Load the model, and unpack classifier and feature extractor (transform)
# NOTE: will automatically fetch model weights
classifier, transform = torch.hub.load("CMGeldenhuys/Whale-VAD", 'whalevad', weights='DEFAULT')
# Load audio file (must be sampled at 250 Hz, single channel)
audio, sr = ta.load("whale-call.wav")
# shape: (channels=1, samples)
assert sr == 250
# Perform inference
features, _ = transform(audio)
logits, prob, _ = classifier(features)  # Frame-level probabilities for bmabz, d, and bp
If you have installed the package locally (see the installation instructions below), you can use the whalevad module directly:
import torch
from whalevad import whalevad
from whalevad.utils import get_atbfl_examplar
# Create model with random initialization
# NOTE: model contains both classifier and feature extractor (transform)
model = whalevad(weights=None)
# (optional) Unpack classifier and feature extractor
# classifier, transform = model
# (optional) Manually load pretrained model weights from checkpoint
path_to_checkpoint = "path/to/checkpoint.pth"
checkpoint = torch.load(path_to_checkpoint, weights_only=True, map_location='cpu')
model.load_state_dict(checkpoint)
# (optional) Fetch and load an exemplar recording from the ATBFL dataset
# NOTE: might take a moment to download
# requires `remotezip`, install with `pip install remotezip`
audio, sr = get_atbfl_examplar()
# shape: (channel=1, samples)
# Perform inference
logits, prob, _ = model(audio)
# shape: (batch=1, frame, classes)
- Python >= 3.11.0
- PyTorch >= 2.0.0
- torchaudio
- Sample Rate: 250 Hz (required)
- Channels: Single channel (mono) audio
- Format: Any format supported by torchaudio
The model produces frame-level probability outputs for three whale call types:
- bmabz: Blue whale calls (BmA, BmB, BmZ)
- d: Blue and fin whale D-calls (BmD and BpD)
- bp: Fin whale calls (Bp20 and Bp20plus)
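To turn frame-level outputs into timed call events, one common approach is to threshold each class's probability and merge runs of consecutive active frames. This is a minimal sketch under stated assumptions, not necessarily the post-processing used by the authors: it assumes the last axis of `prob` follows the order bmabz, d, bp, that a fixed 0.5 threshold is acceptable, and that you know the time between frames. `Event`, `frames_to_events` and `frame_hop_s` are hypothetical names.

```python
from dataclasses import dataclass

CLASSES = ("bmabz", "d", "bp")  # assumed order of the class axis


@dataclass
class Event:
    label: str
    onset_s: float
    offset_s: float


def frames_to_events(prob, frame_hop_s, threshold=0.5):
    """Convert per-frame probabilities (frames x classes) into labelled events.

    `prob` is a sequence of rows, one per frame, each holding one probability
    per class. `frame_hop_s` (seconds between frames) is a hypothetical
    parameter; set it to the model's actual frame spacing.
    """
    events = []
    for c, label in enumerate(CLASSES):
        start = None  # index of the first active frame in the current run
        for t, row in enumerate(prob):
            active = row[c] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                events.append(Event(label, start * frame_hop_s, t * frame_hop_s))
                start = None
        if start is not None:  # close a run that lasts until the final frame
            events.append(Event(label, start * frame_hop_s, len(prob) * frame_hop_s))
    return sorted(events, key=lambda e: (e.onset_s, e.label))
```

With the `prob` tensor from the usage examples (shape `(batch=1, frame, classes)`), pass `prob[0].tolist()` as the first argument. Per-class thresholds or a minimum event duration are natural refinements if a single threshold produces fragmented events.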
# Inside your project virtual environment
pip install whalevad
Alternatively, install from source:
git clone https://github.com/CMGeldenhuys/Whale-VAD.git
cd Whale-VAD
pip install .
This will install the whalevad package and all required dependencies.
This model was trained on the Acoustic Trends Blue Fin Library (ATBFL) dataset as part of the BioDCASE 2025 Challenge (Task 2).
- Challenge Website: https://biodcase.github.io/challenge2025/task2
- Dataset DOI: https://doi.org/10.5281/zenodo.15092732
Pre-trained model weights are available in the GitHub Releases section. Weights can be loaded automatically via PyTorch Hub or downloaded manually.
If you use this work in your research, please cite:
Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). Whale-VAD: Whale Vocalisation Activity Detection. Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), 165–169. https://doi.org/10.5281/zenodo.17251589
Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation (No. arXiv:2510.21280). arXiv. https://doi.org/10.48550/arXiv.2510.21280
@inproceedings{Geldenhuys2025WhaleVAD,
author = "Geldenhuys, Christiaan and Tonitz, Günther and Niesler, Thomas",
title = "Whale-VAD: Whale Vocalisation Activity Detection",
booktitle = "Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)",
address = "Barcelona, Spain",
month = "October",
year = "2025",
pages = "165--169",
isbn = "978-84-09-77652-8",
doi = "10.5281/zenodo.17251589"
}
@misc{Geldenhuys2025WhaleVADBPN,
title={WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation},
author={Christiaan M. Geldenhuys and Günther Tonitz and Thomas R. Niesler},
year={2025},
eprint={2510.21280},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2510.21280},
}
We welcome contributions to improve Whale-VAD! Please feel free to submit issues, fork the repository, and create pull requests.
The authors gratefully acknowledge Telkom (South Africa) for their financial support, and the Stellenbosch Rhasatsha high performance computing (HPC) facility for the compute time provided to the research presented in this work.
This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - a copyleft license that requires anyone who distributes the code or a derivative work to make the source available under the same terms. All code and model weights are provided as is.
Presented at the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), Barcelona, Spain, October 2025.
