Whale-VAD: Whale Vocalisation Activity Detection

A lightweight sound event detection system for discovering whale calls in marine audio recordings. This repository contains the implementation of our hybrid CNN-BiLSTM architecture with residual bottleneck and depthwise convolutions, designed for coherent per-frame whale call event detection.

Getting Started

Whale-VAD uses PyTorch Hub for easy model loading and inference. The model automatically handles feature extraction and produces frame-level probability outputs for three whale call types: bmabz, d, and bp.

Basic Usage (PyTorch Hub, Recommended)

PyTorch Hub automatically downloads and installs the required code. No installation beyond PyTorch and torchaudio (used below to load the audio) is needed.

import torch
import torchaudio as ta

# Load the model, and unpack classifier and feature extractor (transform)
# NOTE: will automatically fetch model weights
classifier, transform = torch.hub.load("CMGeldenhuys/Whale-VAD", 'whalevad', weights='DEFAULT')

# Load audio file (must be sampled at 250 Hz, single channel)
audio, sr = ta.load("whale-call.wav")
# shape: (channels=1, samples)
assert sr == 250

# Perform inference
features, _ = transform(audio)
logits, prob, _ = classifier(features)  # Frame-level probabilities for bmabz, d, and bp

Manual Usage

If you have installed the package locally (see Installation), you can use the whalevad module directly:

import torch
from whalevad import whalevad
from whalevad.utils import get_atbfl_examplar

# Create model with random initialization
# NOTE: model contains both classifier and feature extractor (transform)
model = whalevad(weights=None)

# (optional) Unpack classifier and feature extractor
# classifier, transform = model

# (optional) Manually load pretrained model weights from checkpoint
path_to_checkpoint = "path/to/checkpoint.pth"
checkpoint = torch.load(path_to_checkpoint, weights_only=True, map_location='cpu')
model.load_state_dict(checkpoint)

# (optional) Fetch and load an exemplar recording from the ATBFL dataset
# NOTE: might take a moment to download
# requires `remotezip`, install with `pip install remotezip`
audio, sr = get_atbfl_examplar()
# shape: (channel=1, samples)

# Perform inference
logits, prob, _ = model(audio)
# shape: (batch=1, frame, classes)

Requirements

Python >= 3.11.0
PyTorch >= 2.0.0
torchaudio
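
To confirm that an environment meets the versions listed above, a quick check along these lines may help. This is a minimal sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helper names and are not part of the whalevad package.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple.

    Local build suffixes such as "+cu118" are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def check_requirements():
    """Report whether each listed requirement appears to be satisfied."""
    report = {"python": sys.version_info[:3] >= (3, 11, 0)}
    try:
        report["torch"] = parse_version(metadata.version("torch")) >= (2, 0, 0)
    except metadata.PackageNotFoundError:
        report["torch"] = False
    try:
        metadata.version("torchaudio")  # no minimum version is listed
        report["torchaudio"] = True
    except metadata.PackageNotFoundError:
        report["torchaudio"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```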

Input Specifications

Sample Rate: 250 Hz (required)
Channels: Single channel (mono) audio
Format: Any format supported by torchaudio

Output

The model produces frame-level probability outputs for three whale call types:

bmabz: Blue whale calls (BmA, BmB, BmZ)
d: D-calls (BmD and BpD)
bp: Fin whale calls (Bp20 and Bp20plus)
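
Frame-level probabilities are often easier to work with once grouped into labelled events. The sketch below thresholds each class and merges consecutive active frames into (label, onset, offset) tuples. It is an illustrative post-processing step, not the one used by the authors. `frames_to_events` is a hypothetical helper, the class order is assumed to follow the order listed above, and `frame_seconds` is a placeholder because the frame duration is not stated in this README.

```python
# Assumption: the last output dimension is ordered as the classes are listed above
CLASS_NAMES = ("bmabz", "d", "bp")


def frames_to_events(frame_probs, threshold=0.5, frame_seconds=1.0):
    """Group consecutive above-threshold frames into (label, onset, offset) events.

    frame_probs: sequence of frames, each a sequence of per-class probabilities.
    frame_seconds: duration of one frame in seconds (placeholder default).
    """
    events = []
    for class_index, label in enumerate(CLASS_NAMES):
        start = None  # index of the first frame of the event currently open
        for frame_index, frame in enumerate(frame_probs):
            active = frame[class_index] >= threshold
            if active and start is None:
                start = frame_index
            elif not active and start is not None:
                events.append((label, start * frame_seconds, frame_index * frame_seconds))
                start = None
        if start is not None:
            # The event is still open when the recording ends
            events.append((label, start * frame_seconds, len(frame_probs) * frame_seconds))
    return events
```

Given `prob` of shape (batch=1, frame, classes) from the model, the input could be obtained with `prob[0].tolist()`.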

Installation

Option 1: Install via pip (Recommended)

# Inside your project virtual environment
pip install whalevad

Option 2: Install from source

git clone https://github.com/CMGeldenhuys/Whale-VAD.git
cd Whale-VAD
pip install .

This will install the whalevad package and all required dependencies.

Dataset

This model was trained on the Acoustic Trends Blue Fin Library (ATBFL) dataset as part of the BioDCASE 2025 Challenge (Task 2).

Challenge Website: https://biodcase.github.io/challenge2025/task2
Dataset DOI: https://doi.org/10.5281/zenodo.15092732

Model Weights

Pre-trained model weights are available in the GitHub Releases section. Weights can be loaded automatically via PyTorch Hub or downloaded manually.

Citation

If you use this work in your research, please cite:

Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). Whale-VAD: Whale Vocalisation Activity Detection. Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), 165–169. https://doi.org/10.5281/zenodo.17251589

Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation (No. arXiv:2510.21280). arXiv. https://doi.org/10.48550/arXiv.2510.21280

@inproceedings{Geldenhuys2025WhaleVAD,
    author = "Geldenhuys, Christiaan and Tonitz, Günther and Niesler, Thomas",
    title = "Whale-VAD: Whale Vocalisation Activity Detection",
    booktitle = "Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)",
    address = "Barcelona, Spain",
    month = "October",
    year = "2025",
    pages = "165--169",
    isbn = "978-84-09-77652-8",
    doi = "10.5281/zenodo.17251589"
}

@misc{Geldenhuys2025WhaleVADBPN,
      title={WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation},
      author={Christiaan M. Geldenhuys and Günther Tonitz and Thomas R. Niesler},
      year={2025},
      eprint={2510.21280},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2510.21280},
}

Contributing

We welcome contributions to improve Whale-VAD! Please feel free to submit issues, fork the repository, and create pull requests.

Authors

Christiaan M. Geldenhuys
Günther Tonitz
Thomas R. Niesler

Acknowledgements

The authors gratefully acknowledge Telkom (South Africa) for their financial support, and the Stellenbosch Rhasatsha high performance computing (HPC) facility for the compute time provided to the research presented in this work.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - a copyleft license that requires anyone who distributes the code or a derivative work to make the source available under the same terms. All code and model weights are provided as is.

Presented at the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), Barcelona, Spain, October 2025.
