Whale-VAD: Whale Vocalisation Activity Detection

DOI | WhaleVAD Paper | WhaleVAD-BPN Paper | License: GPL v3 | Python 3.11+ | PyTorch 2.7+

A lightweight sound event detection system for discovering whale calls in marine audio recordings. This repository contains the implementation of our hybrid CNN-BiLSTM architecture with residual bottleneck and depthwise convolutions, designed for coherent per-frame whale call event detection.

Getting Started

Whale-VAD uses PyTorch Hub for easy model loading and inference. The model automatically handles feature extraction and produces frame-level probability outputs for three whale call types: bmabz, d, and bp.

Basic Usage (PyTorch Hub, Recommended)

PyTorch Hub automatically downloads and installs the required code. No additional installation beyond PyTorch is needed.

import torch
import torchaudio as ta

# Load the model, and unpack classifier and feature extractor (transform)
# NOTE: will automatically fetch model weights
classifier, transform = torch.hub.load("CMGeldenhuys/Whale-VAD", 'whalevad', weights='DEFAULT')

# Load audio file (must be sampled at 250 Hz, single channel)
audio, sr = ta.load("whale-call.wav")
# shape: (channels=1, samples)
assert sr == 250

# Perform inference
features, _ = transform(audio)
logits, prob, _ = classifier(features)  # Frame-level probabilities for bmabz, d, and bp

Manual Usage

If you have installed the package locally (see Installation), you can use the whalevad module directly:

import torch
from whalevad import whalevad
from whalevad.utils import get_atbfl_examplar

# Create model with random initialization
# NOTE: model contains both classifier and feature extractor (transform)
model = whalevad(weights=None)

# (optional) Unpack classifier and feature extractor
# classifier, transform = model

# (optional) Manually load pretrained model weights from checkpoint
path_to_checkpoint = "path/to/checkpoint.pth"
checkpoint = torch.load(path_to_checkpoint, weights_only=True, map_location='cpu')
model.load_state_dict(checkpoint)

# (optional) Fetch and load an exemplar from the ATBFL dataset
# NOTE: might take a moment to download
# requires `remotezip`, install with `pip install remotezip`
audio, sr = get_atbfl_examplar()
# shape: (channel=1, samples)

# Perform inference
logits, prob, _ = model(audio)
# shape: (batch=1, frame, classes)

Requirements

  • Python >= 3.11.0
  • PyTorch >= 2.0.0
  • torchaudio

Input Specifications

  • Sample Rate: 250 Hz (required)
  • Channels: Single channel (mono) audio
  • Format: Any format supported by torchaudio
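Because the model expects 250 Hz mono input, it can help to check a file before running inference. The sketch below uses only Python's standard `wave` module, so it assumes an uncompressed WAV file. The helper name `check_wav_input` and the two constants are illustrative and are not part of the whalevad package.

```python
import wave

EXPECTED_SAMPLE_RATE = 250  # Hz, as required by the model
EXPECTED_CHANNELS = 1       # mono


def check_wav_input(path):
    """Return a list of problems found in a WAV file (empty if it looks usable).

    Hypothetical helper for illustration; not part of whalevad.
    """
    problems = []
    # Read only the header fields; the audio samples themselves are not loaded.
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

A file that fails the sample-rate check could be resampled after loading, for example with `torchaudio.functional.resample(audio, orig_freq=sr, new_freq=250)`, and a multi-channel recording could be downmixed by averaging over the channel dimension.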

Output

The model produces frame-level probability outputs for three whale call types:

  • bmabz: Blue whale calls (BmA, BmB, BmZ)
  • d: D-calls (BmD and BpD)
  • bp: Fin whale calls (Bp20 and Bp20plus)
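Frame-level probabilities usually need to be turned into discrete call events before they can be compared with annotations. The following is a minimal sketch of one common approach (thresholding, then merging consecutive active frames); it is not the post-processing used by the authors. The `threshold` and `frame_seconds` defaults are placeholders, since the frame hop is not stated in this README, and the class order in `CLASS_NAMES` is assumed to follow the listing above.

```python
CLASS_NAMES = ("bmabz", "d", "bp")  # ASSUMPTION: class axis follows the order listed above


def frames_to_events(frame_probs, threshold=0.5, frame_seconds=1.0):
    """Group consecutive frames at or above `threshold` into (onset, offset) pairs.

    `frame_probs` holds per-frame probabilities for ONE class.
    `threshold` and `frame_seconds` are placeholder values, not taken from the model.
    """
    events = []
    start = None  # index of the first frame of the event currently open, if any
    for index, p in enumerate(frame_probs):
        active = p >= threshold
        if active and start is None:
            start = index
        elif not active and start is not None:
            events.append((start * frame_seconds, index * frame_seconds))
            start = None
    if start is not None:
        # Close an event that is still open at the end of the recording.
        events.append((start * frame_seconds, len(frame_probs) * frame_seconds))
    return events


def events_per_class(prob_frames, **kwargs):
    """Apply `frames_to_events` to each class column.

    `prob_frames` is a list of frames, each a list with one probability per class.
    """
    return {
        name: frames_to_events([frame[k] for frame in prob_frames], **kwargs)
        for k, name in enumerate(CLASS_NAMES)
    }
```

With a model output `prob` of shape (batch=1, frame, classes), the nested list expected here could be obtained with `prob[0].tolist()`.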

Installation

Option 1: Install via pip (Recommended)

# Inside your project virtual environment
pip install whalevad

Option 2: Install from source

git clone https://github.com/CMGeldenhuys/Whale-VAD.git
cd Whale-VAD
pip install .

This will install the whalevad package and all required dependencies.
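To confirm that the installation succeeded, the installed version can be queried from Python. This is a small sketch using the standard library's `importlib.metadata`; it assumes the distribution is registered under the name `whalevad`, matching the pip command above, and the helper `installed_version` is illustrative only.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# ASSUMPTION: the distribution name is "whalevad", as in `pip install whalevad`.
print(installed_version("whalevad"))
```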

Dataset

This model was trained on the Acoustic Trends Blue Fin Library (ATBFL) dataset as part of the BioDCASE 2025 Challenge (Task 2).

Model Weights

Pre-trained model weights are available in the GitHub Releases section. Weights can be loaded automatically via PyTorch Hub or downloaded manually.

Citation

If you use this work in your research, please cite:

Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). Whale-VAD: Whale Vocalisation Activity Detection. Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), 165–169. https://doi.org/10.5281/zenodo.17251589

Geldenhuys, C. M., Tonitz, G., & Niesler, T. R. (2025). WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation (No. arXiv:2510.21280). arXiv. https://doi.org/10.48550/arXiv.2510.21280

@inproceedings{Geldenhuys2025WhaleVAD,
    author = "Geldenhuys, Christiaan and Tonitz, Günther and Niesler, Thomas",
    title = "Whale-VAD: Whale Vocalisation Activity Detection",
    booktitle = "Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)",
    address = "Barcelona, Spain",
    month = "October",
    year = "2025",
    pages = "165--169",
    isbn = "978-84-09-77652-8",
    doi = "10.5281/zenodo.17251589"
}

@misc{Geldenhuys2025WhaleVADBPN,
      title={WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation},
      author={Christiaan M. Geldenhuys and Günther Tonitz and Thomas R. Niesler},
      year={2025},
      eprint={2510.21280},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2510.21280},
}

Contributing

We welcome contributions to improve Whale-VAD! Please feel free to submit issues, fork the repository, and create pull requests.

Authors

  • Christiaan M. Geldenhuys
  • Günther Tonitz
  • Thomas R. Niesler

Acknowledgements

The authors gratefully acknowledge Telkom (South Africa) for their financial support, and the Stellenbosch Rhasatsha high performance computing (HPC) facility for the compute time provided for the research presented in this work.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - a copyleft license that requires anyone who distributes the code or a derivative work to make the source available under the same terms. All code and model weights are provided as is.


Presented at the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025), Barcelona, Spain, October 2025.
