Skip to content

bioacoustic-ai/bacpipe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

354 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Welcome to bacpipe (BioAcoustic Collection Pipeline)

Documentation Status

image by Nicole Allison

bacpipe makes using deep learning models for bioacoustics easy! Using bacpipe you can generate embeddings, classification predictions and clusters. All you need is your audio data and to customize the settings.

And the best part is, it comes with a GUI for you to explore the results.

bacpipe is also available on pip: pip install bacpipe

import bacpipe

# This will execute the whole pipeline
# if nothing is specified it will generate embeddings on
# a set of audio test data using the models birdnet and perch

bacpipe.play()

A more detailed description of the API can be found under API. Full documentation can be found at https://bacpipe.readthedocs.io.

There is a video tutorial available on youtube to install and run bacpipe.

📚 Table of Contents


How it works

This repository aims to streamline the generation and evaluation of embeddings using a large variety of bioacoustic models.

The below image shows a comparison of umap embeddings based on 15 different bioacoustic models. The models are being evaluated on a bird and frog dataset (more details in this conference paper).

bacpipe requires a dataset of audio files, runs them through a series of models, and generates embeddings. These embeddings can then be used visualized and evaluated for various tasks such as clustering or classification.

By default the embeddings will be generated for the models specified in the config.yaml file.

Currently these bioacoustic models are supported (more details below):

available_models : [
    "audiomae"
    "audioprotopnet"
    "avesecho_passt"
    "aves_especies"
    "bat"
    "beats"
    "birdaves_especies"
    "biolingual"
    "birdnet"
    "birdmae"
    "convnext_birdset"
    "hbdet"
    "insect66"
    "insect459"
    "mix2"
    "naturebeats"
    "perch_bird"
    "perch_v2"
    "protoclr"
    "rcl_fs_bsed"
    "surfperch"
    "google_whale"
    "vggish"
  ]

Once the embeddings are generated, 2d reduced embeddings will be created using the dimensionality reduction model specified in the config.yaml file. And these dimensionality reduction models are supported:

available_reduction_models: [
  "pca",
  "sparse_pca",
  "t_sne",
  "umap"
]

Furthermore, the embeddings can be evaluated using different metrics. The evaluation is done using the evaluate.py script, which takes the generated embeddings and computes various metrics such as clustering performance and classification performance. The evaluation results are saved in the bacpipe/results directory.

available_evaluation_tasks: [
  "classification",
  "clustering"
]

The repository also includes a panel dashboard for visualizing the generated embeddings. To enable the dashboard, simply set the dashboard variable to True in the settings.yaml file. The dashboard will automatically open in your browser (at http://localhost:8050) after running the run_dashboard.py script.

The pipeline is designed to be modular, so you can easily add or remove models as needed. The models are organized into pipelines, which are defined in the bacpipe/embedding_generation_pipelines/feature_extractors directory. If you want to add a different dimensionality reduction model, you do so by adding new pipeline to the bacpipe/embedding_generation_pipelines/dimensionality_reduction directory.

Using annotations for evaluation

If you have annotations for your dataset, you can use them to evaluate the generated embeddings. The labels will be used to compute the clustering and classification performance of the embeddings.

To use the annotations for evaluation, create a file called annotations.csv in the directory specified in the audio_dir variable in the config.yaml file. The file should contain the following columns:

audiofilename,start,end,label

Where audiofilename is the name of the audio file, start and end are the start and end times of the annotation in seconds, and label is the label of the annotation.

For reference see the example annotations file.

If this file exists, the evaluation script will automatically use the annotations to compute the clustering and classification performance of the embeddings. The labels will also be used to color the points in the dashboard visualization showing the embeddings.

API

bacpipe can be used as a package and installed from pip

pip install bacpipe

Use bacpipe immediately on the integrated test data

import bacpipe

# This will execute the whole pipeline
# if nothing is specified it will generate embeddings on
# a set of audio test data using the models birdnet and perch

bacpipe.play()

Modify configurations and settings as attributs

To modify configurations and settings, you can simply access them as attributes. To see available settings and configs run the following commands

bacpipe.config
bacpipe.settings
# you can also check the bacpipe/config.yaml and bacpipe/settings.yaml
# files here in the repository to see all the available settings and
# read their respective description

Modify audio source path, models, device

If you're on a Windows machine, make sure to add a r before the path like r'path\to\audio' otherwise the path will likely cause problems due to the backslashes.

# to modify the audio data path for example, do
bacpipe.config.audio_dir = '/path/to/your/audio/dir'

# to modify the models you want to run, do
bacpipe.config.models = ['birdnet', 'birdmae', 'naturebeats']
# if you do not have the checkpoint yet, it will be automatically 
# downloaded and stored locally

bacpipe.settings.device = 'cuda' 
# bacpipe uses multithreading which speeds up model inference if 
# run on a machine supporting cuda

# then run with your settings.
# By default the save logs is True
bacpipe.play(save_logs=True)
# That way bacpipe will generate log files of the outputs and also save your
# config and settings files, which can be helpful in retrospect to remember
# all the settings you chose for a run. 

Use bacpipe in an existing pipeline

If you just want to run models and get embeddings returned without saving them and don't want the dashboard and all of that, define an embedder object and pass it the model name and the settings you modified.

em = bacpipe.Embedder('perch_bird', **vars(bacpipe.settings)) 
# the vars part is important!

audio_file = '/path/to/all/the/audio/file'
embeddings = em.get_embeddings_from_model(audio_file)

# if the model has a built in classifier, like birdnet, the classification
# score are also saved. You can check if there is a classifier included by 
# checking 
em.model.bool_classifier

# After the generating of embeddings above, you will then be able to access 
# the class predictions using
em.model.classifier_outputs

Produce embeddings for multiple models in your own pipeline

If you want to produce embeddings for multiple models, bacpipe will always store them to keep your memory from overfilling. Still you can use the package to easily access the embeddings and all the metadata

loader = bacpipe.model_specific_embedding_creation(
  **vars(bacpipe.config), **vars(bacpipe.settings)
  )
# this call will initiate the embedding generation process, it will check if embeddings
# already exist for the combination of each model and the dataset and if so it will
# be ready to load them. The loader keys will be the model name and the values will
# be the loader objects for each model. Each object contains all the information
# on the generated embeddings. To name access them:
loader['birdnet'].embedding_dict() 
# this will give you a dictionary with the keys corresponding to embedding files
# and the values corresponding to the embeddings as numpy arrays

loader['birdnet'].metadata_dict
# This will give you a dictionary overview of:
# - where the audio data came from,
# - where the embeddings were saved
# - all the audio files, 
# - the embedding size of the model, 
# - the audio file lengths,
# - the number of embeddings for each audio files
# - the sample rate
# - the number of samples per window
# - and the total length of the processed dataset in seconds
# Thic dictionary is also saved as a yaml file in the directory of the embeddings

Dashboard visualization

bacpipe includes a dashboard visualization by default allowing you to easily explore the generated embeddings

Once embeddings are generated, they can be easily visualized using a dashboard (built using panel) by simply setting the dashboard setting in the config.yaml file to True.

Below you can see a gif showing the basic usage of the dashboard.

The dashboard has 3 main sections:

  1. Single model
  2. Two models
  3. All models

In the single model section, you can select a model and visualize the embeddings generated by that model. The embeddings can be colored by :

  • metadata extracted from the files (date and time information, and file and parent directory)
  • the labels specified in the annotations.csv file
  • the cluster labels generated by the clustering algorithm (kmeans)

In the dashboard sidebar you can select the model, by which to label the embeddings, whether to remove noise, and the type of classification task to show the results for.

The noise removal is done by removing the embeddings that do not correspond to annotated sections of the audio files. This is useful if you want to focus on the annotated sections of the audio files and disregard the rest of the data.

The visualizations can be saved as png files by clicking the save button in the bottom right corner of the plot.

Try it out and (please) feel free to give feedback and ask questions (or suggestions for improvements) - or in case something does not work raise issues.


Installation

Install prerequisites

Install python 3.11, uv and git (click to expand)

Install Python 3.11 and git on your local computer

For Windows:

For Linux

  • sudo add-apt-repository ppa:deadsnakes/ppa
  • sudo apt install python3.11
  • sudo apt install git

For Mac

  • (install homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • install git: brew install git

Create a virtual environment for this project

Virtual environments are very important. They ensure that specific libraries that are needed for one project don't get in the way of libraries you need for another project.

Create a virtual environment and install the packages with one command:

  • uv sync

If you prefer having control over the environment name, use these commands:

Create your virtual environment (all systems):

  • uv venv --python 3.11 env_bacpipe

For Windows, Activate your environment:

  • source env_bacpipe/Scripts/activate

For Linux/Mac, Activate your environment:

  • source env_bacpipe/bin/activate

Install the project dependencies (all systems):

  • uv pip install -r pyproject.toml

Install uv (recommended) or poetry

It is recommended to use python 3.11 for this repository, as some of the models require it.

For speed and stability it is recommended to use uv. To install uv use the following command (you can also do it all without uv then, just leave the uv part away, as all the commands are also pip commands):

pip install uv

(for windows use $HOME/AppData/Local/Programs/Python/Python311/python.exe -m pip install uv)

If you prefer to use poetry, you can install it using:

pipx install poetry

Create a virtual environment

python3.11 -m uv venv .env_bacpipe

(for windows use $HOME/AppData/Local/Programs/Python/Python311/python.exe -m uv venv .env_bacpipe)

(alternatively for poetry use poetry env use 3.11)

activate the environment

source .env_bacpipe/bin/activate (for windows use source .env_bacpipe\Scripts\activate)

Clone the git repository

For Windows:

  • move to a folder of your choice (choose wisely - something like Documents is always a good starting point), then right click and Open Git Bash here
  • install uv: "$HOME/AppData/Local/Programs/Python/Python311/python" -m pip install uv

For Linux / MAC:

  • open a terminal (console) in the folder of your choice
  • install uv: /usr/bin/python/Python311/python -m pip install uv
  • (if Mac users get an error locate python with which python3 and use that path instead followed by -m pip install uv)

For all systems:

  • run git clone https://github.com/bioacoustic-ai/bacpipe

Install the dependencies once the prerequisites are satisfied.

uv pip install -r pyproject.toml

  • this will automatically install requirements based on your os, so windows should also work fine. However, gpu support is not available on windows

For poetry:

poetry lock

poetry install

If you do not have admin rights and encounter a permission denied error when using pip install, use python -m pip install ... instead.

Install gpu support for tensorflow

Because of the requirements of torch==2.6 the cuda versions have to be installed corresponding to what pytorch supports. However, I have tested that you are able to install different cuda dependencies to also support tensorflow gpus once the environment is set up. Once you have installed the requirements. Install the following dependencies using uv pip install -r requirements_tf_gpu.txt.

nvidia-cublas-cu12==12.5.2.13
nvidia-cuda-cccl-cu12==12.5.39.post1
nvidia-cuda-cupti-cu12==12.5.39
nvidia-cuda-cuxxfilt-cu12==12.5.39
nvidia-cuda-nvcc-cu12==12.5.40
nvidia-cuda-nvrtc-cu12==12.5.40
nvidia-cuda-opencl-cu12==12.5.39
nvidia-cuda-profiler-api-cu12==12.5.39
nvidia-cuda-runtime-cu12==12.5.39
nvidia-cuda-sanitizer-api-cu12==12.5.39
nvidia-cudnn-cu12==9.3.0.75
nvidia-cufft-cu12==11.2.3.18
nvidia-curand-cu12==10.3.6.39
nvidia-cusolver-cu12==11.6.2.40
nvidia-cusparse-cu12==12.4.1.24
nvidia-cusparselt-cu12==0.6.2
nvidia-nccl-cu12==2.21.5
nvidia-npp-cu12==12.3.0.116
nvidia-nvfatbin-cu12==12.5.39
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvjpeg-cu12==12.3.2.38
nvidia-nvml-dev-cu12==12.5.39
nvidia-nvtx-cu12==12.5.39

Model checkpoints are downloaded automatically.

Model checkpoints will be downloaded automatically. Once you run bacpipe.play(), it will automatically download models that were included but are not yet available locally. Models are downloaded from this huggingface repo.

Test the installation was successful

By doing so you will also ensure that the directory structure for the model checkpoints will be created.

pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py

The tests could take a while, so to run a small test, you can also pass the model you would like to test:

pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py --models=birdnet,perch

(keep in mind you have to have the checkpoints locally for the models that require it)

In case of a permission denied error, run python -m pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py

If everything passes then you've successfully installed bacpipe and can now proceed to use it.

Usage

Configurations and settings

To see the capabilities of bacpipe, go ahead and run the run_pipeline.py script. This will run the pipeline with the default settings and configurations on a small set of test data.

To use bacpipe on your own data, you will need to modify the configuration files.

The only two files that need to be modified are the config.yaml and settings.yaml files. The config.yaml is used for the standard configurations:

  • path to audio files
  • models to run
  • dimensionality reduction model
  • evaluation tasks
  • whether to run the dashboard or not

The settings.yaml file is used for more advanced configurations and does not need to be modified unless you have specific preferences. It includes settings such as to run on a cpu or a cuda (gpu) (by default cpu), the paths where results are saved, configurations for the evaluation tasks and more.

Modify the config.yaml file in the root directory to specify the path to your dataset. Define what models to run by specifying the strings in the models list (copy and paste as needed, I usually just comment the model's I don't want to run).

If you have already computed embeddings on the dataset specified in audio_data, and you want to do the dimensionality reduction and evaluation for the models you have already run, you can set the already_computed variable to True. This will only select the models that have already been computed.

In either case if you have already computed embeddings with a model, bacpipe will skip the model and use the already computed embeddings (if they are still located in the same directory). Even if overwrite is set to True, bacpipe will not overwrite the embeddings if they already exist. It will recompute clusterings and label generation.

Running the pipeline

Once the configuration is complete, execute the run_pipeline.py file (make sure the environment is activated) python run_pipeline.py

While the scripts are executed, directories will be created corresponding to the main_results_dir setting. Embeddings will be saved in main_results_dir/YOUR_DATASET/embeddings (see here for more info) and if selected, reduced dimensionality embeddings will be saved in main_results_dir/evaluation/dim_reduced_embeddings (see here for more info).

Model selection

Select the models you want to run in the config.yaml file. The models are specified in this ReadMe and in the test_file. You can select the models you want to run by adding them to the models list in the config.yaml file.

Dimensionality reduction

Different dimensionality reduction models can be selected in the config.yaml file. The available models are specified in the section Dimensionality reduction models. Insert the name of the selected model in the dim_reduction_model variable in the config.yaml file. The default is umap, but you can also select pca, sparse_pca or t_sne.

Dashboard

The dashboard is a panel application that allows you to visualize the generated embeddings. To enable the dashboard, set the dashboard variable in the config.yaml file to True. The dashboard will automatically open in your browser (at http://localhost:8050) after running the run_dashboard.py script.

Evaluation

You can use bacpipe to evaluate the generated embeddings using different metrics. To evaluate the embeddings, you need annotations for your dataset. The annotations should be in a file called annotations.csv in the directory specified in the audio_dir variable in the config.yaml file or the results directory of your dataset main_results_dir/YOUR_DATASET. The file should contain the following columns:

audiofilename,start,end,label:species

Where audiofilename is the name of the audio file, start and end are the start and end times of the annotation in seconds, and label is the label of the annotation.

Using the settings attribute only_embed_annotations, you can also decide to only generate embedding corresponding to your annotations. In that case bacpipe will take each annotation and create embeddings for each selected model from only those segments. If segments are shorter than the model input length, the segments will be padded. If they are longer, they will produce several embeddings.

species is a placeholder here and can be replaced with any label description. So if you have labelled call types, change it to label:call_type. But it's important that there are no spaces and that it contains label:. By doing this you will be able to visualize your data based on all of these label columns.

The labels can then be used to perform clustering and classification evaluation. This can be done only in regard to one label, so specify the main label column in the label_column variable in settings.yaml. This defaults to species. Only labels that exceed the min_label_occurrences value will be used. This is to make sure you have enough data to train linear classifiers and do meaningful evaluations. If you have enough labeled data, feel free to increase this.

See the file annotations.csv for an example of how the annotations file should look like.

Once the annotations file is created, add either classification or clustering (or both) to the evaluation_task variable in the config.yaml file (use double quotes: "classification" or "clustering"). You can run the evaluation script using normal python run_pipeline.py command. The evaluation script will automatically use the annotations to compute the clustering and classification performance of the embeddings. The results will be saved in the bacpipe/results/YOUR_DATASET/evaluation directory.

If you selected classification, a linear classifier will be trained and saved in the classification subdirectory of the evaluation folder. This .pt file can be used to generate class predictions with a model that wasn't originally trained on these classes. A tutorial will be available shortly explaining this in more detail. The .pt file can be used in the repository acodet to generate class predictions with the combination of a feature extractor and the trained linear classifier.

Models with classifiers

Models that already contain classification heads, are the following:

  • Perch_v2
  • BirdNET
  • Perch_bird
  • SurfPerch
  • AudioProtoPNet
  • ConvNeXT_birdset
  • google_whale

With all of these models, you only need to set run_pretrained_classifier to True and then the model will save the classification outputs in the classification/original_classifier_outputs folder. Only predictions exceeding the classifier_threshold value will be saved. A csv file in the shape of the annotations.csv file is also saved corresponding to the class predictions. The dashboard will also contain an extra label_by option default_classifier.

Generated Files

When processing bacpipe generates a number of files. It will firstly create the main_results_dir specified in settings.yaml and within that it will create a folder named like the dataset selected in audio_dir in config.yaml. Within this directory, bacpipe will create 3 directories: embed_parent_dir, dim_reduc_parent_dir and evaluations_dir.

The embed_parent_dir will contain the model-specific folders with the timestamps when they were processed. The dim_reduc_parent_dir contains the model-specific and dimensionality reduction-specific folder again with a timestamp when they were processed. And the evaluations_dir contains folders for each model that has been processed.

Embedding folders

By default the naming conventions for the embedding directories is:

_year-month-day_hour-minute___modelname_datasetname

The metadata.yaml file that will be created in each of the embedding folders has the following structure:

Inside this directory you will find:

  • metadata.yml, which contains metadata in the shape of a dictionary with the following keys:
    • audio_dir: data_directory
    • embed_dir: path_to_embedding_files
    • embedding_size: dimension_of_embeddings
    • files: dictionary_with_lists_containing_per_file_information
      • audio_files: name_of_audio_files
      • file_lengths (s): file_length_in_seconds
    • model_name: name_of_model
    • sample_rate (Hz): sample_rate
    • segment_length (samples): length_of_input_segment_in_samples
    • total_dataset_length (s): total_length_of_dataset_in_seconds
  • either the embeddings files (ending on .npy) or directories corresponding to subdirectories in the dataset folder

It is important that the name of this directory remains unchanged, so that it can be found automatically.

Dimensionality reduced embedding folders

By default the naming conventions for the embedding directories is:

_year-month-day_hour-minute___DimReductionModelName-DatasetName-ModelName

Inside this directory you will find:

  • metadata.yml, which contains metadata in the shape of a dictionary with the following keys (all of the metadata from the embeddings is also repeated):
    • files: dictionary_with_lists_containing_per_file_information
      • embedding_dimensions: tuple_of_number_of_embeddings_by_embedding_size
      • embedding_files: name_of_embedding_files
      • nr_embeds_per_file: number_of_embeddings_per_file
    • nr_embeds_total: total_number_of_embeddings
  • one .json file containing the reduced embeddings of the entire dataset
  • a plot, visiualizing the reduced embeddings in 2d

It is important that the name of this directory remains unchanged, so that it can be found automatically.

Evaluation folders

Within the evaluations_dir folder, you will find the following folders: classification, clustering, distances, labels and plots. These folders will be filled with results if the corresponding evaluation tasks are selected. distanctes is currently not supported and therefore only a place-holder. label will contain a .npy file containing the auto-generated labels from the metadata and if available ground_truth.

Pretrained classifier annotations

If the model has a pretrained classifier and run_pretrained_classifier is True, bacpipe will create a original_classifier_output folder and a default_classifier_annotations.csv file, corresponding to annotations from the pretrained classifier. The folder will contain .json files for each audio file with the species and in which time bin they occurred with what certainty. The default_classifier_annotations.csv file contains annotations in the same style as the annotations.csv file.

Example result files structure

Generated from running with the test_data (timestamps will change) with only directories shown.

This is the source audio data structure:

└── audio
    ├── FewShot
    └── UrbanSoundscape

This is the resulting folder structure:

.
├── dim_reduced_embeddings
│   ├── 2025-09-09_03-08___umap-test_data-birdnet
│   └── 2025-09-09_03-09___umap-test_data-perch_bird
├── embeddings
│   ├── 2025-09-09_03-08___birdnet-test_data
│   │   └── audio
│   │       ├── FewShot
│   │       └── UrbanSoundscape
│   └── 2025-09-09_03-08___perch_bird-test_data
│       └── audio
│           ├── FewShot
│           └── UrbanSoundscape
└── evaluations
    ├── birdnet
    │   ├── classification
    │   │   └── original_classifier_outputs
    │   │       └── audio
    │   │           ├── FewShot
    │   │           └── UrbanSoundscape
    │   ├── clustering
    │   ├── distances
    │   ├── labels
    │   └── plots
    ├── overview
    └── perch_bird
        ├── classification
        │   └── original_classifier_outputs
        │       └── audio
        │           ├── FewShot
        │           └── UrbanSoundscape
        ├── clustering
        ├── distances
        ├── labels
        └── plots

Available models

The models all have their model specific code to ensure inference runs smoothly. More info on the models and their pipelines can be found here.

Models currently include:

Name ref paper ref code sampling rate input length embedding dimension
AudioMAE paper code 16 kHz 10 s 768
AudioProtoPNet paper code 32 kHz 5 s 1024
AvesEcho_PASST paper code 32 kHz 3 s 768
AVES_ESpecies paper code 16 kHz 1 s 768
Bat paper code 22.05 kHz ~1 s 64
BEATs paper code 16 kHz 10 s 768
BioLingual paper code 48 kHz 10 s 512
BirdAVES_ESpecies paper code 16 kHz 1 s 1024
BirdMAE paper code 32 kHz 10 s 1280
BirdNET paper code 48 kHz 3 s 1024
ConveNeXT_BirdSet paper code 32 kHz 5 s 1024
Google_Whale paper code 24 kHz 5 s 1280
HumpbackNET paper code 2 kHz 3.9124 s 2048
Insect66NET paper code 44.1 kHz 5.5 s 1280
Insect459NET paper pending 44.1 kHz 5.5 s 1280
Mix2 paper code 16 kHz 3 s 960
NatureBEATs paper code 16 kHz 10 s 768
Perch_Bird paper code 32 kHz 5 s 1280
Perch_V2 paper code 32 kHz 5 s 1536
ProtoCLR paper code 16 kHz 6 s 384
RCL_FS_BSED paper code 22.05 kHz 0.2 s 2048
SurfPerch paper code 32 kHz 5 s 1280
VGGish paper code 16 kHz 0.96 s 128
Click to see more details on the models
Name paper code training CNN/Trafo architecture checkpoint link
AudioMAE paper code ssl + ft trafo ViT weights
AudioProtoPNet paper code sup l CNN ConvNext included
AvesEcho_PaSST paper code sup l trafo PaSST weights
AVES_ESpecies paper code ssl trafo HuBERT weights
Bat paper code sup l trafo CNN + trafo weights
BEATs paper code ssl trafo ViT weights
BioLingual paper code ssl trafo CLAP included
BirdAVES_ESpecies paper code ssl trafo HuBERT weights
BirdMAE paper code ssl trafo ViT included
BirdNET paper code sup l CNN EffNetB0 weights
ConvNeXT_BirdSet paper code sup l CNN ConvNext included
Google_Whale paper code sup l CNN EffNetb0 included
HumpbackNET paper code sup l CNN ResNet50 weights
Insect66NET paper code sup l CNN EffNetv2s weights
Insect459NET paper pending sup l CNN EffNetv2s pending
Mix2 paper code sup l CNN MobNetv3 release pending
NatureBEATs paper code ssl trafo BEATs weights
Perch_Bird paper code sup l CNN EffNetb0 included
Perch_V2 paper code sup l CNN EfficientNetB3 included
ProtoCLR paper code sup cl trafo CvT-13 weights
RCL_FS_BSED paper code sup cl CNN ResNet9 weights
SurfPerch paper code sup l CNN EffNetb0 included
VGGish paper code sup l CNN VGG weights

Brief description of models

All information is extracted from the respective repositories and manuscripts. Please refer to them for more details

AudioMAE

  • spectrogram input
  • self-supervised pretrained model, fine-tuned
  • vision transformer
  • trained on general audio

AudioMAE from the facebook research group is a vision transformer pretrained on AudioSet-2M data and fine-tuned on AudioSet-20K.

AudioProtoPNet

  • spectrogram input
  • supervised learning, trained using asymmetric loss
  • ConvNext architecture as feature extractor
  • trained on the xeno-canto large section of BirdSet

This CNN is trained in two phases. The main contribution of this model is its interpretability. It learned prototypes during its second training phase which can be used during inference time to visualize sections of the spectrogram that were most important for classification. It also reaches competitive performance on bird classification tasks. The (included) prototype-based classifier can distinguish 9736 classes.

AvesEcho_PaSST

  • transformer
  • supervised pretrained model, fine-tuned
  • pretrained on general audio and bird song data

AvesEcho_PaSST is a vision transformer trained on AudioSet and (deep) fine-tuned on xeno-canto. The model is based on the PaSST framework.

AVES_ESpecies

  • transformer
  • self-supervised pretrained model
  • trained on general audio

AVES_ESpecies is short for Animal Vocalization Encoder based on Self-Supervision by the Earth Species Project. The model is based on the HuBERT-base architecture. The model is pretrained on unannotated audio datasets AudioSet-20K, FSD50K and the animal sounds from AudioSet and VGGSound.

Bat

  • trafo
  • supervised learning
  • bat data (Skiba et al. 2003))

Bat is a hybrid CNN+transformer model used for bat classification.

BEATs

  • trafo
  • self-supervised learning
  • trained on AudioSet

BEATs is microsofts SotA audio model based on audio pre-training with acoustic tokenizers. The model reaches competitive results with many bioacosutic models in benchmarks for linear and attentive probing, and is therefore also included in bacpipe as a general audio baseline model.

BioLingual

  • transformer
  • spectrogram input
  • contrastive-learning
  • self-supervised pretrained model
  • trained on animal sound data (primarily bird song)

BioLingual is a language-audio model trained on captioning bioacoustic datasets inlcuding xeno-canto and iNaturalist. The model architecture is based on the CLAP model architecture.

BirdAVES_ESpecies

  • transformer
  • self-supervised pretrained model
  • trained on general audio and bird song data

BirdAVES_ESpecies is short for Bird Animal Vocalization Encoder based on Self-Supervision by the Earth Species Project. The model is based on the HuBERT-large architecture. The model is pretrained on unannotated audio datasets AudioSet-20K, FSD50K and the animal sounds from AudioSet and VGGSound as well as bird vocalizations from xeno-canto.

BirdMAE

  • trafo (ViT)
  • self-supervised model
  • trained on XCM

BirdMAE is a masked autoencoder inspired by meta's AudioMAE, however the model was heavily adapted for the bioacoustic domain. The model was trained on the xeno-canto M dataset (1.7 million samples) from BirdSet and evaluated on various soundscape datasets, where it outperformed all competing models (including SotA bioacoustic models).

BirdNET

  • CNN
  • supervised training model
  • trained on bird song data

BirdNET (v2.4) is based on a EfficientNET(b0) architecture. The model is trained on a large amount of bird vocalizations from the xeno-canto database alongside other bird song databses.

ConvNeXT_BirdSet

  • CNN
  • supervised learning
  • trained on BirdSet

The ConvNeXT_birdset model is a ConvNeXT CNN trained on the BirdSet dataset (which consists of large portions of the xeno-canto database and uses a varierty of multilabel soundscape recordings for evaluation.)

Google_Whale

  • CNN
  • supervised training model
  • trained on 7 whale species

Google_Whale (multispecies_whale) is a EFficientNet B0 model trained on whale vocalizations and other marine sounds.

HumpbackNET

  • CNN
  • supervised training model
  • trained on humpback whale song

HumpbackNET is a binary classifier based on a ResNet-50 model trained on humpback whale data from different parts in the North Atlantic.

Insect66NET

  • CNN
  • supervised training model
  • trained on insect sounds

InsectNET66 is a EfficientNet v2 s model trained on the Insect66 dataset including sounds of grasshoppers, crickets, cicadas developed by the winning team of the Capgemini Global Data Science Challenge 2023.

Insect459NET

  • CNN
  • supervised training model
  • trained on insect sounds

InsectNET459 is a EfficientNet v2 s model trained on the Insect459 dataset (publication pending).

Mix2

  • CNN
  • supervised training model
  • trained on frog sounds

Mix2 is a MobileNet v3 model trained on the AnuranSet which includes sounds of 42 different species of frogs from different regions in Brazil. The model was trained using a mixture of Mixup augmentations to handle the class imbalance of the data.

NatureBEATs

  • trafo
  • self-supervised training model
  • trained on diverse set of bioacoustics, general sound, music, human speech

NatureLM-Audio is a very ambitious foundational model specifically for bioacoustics. It uses Microsoft's BEATs backbone as an audio encoder along with Meta's Llama-3.1-8B large language model capabilities. In the implementation used here in bacpipe, only the support for BEATs audio-encoder with NatureLM-Audio's weights, referred to here as NatureBEATs, is provided.

RCL_FS_BSED

  • CNN
  • supervised contrastive learning
  • trained on dcase 2023 task 5 dataset link

RCL_FS_BSED stands for Regularized Contrastive Learning for Few-shot Bioacoustic Sound Event Detection and features a model based on a ResNet model. The model was originally created for the DCASE bioacoustic few shot challenge (task 5) and later improved.

ProtoCLR

  • transformer
  • supervised contrastive learning
  • trained on bird song data

ProtoCLR stands for Prototypical Contrastive Learning for robust representation learning. The architecture is a CvT-13 (Convolutional vision transformer) with 20M parameters. ProtoCLR has been validated on transfer learning tasks for bird sound classification, showing strong domain-invariance in few-shot scenarios. The model was trained on the xeno-canto dataset.

Perch_Bird

  • CNN
  • supervised training model
  • trained on bird song data

Perch_Bird is a EFficientNet B1 model trained on the entire Xeno-canto database.

Perch_V2

  • CNN
  • supervised learning
  • trained on birds, amphibians, insects and mammals (xeno-canto, iNaturalis, Tierstimmenarchiv)

Perch V2 or Perch 2.0 is the updated version of the Perch model from bioacousticians at Google. The model is a EfficientNetB3, trained on a very large database of various species. The classifier is able to distinguish 14795 different species.

SurfPerch

  • CNN
  • supervised training model
  • trained on bird song, fine-tuned on tropical reef data

Perch is a EFficientNet B1 model trained on the entire Xeno-canto database and fine tuned on coral reef and unrelated sounds.

VGGISH

  • CNN
  • supervised training model
  • trained on general audio

VGGish is a model based on the VGG architecture. The model is trained on audio from youtube videos (YouTube-8M)

Dimensionality reduction models

To evaluate the generated embeddings a number of dimensionality reduction models are included in this repository:

name reference code reference linear
UMAP paper code No
t-SNE paper code No
PCA paper code Yes
Sparse_PCA paper code Yes

Add a new model

To add a new model, simply add a pipeline with the name of your model. Make sure your model follows the following criteria:

  • define the model specific sampling rate
  • define the model specific input segment length
  • define a class called "Model" which inherits the ModelBaseClass from bacpipe.utils
  • define the init, preproc, and call methods so that the model can be called
  • if necessary save the checkpoint in the bacpipe.model_checkpoints dir with the name corresponding to the name of the model
  • if you need to import code where your specific model class is defined, create a directory in bacpipe.model_specific_utils corresponding to your model name "newmodel" and add all the necessary code in there

Here is an example:

import torch
from bacpipe.model_specific_utils.newmodel.module import MyClass

SAMPLE_RATE = 12345
LENGTH_IN_SAMPLES = int(10 * SAMPLE_RATE)

from ..utils import ModelBaseClass

class Model(ModelBaseClass):
    def __init__(self, **kwargs):
        super().__init__(sr=SAMPLE_RATE, segment_length=LENGTH_IN_SAMPLES, **kwargs)
        self.model = MyClass()
        state_dict = torch.load(
            self.model_base_path + "/newmodel/checkpoint_path.pth",
            weights_only=True,
        )
        self.model.load_state_dict(state_dict)

    def preprocess(self, audio): # audio is a torch.tensor object
        # insert your preprocessing steps
        return processed_audio

    def __call__(self, x):
        # by default the model will be called with .eval() mode
        return self.model(x)

Most of the models are based on pytorch. For tensorflow models, see birdnet, hbdet or vggish.


Contribute

This repository is intended to be a collaborative project for people working in the field of bioacoustics. If you think there is some improvement that could be useful, please raise an issues, submit a PR or get in touch.

There are two main intentions for this repository that should always be considered when contributing:

  1. Only add new requirements if truly necessary

Given the large number of different models, there are already a lot of requirements. To ensure that the repository is stable, and installation errors are kept minimal, please only add code with new requirements if truly necessary.

  2. The main purpose of bacpipe is quickly generating embeddings from models

There should always be a baseline minimal use case, where embeddings are created from different feature extractors and everything else is an add-on.

Known issues

Perch_v2 is only supported on linux machines. see this issue.

bacpipe is being updated regularly. To make sure you're always up to date with the latest fixes run git pull regularly. If you have local changes (like changes in the config or settings files) you don't want to lose run the following:

git stash
git pull
git stash apply

That way it will stash your changes, update bacpipe for you and then apply your changes again, so you can continue where you left off.

Given that this repository compiles a large number of very different deep learning models with different requirements, some issues have been noted.

Please raise issues if there are questions or bugs.

Previous versions of bacpipe included models like animal2vec, but the requirements conflicts led me to remove them. In the future I hope there will be an updated version of those models and then they will be included again.

Citation

A lot of work has gone into creating these bioacoustic models, both by data collectors and by machine learning practitioners, please cite the authors of the respective models (all models are referenced in the table above).

This work is first described in a conference paper. If you use bacpipe for your research, please include the following reference:

@misc{kather2025clusteringnovelclassrecognition,
      title={Clustering and novel class recognition: evaluating bioacoustic deep learning feature extractors}, 
      author={Vincent S. Kather and Burooj Ghani and Dan Stowell},
      year={2025},
      eprint={2504.06710},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.06710}, 
}

Newsletter and Q&A sessions

Reading from the traffic on the repository, there seems to be an interest in bacpipe. I have set up a newsletter under this link: https://buttondown.com/vskode. Once more than 30 people have signed up for the newsletter, I will schedule a Q&A session and post the link in the newsletter. Hopefully I can then help answer questions and address issues that people are running into.

About

BioAcoustic Collection Pipeline

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors 5

Languages