VARAN 🦎: Official Implementation of Variational Inference for Self-Supervised Speech Models Fine-Tuning on Downstream Tasks
This repository contains the source code for two downstream tasks: SER (Speech Emotion Recognition) and ASR (Automatic Speech Recognition).
To train the SER model, run `python scripts/train_ser.py <path_to_json_config>`. Metrics for the validation and test sets will be computed at each evaluation step specified in the config.
For the ASR task, we do not share a training script, but we do share the model classes implementing the VARAN layer aggregation method for CTC loss computation in the `data2vec_for_sequence_classification.py` and `wavlm_for_sequence_classification.py` scripts.
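For orientation, below is a minimal sketch of the generic weighted-sum layer-aggregation baseline feeding a CTC head. It is not the VARAN variational aggregation itself (that is implemented in the model classes above), and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn


class WeightedSumAggregator(nn.Module):
    """Weighted-sum baseline: a learned convex combination of all upstream layers."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One learnable scalar per layer, normalized with a softmax.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, time, dim)
        weights = torch.softmax(self.layer_weights, dim=0)
        return (weights.view(-1, 1, 1, 1) * hidden_states).sum(dim=0)


class CTCHead(nn.Module):
    """Projects the aggregated representation to vocabulary logits for nn.CTCLoss."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, aggregated: torch.Tensor) -> torch.Tensor:
        logits = self.proj(aggregated)              # (batch, time, vocab)
        # nn.CTCLoss expects log-probabilities of shape (time, batch, vocab).
        return logits.log_softmax(dim=-1).transpose(0, 1)
```

The "last layer" baseline corresponds to simply taking `hidden_states[-1]` instead of the learned combination.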
The following options can be configured:
- Upstream model:
- data2vec-base
- data2vec-large
- wavlm-base
- wavlm-large
- Fine-tuning strategy:
- full fine-tuning
- LoRA fine-tuning
- Layer aggregation method:
- last layer
- weighted sum
- varan
Examples of the JSON configs can be found in the `configs` directory.
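For illustration only, here is the kind of choices such a config encodes, written as a Python dict; the field names are hypothetical, so consult the files in the `configs` directory for the actual schema.

```python
# Hypothetical field names for illustration; the real schema is defined by the
# JSON files in the configs directory.
example_config = {
    "upstream_model": "wavlm-base",    # or data2vec-base, data2vec-large, wavlm-large
    "finetuning_strategy": "lora",     # or "full"
    "layer_aggregation": "varan",      # or "last_layer", "weighted_sum"
}
```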
While we don't provide pre-trained model weights, this code contains everything needed to reproduce our approach for academic research purposes.
- We recommend running a hyperparameter sweep for each layer aggregation method and selecting the best configuration based on validation set performance.
- The hyperparameters we used in our experiments are detailed in the paper.
- Environmental differences may affect results, so your final metrics might differ from those reported.
An example sweep configuration can be found in `varan_sweep.yaml`.