Code for the paper "Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning".
Note
We are working to release an improved training and inference codebase. For now, this repository contains only the model implementation and the training code and configs for our IWSLT 2025 submission.
The list below describes the main folders and scripts in the repository. We are not releasing scripts to download and prepare the datasets locally; feel free to reach out if you want to replicate our exact setup.
- `bash/`: contains a bash script to schedule a training run using SLURM (a sketch of the launch flow follows this list).
- `config/`: the bash runner requires two YAML configuration files: one to control distributed training using Hugging Face's `accelerate`, and one for the training parameters. These files can be found here.
- `src/`: contains the training utilities and scripts, as well as the modeling code (folder: `speechlm`). The model is implemented using Hugging Face `transformers`.
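
For orientation, here is a minimal sketch of how such a run is typically scheduled and launched. All file names below (`train.sh`, `accelerate.yaml`, `training.yaml`, `train.py`) are placeholders, not the actual names in this repository; check `bash/`, `config/`, and `src/` for the real ones.

```bash
# Hypothetical launch flow. File names are placeholders, not the
# actual scripts and configs shipped in this repository.

# Schedule the training run on a SLURM cluster:
sbatch bash/train.sh

# Inside the bash runner, training is typically launched with Hugging
# Face accelerate, pointing at the two YAML configuration files
# (one for distributed training, one for the training parameters):
accelerate launch --config_file config/accelerate.yaml \
    src/train.py --config config/training.yaml
```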
For inquiries, feel free to open an issue on this repository.
If you use this code, please cite our paper:

```bibtex
@inproceedings{attanasio-etal-2025-instituto,
title = "Instituto de Telecomunica{\c{c}}{\~o}es at {IWSLT} 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning",
author = "Attanasio, Giuseppe and
Sannigrahi, Sonal and
Peters, Ben and
Filipe Torres Martins, Andr{\'e}",
editor = "Salesky, Elizabeth and
Federico, Marcello and
Anastasopoulos, Antonis",
booktitle = "Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)",
month = jul,
year = "2025",
address = "Vienna, Austria (in-person and online)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.iwslt-1.36/",
doi = "10.18653/v1/2025.iwslt-1.36",
pages = "347--353",
ISBN = "979-8-89176-272-5"
}
```

Acknowledgments

This work was supported by the Portuguese Recovery and Resilience Plan through project C645008882-00000055 (Center for Responsible AI), by EU’s Horizon Europe Research and Innovation Actions (UTTER, contract 101070631), by the project DECOLLAGE (ERC-2022-CoG 101088763), and by FCT/MECI through national funds and EU funds under UID/50008: Instituto de Telecomunicações.