ACL Anthology

Code for the paper: Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning.

Note

We are working on releasing an improved training and inference codebase. For now, this repository contains only the model implementation, training code, and configuration files for our IWSLT 2025 submission.

Project Structure

The list below describes the main folders and scripts in the repository. We are not releasing the scripts used to download and prepare the datasets locally; feel free to reach out if you want to replicate our exact setup.

  • bash/: contains a bash script to schedule a training run using SLURM.
  • config/: the bash runner requires two yaml configuration files, one controlling distributed training via Hugging Face's accelerate and one setting the training hyperparameters. Both files can be found here.
  • src/: contains the training utilities and scripts, as well as the modeling code (folder: speechlm). The model is implemented using Hugging Face transformers.
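The workflow above (a SLURM batch script that launches training through accelerate with the two yaml configs) can be sketched as follows. This is an illustrative sketch only: the file names `accelerate.yaml`, `training.yaml`, and `src/train.py`, as well as the SBATCH resource values, are hypothetical placeholders, not the repository's actual paths.

```shell
# Hypothetical sketch of the SLURM submission script.
# Writes a batch file that pairs the two configs described above:
#   - config/accelerate.yaml  (distributed training settings; assumed name)
#   - config/training.yaml    (training hyperparameters; assumed name)
cat > train.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=speechlm-train
#SBATCH --gres=gpu:4
#SBATCH --time=48:00:00

# Launch the training entry point (src/train.py is an assumed name)
# under accelerate's distributed runner.
accelerate launch \
    --config_file config/accelerate.yaml \
    src/train.py --config config/training.yaml
EOF

echo "wrote train.sbatch"
```

The script would then be submitted with `sbatch train.sbatch`; accelerate reads the distributed setup (number of processes, mixed precision, etc.) from its yaml file, while the training script consumes the second config.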

Point of Contact

For inquiries, feel free to open an issue on this repository.

Citation

@inproceedings{attanasio-etal-2025-instituto,
    title = "Instituto de Telecomunica{\c{c}}{\~o}es at {IWSLT} 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning",
    author = "Attanasio, Giuseppe  and
      Sannigrahi, Sonal  and
      Peters, Ben  and
      Filipe Torres Martins, Andr{\'e}",
    editor = "Salesky, Elizabeth  and
      Federico, Marcello  and
      Anastasopoulos, Antonis",
    booktitle = "Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria (in-person and online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.iwslt-1.36/",
    doi = "10.18653/v1/2025.iwslt-1.36",
    pages = "347--353",
    ISBN = "979-8-89176-272-5"
}

Acknowledgments

This work was supported by the Portuguese Recovery and Resilience Plan through project C645008882-00000055 (Center for Responsible AI), by EU’s Horizon Europe Research and Innovation Actions (UTTER, contract 101070631), by the project DECOLLAGE (ERC-2022-CoG 101088763), and by FCT/MECI through national funds and EU funds under UID/50008: Instituto de Telecomunicações.
