ADAPTIA-MT is a suite of resources for the adaptation and evaluation of Machine Translation (MT) systems in industrial domains. The suite includes specialised terminology and parallel validation corpora for Basque-Spanish translation, manually crafted and validated in four sectors: automotive, energy, railways and machine tool.
The suite was created as part of the ADAPT-IA project (KK-2023/00035), which received funding from the Department of Economic Development and Competitiveness of the Basque Government (Spri Group), within the Elkartek I programme (2023-2024).
The suite includes two main Basque-Spanish datasets, in two separate folders:
- ADAPTIA-MT-TERM: four termbases in TBX format, one for each industrial domain.
- ADAPTIA-MT-TEST: professionally translated text from the selected industrial domains, aligned at both document and sentence levels. At the sentence level, the corpus is provided both with and without term annotation.
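Since the termbases are distributed in TBX, which is an XML format, they can be read with a standard XML parser. The following is a minimal sketch, not an official loader for this suite: the function name `read_tbx_terms` is hypothetical, and the code assumes only the generic TBX layout (entries named `termEntry` or `conceptEntry`, language sections named `langSet` or `langSec` carrying an `xml:lang` attribute, and `term` elements inside them). The actual files may use a specific TBX dialect or carry extra metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def read_tbx_terms(source):
    """Return one dict per TBX entry, mapping language code -> list of terms.

    `source` is a file path or file-like object. Element names are matched
    by local name, so both older (termEntry/langSet) and newer
    (conceptEntry/langSec) TBX layouts are handled. This is an assumption
    about the files, not a documented property of ADAPTIA-MT-TERM.
    """
    root = ET.parse(source).getroot()
    entries = []
    for entry in root.iter():
        if local(entry.tag) not in ("termEntry", "conceptEntry"):
            continue
        terms = {}
        for section in entry.iter():
            if local(section.tag) not in ("langSet", "langSec"):
                continue
            lang = section.get(XML_LANG) or section.get("lang") or ""
            for node in section.iter():
                if local(node.tag) == "term" and node.text:
                    terms.setdefault(lang, []).append(node.text.strip())
        entries.append(terms)
    return entries
```

Calling `read_tbx_terms` on one of the four termbase files would then give a list of entries in which the Basque and Spanish terms can be looked up by their language codes, assuming the files tag languages through `xml:lang`.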
The following researchers were involved in the ADAPTIA-MT dataset creation process:
- Thierry Etchegoyhen (Vicomtech)
- Harritxu Gete (Vicomtech)
- Begoña Arrate (UZEI)
- Joxean Zapirain (UZEI)
- Victor Ruiz (Vicomtech)
ADAPTIA-MT is distributed under the following license:
If you use this dataset in your work, please cite the following paper (to appear):
@inproceedings{etchegoyhen-et-al2025adaptiamt,
title = "Machine Translation in Industrial Domains: Resources and Evaluations",
author = "Etchegoyhen, Thierry and Gete, Harritxu and Arrate, Begoña and Zapirain, Joxean and Ruiz, Victor",
booktitle = "Proceedings of SEPLN 2025",
year = "2025",
address = "Zaragoza, Spain",
}