The Multilingual Lexical BATS (MultiLexBATS) dataset comprises lexical semantic relations in 15 natural languages, listed in the table below.
The dataset folder contains a folder "all_languages" with ID- and English-aligned columns for each included language. MultiLexBATS is also provided as individual language files: the dataset folder holds one folder per language, containing all relation files for that language.
All scripts used in these experiments, e.g. for running statistics on the dataset (folder "stats") or for querying generative pre-trained transformers (here BLOOM, via the Hugging Face Inference API), are provided in the "scripts" folder.
For the languages that correspond to MATS languages, we utilised the same templates as in MATS. For all other languages, first language speakers created the templates. For languages without a direct equivalent of the English template, first language speakers proposed several templates, all of which we tested. Please consult the final paper to see which templates performed best in the experiments.
| Language | Prompt |
| --- | --- |
| EN | ``<a>'' is to ``<b>'' as ``<c>'' is to ``<d>''. |
| AL | ``<a>'' është për ``<b>'' ashtu si ``<c>'' për ``<d>''. |
| BM | ``<a>'' ye ``<b>'' ye i n’a fɔ ``<c>'' ye ``<d>'' ye |
| DE | ``<a>'' ist so zu ``<b>'' wie ``<c>'' zu ``<d>'' ist. |
| EL | το ``<a>'' είναι προς το ``<b>'' ό,τι το ``<c>'' προς το ``<d>''. |
| ES | ``<a>'' es a ``<b>'' como ``<c>'' es a ``<d>''. |
| FR | ``<a>'' est à ``<b>'' ce que ``<c>'' est à ``<d>''. |
| HE | ``<a>'' ל ``<b>'' כ ``<c>'' ל ``<d>'' |
| HR1 | ``<a>'' je za ``<b>'' kao što je ``<c>'' za ``<d>''. |
| HR2 | Riječ ``<a>'' je riječi ``<b>'' jednako što je riječ ``<c>'' riječi ``<d>''. |
| HR3 | Odnos između riječi ``<a>'' i ``<b>'' jednak je odnosu između riječi ``<c>'' i ``<d>''. |
| IT | ``<a>'' sta a ``<b>'' come ``<c>'' sta a ``<d>''. |
| LT | ``<a>'' yra ``<b>'' taip, kaip ``<c>'' yra ``<d>''. |
| MK1 | ``<a>'' е за ``<b>'' исто што и ``<c>'' за ``<d>''. |
| MK2 | Зборот ``<a>'' за зборот ``<b>'' е исто што и зборот ``<c>'' за зборот ``<d>''. |
| MK3 | Односот меѓу зборовите ``<a>'' и ``<b>'' е еднаков со односот меѓу зборовите ``<c>'' и ``<d>''. |
| PT | ``<a>'' está para ``<b>'' assim como ``<c>'' está para ``<d>''. |
| RO | ``<a>'' este pentru ``<b>'' cum ``<c>'' este pentru ``<d>''. |
| SK1 | Slovo ``<a>'' sa má k slovu ``<b>'' ako slovo ``<c>'' k slovu ``<d>''. |
| SK2 | Vzťah medzi slovami ``<a>'' a ``<b>'' je rovnaký ako medzi ``<c>'' a ``<d>''. |
| SK3 | ``<a>'' sa má k ``<b>'' ako ``<c>'' k ``<d>''. |
| SL1 | Beseda ``<a>'' je besedi ``<b>'' enako, kot je beseda ``<c>'' besedi ``<d>''. |
| SL2 | Beseda ``<a>'' je besedi ``<b>'' enako, kot je besedi ``<d>'' beseda ``<c>''. |
| SL3 | ``<a>'' in ``<b>'' sta kot ``<c>'' in ``<d>''. |
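The templates above can be turned into prompts by substituting the four placeholders. The following is a minimal sketch, not the code from the "scripts" folder: the function names `fill_template` and `completion_prefix` are hypothetical, the example words are illustrative rather than taken from the dataset, and keeping the quotation marks exactly as written in the table is an assumption.

```python
import re

# English template copied from the table above.
EN_TEMPLATE = "``<a>'' is to ``<b>'' as ``<c>'' is to ``<d>''."


def fill_template(template, a, b, c, d=None):
    """Replace <a>, <b>, <c> (and <d>, if given) in a single pass."""
    values = {"a": a, "b": b, "c": c}
    if d is not None:
        values["d"] = d
    # Placeholders without a value (e.g. <d>) are left untouched.
    return re.sub(
        r"<([abcd])>",
        lambda match: values.get(match.group(1), match.group(0)),
        template,
    )


def completion_prefix(template, a, b, c):
    """Return the filled text up to the <d> slot, for left-to-right generation."""
    filled = fill_template(template, a, b, c)
    return filled.split("<d>", 1)[0]


if __name__ == "__main__":
    # Illustrative word pair, not taken from the dataset.
    print(fill_template(EN_TEMPLATE, "dog", "animal", "rose", "flower"))
    print(completion_prefix(EN_TEMPLATE, "dog", "animal", "rose"))
```

Cutting the prompt at the `<d>` slot only makes sense for templates in which `<d>` comes last. In SL2, `<d>` precedes `<c>`, so a prefix-based prompt would drop the third term and that template would need different handling.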
Detailed results on analogy completion with the above templates as well as translation prediction with XLM-R and mBERT are provided in the LREC submission.
The detailed results achieved on analogy completion with BLOOM are reported in the following table; the LREC submission includes only an overview graphic.
| Language | L01 hypernyms - animals | L02 hypernyms - misc | L03 hyponyms - misc | L04 meronyms - substance | L05 meronyms - member | L06 meronyms - part | L07 synonyms - intensity | L08 synonyms - exact | L09 antonyms - gradable | L10 antonyms - binary | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EN | 0.70 | 0.60 | 0.30 | 0.40 | 0.26 | 0.23 | 0.23 | 0.27 | 0.40 | 0.73 | 0.41 |
| AL | 0.33 | 0.50 | 0.37 | 0.27 | 0.07 | 0.10 | 0.07 | 0.10 | 0.10 | 0.07 | 0.20 |
| BM | 0.13 | 0.13 | 0.17 | 0.23 | 0.03 | 0.23 | 0.07 | 0.10 | 0.03 | 0.03 | 0.12 |
| DE | 0.47 | 0.57 | 0.13 | 0.30 | 0.10 | 0.33 | 0.03 | 0.13 | 0.07 | 0.27 | 0.24 |
| EL | 0.40 | 0.17 | 0.13 | 0.07 | 0.07 | 0.10 | 0.10 | 0.13 | 0.13 | 0.33 | 0.16 |
| ES | 0.90 | 0.53 | 0.27 | 0.33 | 0.20 | 0.20 | 0.10 | 0.13 | 0.30 | 0.47 | 0.34 |
| FR | 0.77 | 0.50 | 0.33 | 0.57 | 0.17 | 0.20 | 0.00 | 0.23 | 0.23 | 0.50 | 0.35 |
| HE | 0.17 | 0.17 | 0.10 | 0.10 | 0.03 | 0.10 | 0.10 | 0.07 | 0.13 | 0.10 | 0.11 |
| HR | 0.40 | 0.30 | 0.03 | 0.33 | 0.07 | 0.13 | 0.03 | 0.17 | 0.03 | 0.13 | 0.16 |
| IT | 0.60 | 0.67 | 0.17 | 0.23 | 0.07 | 0.13 | 0.13 | 0.13 | 0.17 | 0.47 | 0.28 |
| LT | 0.40 | 0.37 | 0.03 | 0.03 | 0.07 | 0.10 | 0.10 | 0.07 | 0.07 | 0.10 | 0.13 |
| MK | 0.37 | 0.47 | - | 0.10 | 0.03 | - | 0.10 | 0.07 | - | 0.23 | 0.20 |
| PT | 0.60 | 0.57 | 0.17 | 0.36 | 0.10 | 0.33 | 0.13 | 0.20 | 0.40 | 0.70 | 0.36 |
| SL | 0.36 | 0.33 | 0.07 | 0.30 | 0.10 | 0.17 | 0.10 | 0.07 | 0.10 | 0.13 | 0.17 |
| SK | 0.33 | 0.47 | 0.13 | 0.13 | 0.07 | 0.10 | 0.10 | 0.23 | 0.17 | 0.23 | 0.20 |
| RO | 0.27 | 0.57 | 0.13 | 0.13 | 0.03 | 0.07 | 0.17 | 0.13 | 0.10 | 0.10 | 0.17 |
| Total | 0.45 | 0.43 | 0.17 | 0.24 | 0.09 | 0.17 | 0.10 | 0.14 | 0.16 | 0.29 | 0.22 |
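For readers who want to sanity-check the table, the sketch below recomputes a row's "Total" as the unweighted mean of the available per-relation scores, skipping "-" cells. That this is how the totals were derived is an assumption; it merely agrees with the rows used here (EN and MK). The helper names `parse_row` and `check_row` are hypothetical.

```python
# Two rows copied verbatim from the results table above.
EN_ROW = "| EN | 0.70 | 0.60 | 0.30 | 0.40 | 0.26 | 0.23 | 0.23 | 0.27 | 0.40 | 0.73 | 0.41 |"
MK_ROW = "| MK | 0.37 | 0.47 | - | 0.10 | 0.03 | - | 0.10 | 0.07 | - | 0.23 | 0.20 |"


def parse_row(row):
    """Split a markdown table row into its label and the remaining cells."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[0], cells[1:]


def check_row(row):
    """Return (label, recomputed mean, reported total) for one table row."""
    label, cells = parse_row(row)
    per_relation, reported = cells[:-1], float(cells[-1])
    # Assumption: "-" marks a relation with no result, excluded from the mean.
    scores = [float(cell) for cell in per_relation if cell != "-"]
    return label, sum(scores) / len(scores), reported


if __name__ == "__main__":
    for row in (EN_ROW, MK_ROW):
        label, recomputed, reported = check_row(row)
        print(f"{label}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```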