Assessment of Pre-Trained Models Across Languages and Grammars

License: MIT | Python 3.8+ | Paper

This repository contains the official implementation for the paper "Assessment of Pre-Trained Models Across Languages and Grammars" by Alberto Muñoz-Ortiz, David Vilares, and Carlos Gómez-Rodríguez, presented at IJCNLP-AACL 2023 in Nusa Dua, Bali, Indonesia.


Overview

This project evaluates the performance of several pre-trained language models (such as BERT, XLM-R, and CANINE) across different languages and two grammar formalisms, constituency and dependency parsing.
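
The snippet below is a minimal, illustrative sketch (not code from this repository) of loading the encoder families named above with Hugging Face transformers to obtain contextual representations; the checkpoint names are assumptions and may differ from the ones used in the paper.

    # Minimal sketch (not code from this repository): loading the encoders mentioned
    # above with Hugging Face transformers. Checkpoint names are assumptions and may
    # differ from the ones used in the paper.
    from transformers import AutoTokenizer, AutoModel

    checkpoints = {
        "mBERT": "bert-base-multilingual-cased",
        "XLM-R": "xlm-roberta-base",
        "CANINE": "google/canine-s",
    }

    sentence = "Pre-trained models encode grammar differently across languages."

    for name, ckpt in checkpoints.items():
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModel.from_pretrained(ckpt)
        outputs = model(**tokenizer(sentence, return_tensors="pt"))
        # Contextual representations that the parsing experiments build on.
        print(name, outputs.last_hidden_state.shape)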

Project Structure

The repository is organized as follows:

  • src/: Core logic and model implementations.
  • scripts/: Entry-point scripts for training, evaluation, and plotting.
  • notebooks/: Jupyter notebooks for data analysis and visualization.
  • data/: Directory for storing datasets and intermediate scores.
  • results/: Output logs and evaluation results.
  • config/: Model and training configurations.

Installation

  1. Clone the repository:

    git clone https://github.com/amunozo/multilingual-assessment.git
    cd multilingual-assessment
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies (a quick import check follows this list):

    pip install -r requirements.txt
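
As a quick check that the environment is usable, the following snippet imports the main deep-learning dependencies. It assumes requirements.txt installs PyTorch and Hugging Face transformers, so adjust it if the pinned dependencies differ.

    # Optional sanity check (not part of the repository). Assumes requirements.txt
    # installs PyTorch and Hugging Face transformers; adjust if the pinned
    # dependencies differ.
    import torch
    import transformers

    print("torch:", torch.__version__)
    print("transformers:", transformers.__version__)
    print("CUDA available:", torch.cuda.is_available())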

Usage

Training

To train the models for dependency parsing:

python scripts/train.py
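
scripts/train.py encapsulates the actual training recipe. As an illustration only, the sketch below shows one common way to train a token-level head on top of a frozen pre-trained encoder (dependency parsing cast as sequence labeling); the checkpoint name, label count, and data are hypothetical placeholders and do not reproduce the repository's setup.

    # Illustrative only: this is NOT scripts/train.py. A minimal sketch of training a
    # token-level classifier on top of a frozen pre-trained encoder, one common way
    # to cast dependency parsing as sequence labeling. Checkpoint, label count, and
    # data below are hypothetical placeholders.
    import torch
    from torch import nn
    from transformers import AutoTokenizer, AutoModel

    encoder_name = "bert-base-multilingual-cased"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(encoder_name)
    encoder = AutoModel.from_pretrained(encoder_name)
    encoder.requires_grad_(False)  # probe frozen representations

    num_labels = 50  # hypothetical size of the label set
    head = nn.Linear(encoder.config.hidden_size, num_labels)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Toy batch: in a real setup, labels come from treebank annotations and
    # padding positions would be masked out of the loss.
    sentences = ["The cat sleeps .", "Dogs bark loudly ."]
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    labels = torch.randint(0, num_labels, batch["input_ids"].shape)

    hidden = encoder(**batch).last_hidden_state        # (batch, seq, hidden)
    logits = head(hidden)                              # (batch, seq, num_labels)
    loss = loss_fn(logits.reshape(-1, num_labels), labels.reshape(-1))
    loss.backward()
    optimizer.step()
    print(f"toy loss: {loss.item():.4f}")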

Evaluation

To evaluate the trained models:

python scripts/eval.py
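
scripts/eval.py contains the repository's evaluation logic. For reference, the sketch below shows how the standard dependency-parsing metrics, UAS and LAS, are computed from gold and predicted (head, label) pairs; it is illustrative and not taken from the repository.

    # Illustrative only: this is NOT scripts/eval.py. Standard dependency-parsing
    # metrics computed from gold and predicted (head index, dependency label) pairs.
    def uas_las(gold, pred):
        """Return (UAS, LAS) for one sentence given per-token (head, label) pairs."""
        assert len(gold) == len(pred)
        correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
        correct_both = sum(g == p for g, p in zip(gold, pred))
        return correct_heads / len(gold), correct_both / len(gold)

    # Toy example: the last token gets the right head but the wrong label.
    gold = [(2, "det"), (0, "root"), (2, "punct")]
    pred = [(2, "det"), (0, "root"), (2, "obj")]
    uas, las = uas_las(gold, pred)
    print(f"UAS={uas:.2f}  LAS={las:.2f}")  # UAS=1.00  LAS=0.67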

Plotting Results

To generate plots from the evaluation scores:

python scripts/plot.py
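
scripts/plot.py produces the figures from the stored scores. As a rough illustration only, the sketch below plots per-language scores with matplotlib; the languages and numbers are placeholders, not results from the paper.

    # Illustrative only: this is NOT scripts/plot.py. A minimal matplotlib sketch of
    # per-language scores; languages and numbers are placeholders, not paper results.
    import matplotlib.pyplot as plt

    languages = ["English", "Basque", "Turkish", "Chinese"]   # hypothetical subset
    scores = {
        "mBERT": [85.0, 72.0, 60.0, 70.0],   # placeholder values
        "XLM-R": [87.0, 76.0, 64.0, 73.0],   # placeholder values
    }

    fig, ax = plt.subplots(figsize=(6, 3))
    for model_name, values in scores.items():
        ax.plot(languages, values, marker="o", label=model_name)
    ax.set_ylabel("LAS")
    ax.set_title("Example: per-language dependency parsing scores")
    ax.legend()
    fig.tight_layout()
    fig.savefig("example_scores.png")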

Results

The main findings of the paper show how different subword tokenization strategies and model architectures impact the cross-lingual transferability of grammatical knowledge. For detailed results, please refer to our paper.

Citation

If you use this code or our findings in your research, please cite:

@inproceedings{munoz-ortiz-etal-2023-assessment,
    title = "Assessment of Pre-Trained Models Across Languages and Grammars",
    author = "Mu{\~n}oz-Ortiz, Alberto  and
      Vilares, David  and
      G{\'o}mez-Rodr{\'i}guez, Carlos",
    booktitle = "Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = nov,
    year = "2023",
    address = "Nusa Dua, Bali",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.ijcnlp-main.23",
    pages = "343--358",
}

Contact

For any questions or issues, please contact the main author: Alberto Muñoz-Ortiz - alberto.munoz.ortiz@udc.es

Acknowledgments

We acknowledge the European Research Council (ERC), which has funded this research under the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615), ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), grant FPI 2021 (PID2020-113230RB-C21) funded by MCIN/AEI/10.13039/501100011033, and Centro de Investigación de Galicia "CITIC", funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).
