Cross-lingual Inflection as a Data Augmentation Method for Parsing

This repository contains the official implementation for the paper "Cross-lingual Inflection as a Data Augmentation Method for Parsing" by Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares, presented at the Third Workshop on Insights from Negative Results in NLP (ACL 2022).

Overview

This project proposes a morphology-based data augmentation method for low-resource dependency parsing. The core idea is to train a morphological inflector on a target low-resource language and apply it to a related rich-resource treebank. This creates "cross-lingual inflected" (x-inflected) treebanks that mimic the target language's morphology while retaining the rich-resource syntax.
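The core idea can be illustrated with a minimal sketch. It is not the repository's implementation: `x_inflect_conllu` and `toy_inflector` are hypothetical names, the toy rule stands in for a trained neural inflector, and the UD-to-UniMorph feature conversion is skipped. The sketch only shows that the word forms change while heads and dependency relations stay as they are in the rich-resource treebank.

```python
from typing import Callable


def x_inflect_conllu(conllu: str, inflect: Callable[[str, str], str]) -> str:
    """Replace FORM (column 2) with inflect(LEMMA, FEATS) on every word line."""
    out = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Keep comments, blank lines, multiword ranges (e.g. 1-2) and
        # empty nodes (e.g. 1.1) exactly as they are.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            out.append(line)
            continue
        lemma, feats = cols[2], cols[5]
        if feats != "_":
            # Only tokens with morphological features are re-inflected.
            cols[1] = inflect(lemma, feats)
        out.append("\t".join(cols))
    return "\n".join(out)


def toy_inflector(lemma: str, feats: str) -> str:
    # Hypothetical stand-in for a trained model: mark plurals with "-i".
    return lemma + "i" if "Number=Plur" in feats else lemma


sample = "\n".join([
    "# text = gatos .",
    "1\tgatos\tgato\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])
result = x_inflect_conllu(sample, toy_inflector)
print(result)
```

In the output, only the FORM column of the first token differs from the input; the HEAD and DEPREL columns are copied through unchanged, which is what lets the x-inflected treebank keep the source syntax.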

Project Structure

The repository is organized as follows:

src/: Core logic for data conversion and morphological translation.
  - main.py: Main functions for UD-UM conversion and inflection.
  - const.py: Centralized configuration and paths.
scripts/: Entry-point scripts.
  - train.py: Script to train a morphological inflector.
  - x-inflect.py: Script to generate x-inflected treebanks.
external/: External tools and submodules.
  - neural-transducer: Sequence-to-sequence model for morphological inflection.
  - ud-compatibility: Scripts for UD to UniMorph conversion.
notebooks/: Notebooks for analysis and visualization.
results/: Directory for output treebanks and logs.

Installation

Clone the repository and submodules:

git clone --recursive https://github.com/amunozo/x-inflection.git
cd x-inflection

Install dependencies: make sure the requirements of the neural-transducer and ud-compatibility submodules (under external/) are installed.

Usage

Training a Morphological Inflector

python scripts/train.py --lang [LANG_CODE] --dir [DATA_DIR]

Generating X-Inflected Treebanks

python scripts/x-inflect.py --treebank_folder [UD_FOLDER] --model_folder [MODEL_DIR] --output [OUTPUT_DIR]
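The two steps above can be chained from a small driver script. The following is a sketch under stated assumptions: `build_pipeline` is a hypothetical helper, and the language code and all paths are placeholder values, not ones taken from the repository. It uses only the flags shown in the commands above and prints the commands rather than running them.

```python
import shlex


def build_pipeline(lang, data_dir, treebank_folder, model_folder, output_dir):
    """Return the train and x-inflect commands as argument lists."""
    train = ["python", "scripts/train.py", "--lang", lang, "--dir", data_dir]
    x_inflect = [
        "python", "scripts/x-inflect.py",
        "--treebank_folder", treebank_folder,
        "--model_folder", model_folder,
        "--output", output_dir,
    ]
    return [train, x_inflect]


# Placeholder values (assumptions): adjust to your own language and layout.
commands = build_pipeline(
    lang="gl",
    data_dir="data/gl",
    treebank_folder="ud/UD_Spanish-AnCora",
    model_folder="models/gl",
    output_dir="results/es-gl",
)

for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(shlex.join(cmd))
    # To execute instead: subprocess.run(cmd, check=True)
```

Building the commands as lists keeps paths with spaces safe and makes it easy to loop over several language pairs.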

Citation

If you use this code or our findings in your research, please cite:

@inproceedings{munoz-ortiz-etal-2022-cross,
    title = "Cross-lingual Inflection as a Data Augmentation Method for Parsing",
    author = "Mu{\~n}oz-Ortiz, Alberto and
      G{\'o}mez-Rodr{\'i}guez, Carlos and
      Vilares, David",
    editor = "Sedoc, Jo{\~a}o and
      Balasubramanian, Niranjan and
      Goldwasser, Dan and
      Riedel, Sebastian",
    booktitle = "Proceedings of the Third Workshop on Insights from Negative Results in NLP",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.insights-1.7",
    doi = "10.18653/v1/2022.insights-1.7",
    pages = "51--57",
}

Contact

For any questions or issues, please contact the main author: Alberto Muñoz-Ortiz - alberto.munoz.ortiz@udc.es

Acknowledgments

This work is supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA, as well as by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150). The work is also supported by ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), by Xunta de Galicia (ED431C 2020/11), and by Centro de Investigación de Galicia “CITIC”, which is funded by Xunta de Galicia, Spain and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01.
