Medieval Latin Normalization Model based on Georges 1913

This repository contains a PyTorch implementation of a sequence-to-sequence (text2text) model with attention for normalizing orthographic variation in medieval Latin texts.
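The README does not say which attention variant the model uses. As an illustration only, the sketch below computes one decoding step of dot-product (Luong-style) attention in plain Python. The current decoder state is scored against each encoder state (one per input character), the scores are softmax-normalized, and the context vector is the weighted sum of the encoder states. The function names are hypothetical; the actual model in train_model.py is written in PyTorch and may use a different scoring function.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoding step.

    Hypothetical illustration: each score is the dot product between the
    current decoder state and one encoder state.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Three toy encoder states, e.g. for the three characters of a short input word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_product_attention([2.0, 0.0], encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Encoder states that align with the decoder state receive higher weights, which is what lets the decoder focus on the relevant input characters when emitting each output character.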

The model is trained on the Normalized Georges 1913 Dataset and uses the Hugging Face ecosystem to store and share the trained model, its vocabulary, and its configuration.

train_model.py: Script for training the normalization model.
- Includes dynamic loading of the dataset and vocabulary.
- Trains a Seq2Seq model with an attention mechanism and saves the model, vocabulary, and configuration for later use.
test_model.py: Script for testing the normalization model.
- Loads the trained model, vocabulary, and configuration from a Hugging Face repository.
- Normalizes test words from an input file (test_normalisation.txt).
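The README does not describe the vocabulary format. A common choice for orthographic normalization is a character-level vocabulary with special tokens for padding, sequence start, sequence end, and unknown characters. The sketch below assumes such a vocabulary; the class name CharVocab, its methods, and the token names are hypothetical, and the real vocab.pkl written by train_model.py may be structured differently.

```python
import pickle
from typing import Dict, Iterable, List

# Hypothetical special tokens; the real vocabulary may use other names.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


class CharVocab:
    """Minimal character-level vocabulary (illustrative assumption)."""

    def __init__(self, itos: List[str]) -> None:
        self.itos: List[str] = list(itos)
        self.stoi: Dict[str, int] = {s: i for i, s in enumerate(self.itos)}

    @classmethod
    def build(cls, words: Iterable[str]) -> "CharVocab":
        """Collect every character seen in the training words."""
        chars = sorted({ch for word in words for ch in word})
        return cls([PAD, SOS, EOS, UNK] + chars)

    def encode(self, word: str) -> List[int]:
        """Map a word to indices, wrapped in start/end tokens."""
        unk = self.stoi[UNK]
        return [self.stoi[SOS]] + [self.stoi.get(ch, unk) for ch in word] + [self.stoi[EOS]]

    def decode(self, indices: Iterable[int]) -> str:
        """Map indices back to a string, stopping at the end token."""
        out = []
        for i in indices:
            tok = self.itos[i]
            if tok == EOS:
                break
            if tok not in (PAD, SOS):
                out.append(tok)
        return "".join(out)

    def to_bytes(self) -> bytes:
        """Pickle only plain data, so loading does not require this class's module path."""
        return pickle.dumps({"itos": self.itos})

    @classmethod
    def from_bytes(cls, data: bytes) -> "CharVocab":
        return cls(pickle.loads(data)["itos"])


# michi/mihi and nichil/nihil are typical medieval vs. classical spellings.
vocab = CharVocab.build(["michi", "mihi", "nichil", "nihil"])
ids = vocab.encode("nichil")
print(ids)
# Writing to disk would be e.g. Path("vocab.pkl").write_bytes(vocab.to_bytes()).
restored = CharVocab.from_bytes(vocab.to_bytes())
print(restored.decode(ids))
```

Pickling a plain list rather than the class instance is a design choice of this sketch: it keeps the vocabulary file loadable even if the class is moved or renamed.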

Usage

Train the Model:
- Modify train_model.py as needed for your dataset.
- Run:
```
python train_model.py
```
- Saves:
  - Model: normalization_model.pth
  - Vocabulary: vocab.pkl
  - Config: config.json
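The exact contents of config.json are not documented here. The sketch below assumes it stores the hyperparameters needed to rebuild the model architecture before its weights are loaded. The keys in CONFIG and the functions save_artifacts and load_config are hypothetical. With PyTorch, the weights would typically be written with torch.save(model.state_dict(), ...); that call is left as a comment so the sketch runs without PyTorch installed.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical hyperparameters; the real config.json may use different keys.
CONFIG: Dict[str, Any] = {
    "embedding_dim": 64,
    "hidden_dim": 256,
    "num_layers": 1,
    "max_length": 32,
}


def save_artifacts(out_dir: Path, vocab: Any, config: Dict[str, Any]) -> None:
    """Write vocab.pkl and config.json into out_dir (weights are saved separately)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # With PyTorch installed, the model weights would be saved alongside, e.g.:
    # torch.save(model.state_dict(), out_dir / "normalization_model.pth")
    (out_dir / "vocab.pkl").write_bytes(pickle.dumps(vocab))
    (out_dir / "config.json").write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_config(out_dir: Path) -> Dict[str, Any]:
    """Read config.json back, e.g. to rebuild the model before loading its weights."""
    return json.loads((out_dir / "config.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(Path(tmp), ["<pad>", "a", "b"], CONFIG)
    print(sorted(p.name for p in Path(tmp).iterdir()))
    print(load_config(Path(tmp)) == CONFIG)
```

Keeping the configuration in a separate JSON file means the architecture can be reconstructed (and inspected) without unpickling anything.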
Test the Model:
- By default, loads the model from https://huggingface.co/mschonhardt/georges-1913-normalization-model. If you train your own model, upload the model, vocabulary, and configuration to a Hugging Face repository and point test_model.py to it.
- Add words to test_normalisation.txt for testing.
- Run:
```
python test_model.py
```
- Outputs the normalized forms of the test words.
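The testing workflow above can be sketched as follows, under several assumptions: the Hub repository stores its files under the same names that train_model.py writes, test_normalisation.txt holds one word per line, and the trained model is wrapped in a single normalize function. download_artifacts, read_test_words, and normalize_all are hypothetical helpers; hf_hub_download is the download function from the huggingface_hub package and is imported lazily so the rest of the sketch runs without it. The toy lookup table in the usage example merely stands in for the trained model.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

REPO_ID = "mschonhardt/georges-1913-normalization-model"
# Assumption: the Hub repository uses the file names written by train_model.py.
ARTIFACTS = ("normalization_model.pth", "vocab.pkl", "config.json")


def download_artifacts(repo_id: str = REPO_ID) -> Dict[str, str]:
    """Fetch model, vocabulary and config from the Hugging Face Hub (cached locally)."""
    from huggingface_hub import hf_hub_download  # requires: pip install huggingface_hub

    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in ARTIFACTS}


def read_test_words(path: Path) -> List[str]:
    """Read one word per line, ignoring blank lines (assumed file format)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def normalize_all(words: List[str], normalize: Callable[[str], str]) -> Dict[str, str]:
    """Apply a normalization function to every test word."""
    return {word: normalize(word) for word in words}


# Stand-in for the trained model: a toy lookup, NOT the model's behaviour.
toy_rules = {"michi": "mihi", "nichil": "nihil"}
with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / "test_normalisation.txt"
    test_file.write_text("michi\n\nnichil\nmundus\n", encoding="utf-8")
    words = read_test_words(test_file)
    for word, norm in normalize_all(words, lambda w: toy_rules.get(w, w)).items():
        print(f"{word} -> {norm}")
```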

Acknowledgments

Dataset and model were created by Michael Schonhardt (https://orcid.org/0000-0002-2750-1900) for the project Burchards Dekret Digital.

Creation was made possible thanks to the lemmata from Georges 1913, kindly provided via www.zeno.org by 'Henricus - Edition Deutsche Klassik GmbH'. Please consider using and supporting this valuable service.

License

CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode.en)
