CACCHT: Creating Annotated Corpora of Classical Hebrew Texts

The CACCHT project is a collaboration of Martijn Naaijer (University of Zurich), Willem van Peursen (Vrije Universiteit Amsterdam), Oliver Glanz (Andrews University), Christian Canu Højgaard (Fjellhaug International University College), Martin Ehrensvärd (University of Copenhagen) and Robert Rezetko (University of Copenhagen).
Together with specialists in the field, we develop linguistically annotated datasets of Semitic texts. These datasets are publicly available and can be used freely for research and education. Some datasets have only word-level annotations, while others also contain syntactic features.

Datasets

We are working on the following datasets:

Text-Fabric

All datasets are in Text-Fabric format and can be accessed and used with Python.
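
As a rough illustration of what access from Python can look like, the sketch below lists the first few words of a dataset together with their part of speech. It assumes the `text-fabric` package is installed, that the dataset is loaded under the app name "ETCBC/bhsa", and that the dataset has a word-level feature called `sp` (part of speech), as the BHSA does. The helper names `first_words` and `show_sample` are made up for this example; a CACCHT dataset may use a different app name or feature set.

```python
from itertools import islice


def first_words(api, n=5):
    """Return (text, part of speech) pairs for the first n word nodes.

    `api` is a Text-Fabric API object (for example `A.api` after `use(...)`).
    Assumption: the dataset has a node type "word" and a feature `sp`.
    """
    F, T = api.F, api.T
    words = islice(F.otype.s("word"), n)
    return [(T.text(w), F.sp.v(w)) for w in words]


def show_sample(dataset="ETCBC/bhsa", n=5):
    """Load a dataset and print a few words (downloads data on first use)."""
    # Imported here so that first_words can be used without Text-Fabric installed.
    from tf.app import use

    # Assumption: "ETCBC/bhsa" is the BHSA app name; replace it for other datasets.
    A = use(dataset)
    for text, pos in first_words(A.api, n):
        print(text, pos)
```

Calling `show_sample()` loads the BHSA and prints five words with their part-of-speech tags. Datasets with only word-level annotations can be explored in the same way, while datasets with syntactic features also expose node types such as phrases and clauses.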

BHSA

The Biblia Hebraica Stuttgartensia Amstelodamensis (BHSA) plays an important role in this project. The BHSA is the dataset of the Masoretic Text of the Hebrew Bible with linguistic annotations, developed and maintained by the ETCBC. In general, CACCHT follows the annotation conventions of the BHSA, adapting them to the specific characteristics of a language or text.
