language-list

foundation model availability for downstream applications

Due to the availability of CoNLL-U annotated corpora, the following languages have been modeled: Armenian, Basque, Brazilian Portuguese, Bulgarian, Catalan, Croatian, Czech, Galician, Georgian, German, Hungarian, Irish Gaelic, Kyrgyz, Naija, Polish, Scottish Gaelic, Sindhi, Slovak, Slovenian, Spanish, Tamil, Turkish, Ukrainian, Uyghur, Welsh, Wolof.

NER supported languages - derived from Universal NER

Due to the availability of CoNLL-U annotated NER corpora, the following languages will be modeled for NER: Brazilian Portuguese, Croatian, Slovak.

NER work in progress - derived from other sources

Due to the availability of other NER corpora, the following languages are being reformatted as CoNLL-U corpora so they can be modeled for NER: Catalan and Wolof.
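
As a rough illustration of what reformatting an NER corpus into CoNLL-U can look like, the sketch below writes (token, tag) pairs as ten-column CoNLL-U rows. Storing the label in the MISC column as `NER=<tag>` is an assumption made for this example, not a description of the layout used in the bezoku data; the function name `to_conllu` and the sample sentence are likewise hypothetical.

```python
from typing import List, Tuple


def to_conllu(tokens: List[Tuple[str, str]], text: str = "") -> str:
    """Render (form, ner_tag) pairs as one CoNLL-U sentence block.

    Assumption: the NER tag is written to the MISC column as NER=<tag>.
    All other annotation columns are left unspecified ("_").
    """
    lines = []
    if text:
        lines.append(f"# text = {text}")
    for index, (form, tag) in enumerate(tokens, start=1):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(index), form, "_", "_", "_", "_", "_", "_", "_", f"NER={tag}"]
        lines.append("\t".join(cols))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


# Made-up sample sentence with IOB2-style tags, for illustration only.
sample = [("Fatou", "B-PER"), ("lives", "O"), ("in", "O"), ("Dakar", "B-LOC")]
block = to_conllu(sample, text="Fatou lives in Dakar")
print(block, end="")
```

Keeping the label in a single column leaves the remaining columns free for lemmas, part-of-speech tags and dependencies if they are annotated later.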

roadmap of languages being annotated or augmented from Universal Dependencies

There are insufficient or no CoNLL-U annotated corpora for the following languages on our annotation roadmap: Assamese, Aymara, Awadhi, Bambara, Balinese, Bemba, Bengali, Bhojpuri, Bororo, Burmese, Cebuano, (Chichewa / Chewa / Chinyanja / Nyanja), (Chitumbuka / Tumbuka / Rumphi), Dioula, Dogri, Dzongkha, Ewe, Fulani, Ga, Gujarati, Guaraní, Hassaniya, Ibibio, Igbo, (Ijaw / Izon), isiXhosa, isiNdebele, Javanese, Kabyle, Kangri, Kannada, Khoekhoe, (Kiswahili / Swahili), Krio, Kurdish, Lao, Lingala, Luganda, Malay (broken into Brunei, Malaysian, Indonesian and other dialects), Malayalam, Marathi, Mongolian, Nepali, Odia / Oriya, Punjabi (Gurmukhi & Shahmukhi), Quechua, Sepedi, Sinhala, Tagalog, Tajik, Thai, Twi, Urdu, Urhobo, Uzbek, (Xitsonga / Tsonga), Yoruba. This list is updated regularly.

model repo

Public models are available here: https://github.com/bezokurepo/models

bezoku annotated files

Work-in-progress annotated files can be found here: https://github.com/bezokurepo/data

Universal Dependencies

You can find the full source for the Universal Dependencies CoNLL-U files here: https://github.com/UniversalDependencies
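
For readers new to the format, here is a minimal sketch of reading CoNLL-U data in plain Python. It relies only on the standard ten tab-separated columns and the blank-line sentence separator defined by Universal Dependencies; the helper name `read_conllu` and the inline sample are illustrative and not part of any bezoku tooling.

```python
from typing import Dict, Iterator, List

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle input that lacks a trailing blank line.
        yield sentence


# Small hand-written sample, for illustration only.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["deprel"])
```

For real corpora, a maintained parser may be a better fit than this sketch, since it does not interpret multiword token ranges or empty nodes.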
