Due to the availability of CoNLL-U annotated corpora, the following languages have been modeled: Armenian, Basque, Brazilian Portuguese, Bulgarian, Catalan, Croatian, Czech, Galician, Georgian, German, Hungarian, Irish Gaelic, Kyrgyz, Naija, Polish, Scottish Gaelic, Sindhi, Slovak, Slovenian, Spanish, Tamil, Turkish, Ukrainian, Uyghur, Welsh, Wolof
Due to the availability of CoNLL-U annotated NER corpora, the following languages will be modeled for NER: Brazilian Portuguese, Croatian, Slovak
Due to the availability of other NER corpora, the following languages are being formatted as CoNLL-U corpora to be modeled for NER: Catalan and Wolof
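As a rough illustration of what formatting an NER corpus as CoNLL-U can involve, here is a minimal sketch. It assumes the source corpus provides (token, BIO tag) pairs per sentence and that the NER tag is stored in the MISC column as `NER=<tag>`. Both are assumptions made for this example, not necessarily the convention used in this project. The function name `to_conllu` and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def to_conllu(sentence: List[Tuple[str, str]], sent_id: str) -> str:
    """Render one sentence of (token, BIO tag) pairs as a CoNLL-U block."""
    text = " ".join(tok for tok, _ in sentence)
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for i, (tok, tag) in enumerate(sentence, start=1):
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        # Columns with no annotation are written as "_".
        # Putting the NER tag in MISC is an assumption of this sketch.
        cols = [str(i), tok] + ["_"] * 7 + [f"NER={tag}"]
        lines.append("\t".join(cols))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


# Made-up example sentence, for illustration only.
example = [("Barcelona", "B-LOC"), ("és", "O"), ("gran", "O")]
print(to_conllu(example, "example-1"), end="")
```

Each token becomes one tab-separated row with ten columns, and sentences are separated by a blank line, so the output can be concatenated sentence by sentence into a single file.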
There are insufficient or no CoNLL-U annotated corpora for the following languages on our annotation roadmap: Assamese, Aymara, Awadhi, Bambara, Balinese, Bemba, Bengali, Bhojpuri, Bororo, Burmese, Cebuano, (Chichewa / Chewa / Chinyanja / Nyanja), (Chitumbuka / Tumbuka / Rumphi), Dioula, Dogri, Dzongkha, Ewe, Fulani, Ga, Gujarati, Guaraní, Hassaniya, Ibibio, Igbo, (Ijaw / Izon), isiXhosa, isiNdebele, Javanese, Kabyle, Kangri, Kannada, Khoekhoe, (Kiswahili / Swahili), Krio, Kurdish, Lao, Lingala, Luganda, Malay (broken into Brunei, Malaysia, Indonesian and other dialects), Malayalam, Marathi, Mongolian, Nepali, Odia / Oriya, Punjabi (Gurmukhi & Shahmukhi), Quechua, Sepedi, Sinhala, Tagalog, Tajik, Thai, Twi, Urdu, Urhobo, Uzbek, (Xitsonga / Tsonga), Yoruba. This list is updated regularly.
Follow this link for public models - https://github.com/bezokurepo/models
Work-in-progress annotated files can be found here - https://github.com/bezokurepo/data
You can find the full source for the Universal Dependencies CoNLL-U files here - https://github.com/UniversalDependencies