nlpcuom/Word-Frequency-List-for-Sinhala

A Word Frequency List for Sinhala

The files in this repository are described below.

1. Word Frequency List - word_frequency_list_2M.si
  Total number of unique words: 2,138,021
  Total number of words: 122,998,105

2. Verified Word List - verified_word_list_200K.si
  correct words without freq.si
  Total number of unique words: 280,603

3. Verified Word List with Morphological Analysis - verified_word_list_morph_analysed.txt

4. Verified Word List with Lemma Analysis - verified_word_list_lemma_analysis.txt
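
The following is a minimal Python sketch of how the word frequency list might be loaded. This README does not document the file layout, so the sketch assumes each line of word_frequency_list_2M.si holds one word followed by whitespace and an integer count, encoded as UTF-8. The function names (parse_frequency_lines, relative_frequency, load_frequency_list) are hypothetical and not part of this repository. Check a few lines of the file before relying on this.

```python
from collections import Counter
from typing import Iterable


def parse_frequency_lines(lines: Iterable[str]) -> Counter:
    """Parse lines of the ASSUMED form 'word<whitespace>count' into a Counter."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        # Skip blank or unexpected lines instead of guessing at their meaning.
        if len(parts) != 2:
            continue
        word, raw_count = parts
        try:
            counts[word] += int(raw_count)
        except ValueError:
            # The second field was not an integer, so the layout assumption failed here.
            continue
    return counts


def relative_frequency(counts: Counter, word: str) -> float:
    """Return the share of all counted tokens accounted for by `word`."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def load_frequency_list(path: str) -> Counter:
    """Read a frequency list from disk (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return parse_frequency_lines(handle)
```

Under those assumptions, `load_frequency_list("word_frequency_list_2M.si").most_common(10)` would list the ten most frequent words. If the layout assumption holds for every line, the number of entries and the sum of the counts should match the totals reported above.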

Citation

@inproceedings{fernando-dias-2021-building,
    title = "Building a Linguistic Resource : A Word Frequency List for {S}inhala",
    author = "Fernando, Aloka  and
      Dias, Gihan",
    booktitle = "Proceedings of the 18th International Conference on Natural Language Processing (ICON)",
    month = dec,
    year = "2021",
    address = "National Institute of Technology Silchar, Silchar, India",
    publisher = "NLP Association of India (NLPAI)",
    url = "https://aclanthology.org/2021.icon-main.74",
    pages = "606--610"
}
