A language classifier that determines whether a given text is in English or Dutch, using a decision tree and AdaBoost

Samridhi16/wikipedia-language-classifier

Wikipedia-Language-Classifier

  • A language classifier that determines whether a given text is in English or Dutch, using a decision tree and AdaBoost trained on text from Wikipedia.
  • The features used to predict the language are determined by pronouns, parts of speech, the frequency of the letters i, j, and k (higher in Dutch than in English), the frequency of consecutive repeated letters within a word, and the average word length in a given sentence.
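
The letter-based features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the function names (`ijk_frequency`, `repeated_letter_ratio`, `average_word_length`, `extract_features`) and the tokenization rule are assumptions, and the pronoun and part-of-speech features are left out because they depend on word lists the README does not give.

```python
import re

# Assumption: a "word" is any run of letters; the repository may tokenize differently.
WORD_RE = re.compile(r"[^\W\d_]+")


def ijk_frequency(text):
    """Fraction of all letters in the text that are i, j, or k."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in "ijk" for c in letters) / len(letters)


def has_repeated_letters(word):
    """True if the word contains the same letter twice in a row (e.g. 'ee')."""
    return any(a == b for a, b in zip(word, word[1:]))


def repeated_letter_ratio(text):
    """Fraction of words that contain a consecutive repeated letter."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(has_repeated_letters(w) for w in words) / len(words)


def average_word_length(text):
    """Mean number of letters per word in the text."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def extract_features(text):
    """Collect the letter-based features into one dictionary (hypothetical layout)."""
    return {
        "ijk_frequency": ijk_frequency(text),
        "repeated_letter_ratio": repeated_letter_ratio(text),
        "average_word_length": average_word_length(text),
    }
```

A decision tree would then threshold these numeric values, and AdaBoost would combine several such shallow trees; the thresholds themselves would have to be learned from labeled Wikipedia sentences.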
