Open
Description
Given an ISO language code (de, en, ru), fetch X* random Wikipedia articles in that language. (Open question: how to deal with language-specific URLs for Wikipedia:Random?)
Save the raw files into lang/dirty.
Strip the useful text from those files into lang/clean.
Learn from those files instead of dictionaries to get more realistic n-grams (dictionaries overrepresent patterns from rare words).
* X likely depends on how high the n in your n-grams is.
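
A rough sketch of the steps above, not a finished implementation. On the URL question: as far as I know, the canonical English name `Special:Random` resolves on every language edition, so only the subdomain has to change. Everything else here is an assumption made for illustration: the function names, the User-Agent string, keeping only text inside `<p>` tags as "useful text", and counting character n-grams within words.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


def random_article_url(lang: str) -> str:
    """URL that redirects to a random article on the given language edition."""
    return f"https://{lang}.wikipedia.org/wiki/Special:Random"


class ParagraphText(HTMLParser):
    """Collects text found inside <p> elements (a crude notion of 'useful text')."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            self.chunks.append("\n")  # one line per paragraph

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def clean_html(html: str) -> str:
    """Strip markup and keep only paragraph text."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def ngram_counts(text: str, n: int) -> Counter:
    """Count character n-grams inside each word (letters only, lowercased)."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def collect(lang: str, count: int, root: Path = Path(".")) -> None:
    """Fetch `count` random articles into lang/dirty and their text into lang/clean."""
    dirty, clean = root / lang / "dirty", root / lang / "clean"
    dirty.mkdir(parents=True, exist_ok=True)
    clean.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Hypothetical User-Agent; replace with real contact details.
        request = urllib.request.Request(
            random_article_url(lang),
            headers={"User-Agent": "ngram-corpus-sketch/0.1"},
        )
        with urllib.request.urlopen(request) as response:  # follows the redirect
            html = response.read().decode("utf-8", errors="replace")
        (dirty / f"{i}.html").write_text(html, encoding="utf-8")
        (clean / f"{i}.txt").write_text(clean_html(html), encoding="utf-8")
```

Usage would be something like `collect("de", 50)` followed by `ngram_counts(text, 3)` over the files in `de/clean`. Random pages can repeat, and footnote markers such as `[1]` survive the paragraph filter, so the clean step probably needs more work before the counts are trustworthy.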