Hidden Markov Models and the Viterbi Algorithm
In Problem Set 5, we build a machine learning-based bot named Sudi, who labels each word in a sentence with its part-of-speech, based on its neighbors and the word itself. The program reads large training sets of sentences and tags, then uses a map of maps to build an extensive hidden markov model. Using the Viterbi algorithm, which calculates scores based on obeservation and transition probability to decipher the most likely traversal of the automaton, the program outputs its predicted part-of-speech for every word based on the calculated Viterbi scores. The program can tag an entire testing file of sentences using Viterbi, and it can calculate its accuracy based on a corresponding file of correct tags. This model was able to tag parts-of-speech to a 97% accuracy.
