Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

A simple least squares classifier that predicts whether a given word is Spanish or French based on bigram features.

I wrote a function that generates every two-letter sequence in the alphabet to use as features, and I manually added some common French and Spanish sequences and prefixes. The model achieved an accuracy of at least 75% on the training data. It also performed well on unseen data, achieving an accuracy of 84.12% on the leaderboard.
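
The approach described above could look roughly like the sketch below. It is not the project's actual code: the names `BIGRAMS`, `EXTRA_SEQUENCES`, `EXTRA_PREFIXES`, `featurize`, `train`, and `predict` are hypothetical, the hand-picked sequences and prefixes are placeholders (the ones actually used are not listed here), and the binary presence features, bias term, and +1/-1 label encoding are assumptions.

```python
from itertools import product
from string import ascii_lowercase

import numpy as np

# Every two-letter sequence over the lowercase ASCII alphabet (26 * 26 = 676).
BIGRAMS = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Placeholder hand-picked features; these are assumptions for illustration only.
EXTRA_SEQUENCES = ["eau", "ñ"]
EXTRA_PREFIXES = ["des", "qu"]


def featurize(word):
    """Map a word to a vector: bias, bigram presence, extra sequences, prefixes."""
    word = word.lower()
    feats = [1.0]  # bias term (an assumption, not stated in the README)
    feats += [1.0 if bg in word else 0.0 for bg in BIGRAMS]
    feats += [1.0 if seq in word else 0.0 for seq in EXTRA_SEQUENCES]
    feats += [1.0 if word.startswith(p) else 0.0 for p in EXTRA_PREFIXES]
    return np.array(feats)


def train(words, labels):
    """Fit weights by least squares, encoding Spanish as +1 and French as -1."""
    X = np.vstack([featurize(w) for w in words])
    y = np.where(np.array(labels) == "spanish", 1.0, -1.0)
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict(weights, words):
    """Classify each word by the sign of its linear score."""
    X = np.vstack([featurize(w) for w in words])
    return ["spanish" if score > 0 else "french" for score in X @ weights]


# Toy usage with made-up training data (not the course dataset).
train_words = ["hola", "gracias", "mañana", "bonjour", "merci", "oiseau"]
train_labels = ["spanish", "spanish", "spanish", "french", "french", "french"]
weights = train(train_words, train_labels)
train_predictions = predict(weights, train_words)
```

With far more features than training words, a fit like this can memorize a small training set, so training accuracy on toy data says little about performance on unseen words.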

Acknowledgements: Professor Justin Eldridge, UC San Diego.