Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

A simple least squares classifier that predicts whether a given word is Spanish or French based on a few bi-gram features.
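As a rough illustration of the idea, here is a minimal sketch of a least squares classifier. It assumes NumPy, a +1/-1 label encoding (say, +1 for Spanish and -1 for French), and a tiny made-up feature matrix; the names `fit_least_squares` and `predict` are hypothetical and not taken from this repository.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit weights w minimizing ||X_aug @ w - y||^2, with a bias column added."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w

def predict(X, w):
    """Classify by the sign of the linear score: +1 or -1."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    return np.where(X_aug @ w >= 0, 1, -1)

# Toy data for illustration only (not the course data set):
# each row is a word, each column a 0/1 feature.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # assumed encoding: +1 / -1

w = fit_least_squares(X, y)
predictions = predict(X, w)
```

Regressing directly onto +1/-1 labels and thresholding at zero keeps training to a single linear solve, which is likely why it suits a small feature-based model like this one.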

I wrote a function that generates every two-letter sequence in the alphabet to use as features; I also manually added some common French and Spanish sequences and prefixes. The model achieved an accuracy of at least 75% on the training data. It also performed well on unseen data, achieving an accuracy of 84.12% on the leaderboard.
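The feature construction described above might look something like the following sketch. It assumes a lowercase a-z alphabet and 0/1 indicator features; the function names and the example sequences in the usage lines are hypothetical, since the actual hand-picked list is not given here.

```python
from itertools import product
from string import ascii_lowercase

def all_bigrams():
    """Return every two-letter sequence over a-z (26 * 26 = 676 strings)."""
    return ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

def featurize(word, sequences, prefixes=()):
    """Build a 0/1 feature vector for one word.

    One feature per sequence (1.0 if it occurs anywhere in the word),
    followed by one feature per prefix (1.0 if the word starts with it).
    """
    word = word.lower()
    contains = [1.0 if seq in word else 0.0 for seq in sequences]
    starts = [1.0 if word.startswith(p) else 0.0 for p in prefixes]
    return contains + starts

# Illustrative usage; the extra sequence and prefix are placeholders.
sequences = all_bigrams() + ["eau"]
features = featurize("chateau", sequences, prefixes=["ch"])
```

Each word becomes a fixed-length vector, so a list of words can be stacked into a matrix and passed to any linear classifier.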

Acknowledgements: Professor Justin Eldridge, UC San Diego.
