This project analyses the marks of International Baccalaureate (IB) Higher Level biology students at Cameron Heights C.I. in order to predict their final grades. Students in the IB curriculum receive a grade between 1 and 7, and teachers must predict each student's final grade before the exams. However, the teachers' predictions match the final grade only 40.86% of the time, with a mean squared error (MSE) of 1.08 squared grade points.
I wish to improve the accuracy of these predicted grades with machine learning. The classification algorithms that I plan to use include k-nearest neighbours, random forests, support vector machines, naive Bayes, and neural networks. I intend to implement each algorithm twice: once without any machine learning libraries and then again using scikit-learn.
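To illustrate the "implement twice" idea, here is a minimal sketch of k-nearest neighbours written with NumPy only, followed by the scikit-learn equivalent. It is not the project's actual code: the function name `knn_predict`, the tie-breaking rule (lowest grade wins), and the tiny made-up marks are all assumptions chosen for illustration.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical from-scratch k-NN: majority vote among the k closest rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in X_query:
        # Euclidean distance from the query row to every training row
        dists = np.sqrt(((X_train - q) ** 2).sum(axis=1))
        nearest = np.argsort(dists, kind="stable")[:k]
        votes = Counter(int(y_train[i]) for i in nearest)
        # Assumed tie-break: highest vote count first, then the lowest grade
        preds.append(max(votes, key=lambda g: (votes[g], -g)))
    return np.array(preds)


# Made-up data: one course mark per student and a final grade from 1 to 7
X_train = [[50], [55], [60], [90], [92], [95]]
y_train = [4, 4, 4, 7, 7, 7]
X_query = [[58], [93]]

scratch = knn_predict(X_train, y_train, X_query, k=3)
library = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_query)
print(scratch, library)
```

On data like this, where the neighbours agree, both versions give the same grades. They may differ when votes are tied, because the tie-breaking rule above is an arbitrary choice.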
The accuracy of each model was determined using k-fold cross-validation with k=10, with the statistics averaged over 5 trials.
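The evaluation procedure can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the project's code: the helper name `evaluate`, the use of shuffled `KFold` with a different seed per trial, and the synthetic marks (the real mark data is not included) are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier


def evaluate(model, X, y, k=10, trials=5, seed=0):
    """Average accuracy and MSE over `trials` runs of shuffled k-fold CV."""
    accuracies, mses = [], []
    for t in range(trials):
        # A new shuffle for each trial (assumed meaning of "5 trials")
        cv = KFold(n_splits=k, shuffle=True, random_state=seed + t)
        # Each student is predicted exactly once, by a model that never saw them
        pred = cross_val_predict(model, X, y, cv=cv)
        accuracies.append(np.mean(pred == y))
        mses.append(np.mean((pred - y) ** 2))
    return float(np.mean(accuracies)), float(np.mean(mses))


# Synthetic stand-in data: 200 students, 5 marks each, grades from 1 to 7
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 5))
y = np.clip(np.round(X.mean(axis=1) / 100 * 6 + 1), 1, 7).astype(int)

acc, mse = evaluate(KNeighborsClassifier(n_neighbors=5), X, y)
print(f"accuracy={acc:.4f}, MSE={mse:.4f}")
```

Reporting both numbers makes a model directly comparable with the teachers' baseline: accuracy counts exact matches, while MSE also penalises how far off a wrong prediction is.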
I wish to thank Mr. Busch and Ms. Drung at Cameron Heights C.I. for supporting me with this project and providing me with their detailed markbook data for hundreds of their past biology students. I would also like to thank Mr. Ramzan for retrieving the final biology grades of all the students. This project would not have been possible without your help.
Each student was identified only by their student number to preserve anonymity. I have intentionally not included the mark data due to privacy concerns.