This project compares several types of statistical and machine learning models for predicting taxi cancellations. The models are coded in R. Like many real-world datasets, the dataset was highly imbalanced, and the performance of every model depended largely on how well it was adapted to this imbalance.
Models:
Logistic Regression (Penalized / Non-Penalized)
Decision Tree Model
XGBoost
Artificial Neural Network
Ensemble Model
The data was provided by the academic institution and the professor for this project and is restricted, so it is not included in this repository.
Link to the presentation (web-based):
https://docs.google.com/presentation/d/1rkx9IXgQH_A088ZDVaiMZRnapJuxT5_iFUghQDCgrp0/edit?usp=sharing