In this notebook, I walk through the machine learning process step by step. I begin by setting out a business problem and exploring the data. I then prepare the dataset, engineer features, and scale them. Next, I train two models: a logistic regression and a K-Nearest Neighbors classifier. Finally, I optimize the K-Nearest Neighbors model by iterating through more than 200 combinations of features to maximize its performance, which I measure primarily by its recall score.
The result is a reproducible machine learning pipeline that can accurately predict whether a website is malicious or benign using raw data.
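
To make the last step concrete, here is a minimal sketch of a feature-combination search scored by recall. It is an illustration under stated assumptions, not the code used later in the notebook: it relies only on the Python standard library, the toy data is invented, and the helper names (`recall`, `knn_predict`, `select_columns`, `best_feature_subset`) and the choice of `k=3` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (malicious sites) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def select_columns(rows, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in rows]


def best_feature_subset(train_X, train_y, test_X, test_y, k=3):
    """Try every non-empty feature subset and keep the first one with the highest recall."""
    n_features = len(train_X[0])
    best_cols, best_score = None, -1.0
    for size in range(1, n_features + 1):
        for cols in combinations(range(n_features), size):
            tr = select_columns(train_X, cols)
            te = select_columns(test_X, cols)
            preds = [knn_predict(tr, train_y, q, k) for q in te]
            score = recall(test_y, preds)
            if score > best_score:  # strict ">" keeps the earliest (smallest) subset on ties
                best_cols, best_score = cols, score
    return best_cols, best_score


# Invented, already-scaled toy data: label 1 = malicious, 0 = benign.
# Only the first feature separates the classes; the other two are noise.
train_X = [
    [0.10, 0.9, 0.5], [0.20, 0.1, 0.4], [0.15, 0.5, 0.9],
    [0.90, 0.8, 0.5], [0.80, 0.2, 0.3], [0.85, 0.5, 0.8],
]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.88, 0.1, 0.1], [0.12, 0.9, 0.9], [0.82, 0.9, 0.6], [0.18, 0.2, 0.5]]
test_y = [1, 0, 1, 0]

best_cols, best_recall = best_feature_subset(train_X, train_y, test_X, test_y)
print(best_cols, best_recall)
```

An exhaustive search like this evaluates 2^n - 1 subsets for n features, so it is only practical for a small number of candidate features. Recall is the selection metric here because a missed malicious site (a false negative) is the error this kind of classifier most needs to avoid.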