Created by Corinne McCumber. Working title of the project: “Spam or Ham?”: Using Supervised Machine Learning for SMS Spam Classification. Final project for IS 597 MLC: Machine Learning Pipelines Using Cloud-Based Platforms, Summer 2024.
This repository contains:
- final_project_modules.py, a Python file of functions used during the project
- super_sms_dataset.csv, the dataset used in this project, sourced from Salman et al. via https://github.com/smspamresearch/spstudy
- IS597MLC_Final_Project_McCumber_Corinne.ipynb, a Jupyter notebook that contains all data processing for the project.
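
Before opening the notebook, it can help to look at the dataset's header and first few rows. The sketch below uses only the standard library and makes no assumptions about column names. `peek_csv` is a hypothetical helper, not one of the functions in final_project_modules.py, and the UTF-8 encoding is an assumption that may need adjusting.

```python
import csv
from pathlib import Path


def peek_csv(path, n_rows=3, encoding="utf-8"):
    """Return (header, first n_rows data rows) of a CSV file.

    Hypothetical helper for a quick look at the data; the encoding
    default is an assumption, not something stated by the project.
    """
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        # zip stops after n_rows without reading the rest of the file
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example call, assuming the file sits next to this script:
# header, rows = peek_csv("super_sms_dataset.csv")
# print(header)
```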
Run the notebook from start to finish. Pay special attention to the Amazon Web Services notebook instance (ml.r5.xlarge, with the EBS volume size set to 100 GB) and to the timestamp estimates for model training, to avoid errors caused by memory overload or session timeout.
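
One way to catch storage problems early is to check free disk space on the instance before starting a long run. This is a minimal sketch, not part of the project code: `free_disk_gb` and `check_disk` are hypothetical helpers, and the 20 GB threshold is an arbitrary placeholder rather than a measured requirement.

```python
import shutil

# Hypothetical threshold; the project does not state how much free space it needs.
REQUIRED_FREE_GB = 20


def free_disk_gb(path="."):
    """Return free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def check_disk(path=".", required_gb=REQUIRED_FREE_GB):
    """Return (ok, free_gb), where ok is True if enough space is free."""
    free = free_disk_gb(path)
    return free >= required_gb, free


# Example call from a notebook cell:
# ok, free = check_disk(".")
# print(f"{free:.1f} GiB free; enough space: {ok}")
```

This only covers disk space. It says nothing about RAM use during training, which still depends on the instance type chosen.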