This project uses logistic regression to predict whether an Airbnb host is a superhost using preprocessed listing data. The process includes model training, evaluation, feature selection, and persistence.
- `data_LR/airbnbData_train.csv`: Preprocessed dataset with one-hot encoding, scaling, and imputed values.
- Logistic Regression (scikit-learn)
- GridSearchCV for hyperparameter tuning
- Confusion matrix, ROC curve, AUC, and precision-recall curve evaluation
- Feature selection using SelectKBest
- Model persistence using `pickle`
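
The tuning and evaluation steps listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the synthetic data from `make_classification` stands in for the Airbnb dataset, and the `param_grid` values, the 5-fold setting, and the `auc_on_test` helper are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed listing data (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: logistic regression with scikit-learn's default C=1.0.
default_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="roc_auc"
)
search.fit(X_train, y_train)
tuned_model = search.best_estimator_


def auc_on_test(model):
    """Hypothetical helper: AUC from positive-class probabilities."""
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


default_auc = auc_on_test(default_model)
tuned_auc = auc_on_test(tuned_model)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, tuned_model.predict(X_test))

print("best C:", search.best_params_["C"])
print("default AUC:", default_auc, "tuned AUC:", tuned_auc)
print(cm)
```

Scoring the grid search with `roc_auc` keeps the selection criterion consistent with the AUC comparison; other scorers would be equally valid choices.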
- Classify hosts as superhosts (`True` or `False`)
- Evaluate default vs tuned logistic regression models
- Analyze feature importance and compare AUC scores
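
As an illustration of the feature analysis goal, the sketch below keeps the top-scoring features with `SelectKBest` and measures the AUC of a model trained on them. It runs on synthetic data; the `f_classif` scoring function, `k=5`, and the `feature_N` names are assumptions, and the notebook may use different settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed listing data (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the selector on the training split only, then apply it to both splits.
selector = SelectKBest(score_func=f_classif, k=5)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# get_support() returns a boolean mask over the original columns.
kept = [name for name, keep in zip(feature_names, selector.get_support()) if keep]

model = LogisticRegression(max_iter=1000).fit(X_train_sel, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test_sel)[:, 1])

print("kept features:", kept)
print("AUC with 5 features:", auc)
```

Repeating this for several values of `k` gives a simple way to see how AUC changes as features are added or removed.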
- `ModelSelectionForLogisticRegression.ipynb`: The full notebook
- `model_best.pkl`: Pickled best model
- `data_LR/airbnbData_train.csv`: Cleaned Airbnb dataset
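
A pickled scikit-learn model such as `model_best.pkl` can be written and read back with the standard `pickle` module. The sketch below shows the round trip on a model fitted to synthetic data and writes to a temporary directory, so it does not touch the real file; the data and the temporary path are assumptions for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the preprocessed listing data (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Write to a temporary location rather than the repository's own file.
path = os.path.join(tempfile.mkdtemp(), "model_best.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load the model back and check that it predicts identically.
with open(path, "rb") as f:
    restored = pickle.load(f)

same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("round trip preserved predictions:", same_predictions)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code. A pickled model is also best loaded with the same scikit-learn version that created it.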
- Python 3
- pandas, numpy
- scikit-learn
- matplotlib, seaborn
Developed as part of a machine learning lab exercise!