Airbnb Analysis 🏡🔍

🌟 Stakeholders

Our team, in alphabetical order:

Karan Khubdikar
Mo Norouzi
Nicole Bidwell

Welcome to the repository for the Airbnb Analysis.

⭐️ Project Overview

With over 8 million active listings across more than 100,000 cities and towns, Airbnb offers travelers a wide range of unique stays (Airbnb, 2024). In this project, our team uses machine learning to predict listing prices from property details such as geographical location, room type, and review activity. Our workflow covers exploratory data analysis, feature engineering, cross-validation, hyperparameter optimization, and feature selection. We compare several models, including Ridge, Random Forest Regression, XGBoost, and LGBM Regressor, and apply Recursive Feature Elimination with Cross-Validation (RFECV). We also use SHAP values to examine feature importance and model interpretability. Airbnb and hosts could use this project to guide future listing prices and understand the factors that drive prices.
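As a rough illustration of Recursive Feature Elimination with Cross-Validation, the sketch below runs scikit-learn's RFECV with a Ridge estimator on synthetic data. The synthetic dataset, the `select_features` helper, the fold count, the `alpha` value, and the scoring metric are assumptions chosen for the example, not the settings used in this project.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold


def select_features(X, y, n_splits=5, seed=0):
    """Fit RFECV with a Ridge estimator (hypothetical helper, not from the repo)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    selector = RFECV(
        estimator=Ridge(alpha=1.0),          # illustrative regularization strength
        step=1,                              # drop one feature per elimination round
        cv=cv,
        scoring="neg_mean_absolute_error",   # assumed metric for this sketch
    )
    selector.fit(X, y)
    return selector


# Synthetic stand-in for the listing features and prices.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=1.0, random_state=0
)
selector = select_features(X, y)

print("Number of features kept:", selector.n_features_)
print("Boolean mask of kept features:", selector.support_)
print("Elimination ranking (1 = kept):", selector.ranking_)
```

RFECV repeatedly fits the estimator, removes the feature with the smallest coefficient magnitude, and keeps the feature count that gives the best mean cross-validated score.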

📄 Report

A PDF copy of the final report can be viewed here.

📘 Data Source

The dataset used in this project is the New York City Airbnb Open Data, which is located on Kaggle.

⚙️ Usage

The project can be run locally using a virtual environment. All required dependencies are listed in the environment file. To set up the environment, run the pipeline, and build the report, follow the steps below.

Set up

Clone the repository.

git clone https://github.com/MoNorouzi23/Airbnb_analysis.git

Install the dependencies by running the following command from the root of the repository.

conda env create -f environment.yml

Activate the virtual environment.

conda activate airbnb_analysis

Run the relevant scripts using the Makefile.

make all

Note: the command above runs only the scripts that are part of the main pipeline: generating EDA plots, feature engineering, preprocessing, data splitting, RFECV model training, evaluation, and creating SHAP plots. To run an additional script, or any script individually, use the command: python -m src.<path>.<script name>. The outputs from all scripts (except for the processed data files and the random forest hyperparameter model) are already included in the repository.
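Running a script with python -m executes it as a module, so its imports resolve relative to the repository root. As a minimal sketch of what a script written for this style of invocation might look like (the `parse_args` and `main` functions, the `--input` and `--output` flags, and the file names are hypothetical and not taken from this repository):

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options; both flags are illustrative placeholders."""
    parser = argparse.ArgumentParser(description="Example pipeline step")
    parser.add_argument("--input", type=Path, required=True,
                        help="path to the input file")
    parser.add_argument("--output", type=Path, required=True,
                        help="path for the generated file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real step would load the data, transform it, and save results here.
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(f"processed {args.input.name}\n")
    return args.output


if __name__ == "__main__":
    # Runs only when the file is executed directly or via python -m.
    main()
```

Keeping the work inside `main` means the same function can be called from a Makefile target through the command line or imported from another module.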

To build the report, run the following command from the root of the repository.

jb build report

You can delete the output generated by make all by running make clean. The output from scripts not included in the pipeline will remain.

📖 License

All reports contained herein are licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License. See the license file for more information.

The software code contained in this repository is licensed under the MIT license. See the license file for more information.

If you reuse or remix this content, please provide attribution and include a link to this webpage.

📚 References

Airbnb. (2024). About us. URL: https://news.airbnb.com/about-us/

Core Libraries

Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95. URL: https://matplotlib.org/

Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems (NIPS 2017). URL: https://shap.readthedocs.io/en/latest/

McKinney, W. (2010). Data Analysis with Python and Pandas. URL: https://pandas.pydata.org/

Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. URL: https://scikit-learn.org/

Van der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 13(2), 22-30. URL: https://numpy.org/

Vega, J., & Altair Development Team. (2017). Altair: A Declarative Statistical Visualization Library for Python. URL: https://altair-viz.github.io/
