Zillow Regression Project

Project Description:

In this project, I will clean a large dataset containing sales information on single-family homes in 3 California counties. I will explore the data through visualizations and statistical tests to identify significant drivers of the target: home value. Then I will build a regression machine learning model to predict home value from those drivers.

Project Goal:

Identify drivers of single-family home value.
Use drivers to develop machine learning models to predict home value.

Initial Thoughts:

My initial hypothesis is that county, bedrooms, bathrooms, and square feet are significant drivers of home value.

The Plan

Acquire data from the Codeup database
Prepare data
Discover potential drivers of home value through exploration
- Answer these questions:
  - Does location significantly influence home value?
  - Does bedroom count significantly influence home value?
  - Does bathroom count significantly influence home value?
  - Does square footage significantly influence home value?
Develop a model to predict tax value
- Use drivers identified through exploration to build predictive models of different types
- Evaluate models on train and validate data
- Select the best model based on lowest RMSE in combination with R2 scores
- Evaluate the best model on test data
Draw conclusions
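
The model-selection step in the plan above can be sketched as follows. This is a minimal illustration in plain Python, not the notebook's code: the helper names (rmse, r2, select_best) and the toy numbers are assumptions, and the baseline is assumed to be a prediction of the mean home value for every home.

```python
from math import sqrt
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def select_best(scores):
    """Return the model name with the lowest validate RMSE.

    `scores` maps a model name to a (train_rmse, validate_rmse) pair.
    """
    return min(scores, key=lambda name: scores[name][1])


# Toy values only (hypothetical), not results from this project.
actual = [300_000, 450_000, 250_000, 600_000]
baseline_pred = [mean(actual)] * len(actual)  # assumed mean baseline
model_pred = [320_000, 430_000, 270_000, 560_000]

scores = {
    "baseline": (rmse(actual, baseline_pred), rmse(actual, baseline_pred)),
    "model": (rmse(actual, model_pred), rmse(actual, model_pred)),
}
best = select_best(scores)
```

A mean baseline always has an R2 of 0 on the data it was computed from, which is why R2 is a useful companion to RMSE when comparing models against the baseline.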

Data Dictionary

Feature	Definition
home_Value	The tax assessor's appraised home value (Target)
bedrooms	Number of bedrooms for the home
bathrooms	Number of bathrooms for the home
sqft	Total square footage listed for a property
county	The county the home is located within
year_built	Year the home was built

Steps to Reproduce

Clone this repository
Acquire the data from the Codeup Database ('Zillow')
Put the data in the folder containing the cloned repo
Create or copy your env.py file to this repo, specifying the Codeup hostname, username, and password
Run notebook
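
For reference, an env.py along these lines would satisfy the step above. This is a sketch under assumptions: the variable names (host, username, password), the get_db_url helper, and the mysql+pymysql connection-string format are not confirmed by this README, so match them to whatever wrangle.py actually imports.

```python
# env.py (hypothetical layout; keep this file out of version control)
host = "your-codeup-hostname"
username = "your-username"
password = "your-password"


def get_db_url(database, user=username, pwd=password, hostname=host):
    """Build a SQLAlchemy-style MySQL connection string (assumed format)."""
    return f"mysql+pymysql://{user}:{pwd}@{hostname}/{database}"
```

With this layout, a call such as get_db_url("zillow") would return the URL to pass to the database reader.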

Takeaways and Conclusions

All models performed better than the baseline
The RMSE for OLS was the lowest at 223504
Because of this RMSE score and the low RMSE delta between the train and validate sets, I will proceed with this model on my test set.
The test results were similar to train and validate
The OLS model improved on the baseline, reducing prediction error by approximately 17%.
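
The percentage improvement above comes from comparing the model's RMSE with the baseline's. This README does not state the baseline RMSE, so the sketch below uses placeholder values purely to show the arithmetic; pct_error_reduction is a hypothetical helper name.

```python
def pct_error_reduction(baseline_rmse, model_rmse):
    """Percent by which the model's RMSE falls below the baseline RMSE."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100


# Placeholder numbers, not this project's results.
reduction = pct_error_reduction(baseline_rmse=100_000, model_rmse=83_000)
```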

Recommendations

If more granular location data were available, such as zip code, location could be a much stronger driver than county alone.

Next Steps

With more time, I would like to do more feature engineering, run additional regression models, and explore the original database further to see what features can be added.
