css459/restaurant-closure-prediction


Restaurant Closure Prediction Engine (2019)

Final Project for Big Data Science, Spring 2019

Cole Smith

Undergraduate

Running

Python Version

This project was written in Python 3.7, and it is recommended to set up the virtual environment with that version. If your system defaults to Python 2, specify an interpreter with the --python flag to virtualenv.

Set up virtualenv

To set up the environment run:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Running Clustering

The clustering output can be viewed in doc/. It can also be generated by commenting out the code labelled as such in main.py.

Running Predictions

The predictions can be run directly by executing: python main.py

For clarity, the prediction output for the Regression is the total number of restaurant closures (hard and soft, see below) for a given month, given a number of factors. Each row is a zip code in a given month.

The output for the Classification covers soft closures only (see below). It is produced from the Restaurant Inspection Data Set. Each row is a restaurant as of the present day.

Hard Closures vs Soft Closures

Since the different data sets cannot reliably be joined, the closure information is broken out into Hard and Soft Closures.

Hard Closures

Hard Closures are those in which a restaurant did not renew its DCA license and thus cannot legally operate in New York City. These restaurants are assumed not to re-open, since the closure was presumably voluntary.

Soft Closures

Soft Closures are those in which the health inspection results warrant a complete closure. These offer a richer set of supporting features, since they originate from the Restaurant Inspection Data Set. However, there are generally far fewer soft closures than hard closures.

These closures are assumed to be involuntary, and restaurants may re-open upon a second inspection.

Merging Soft and Hard Closures

Since there is no given unique, universal identifier for a restaurant in these data sets, the only information that can be used to merge tables is the zip code and the date (Month and Year).

However, since it is assumed that Soft and Hard Closures are drawn from the same distribution (all restaurants must be inspected and must hold a DCA license), the master data set also includes information from the Restaurant Inspection Data Set aggregated to a monthly time-scale.

The total closures are therefore the sum of soft and hard closures for a given month and zip code.
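
The merge described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in main.py: the function names (monthly_counts, total_closures), the record layout of (zip code, closure date) pairs, and the sample records are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def monthly_counts(closures):
    """Count closures per (zip_code, year, month) key.

    `closures` is an iterable of (zip_code, closure_date) pairs;
    this layout is a hypothetical stand-in for the real tables.
    """
    return Counter((zip_code, d.year, d.month) for zip_code, d in closures)


def total_closures(hard, soft):
    """Sum hard and soft closure counts for each zip code and month."""
    return monthly_counts(hard) + monthly_counts(soft)


# Made-up sample records, for illustration only.
hard = [
    ("10003", date(2018, 5, 2)),
    ("10003", date(2018, 5, 20)),
    ("11201", date(2018, 6, 1)),
]
soft = [
    ("10003", date(2018, 5, 11)),
    ("11201", date(2018, 7, 9)),
]

totals = total_closures(hard, soft)
for key in sorted(totals):
    print(key, totals[key])
```

Because the key is only the zip code and the month, individual restaurants are never matched across the two sources; the counts are simply added per key.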

About

Prediction Engine for Restaurant Closures in NYC
