Yelp Recommender System - Your Best Restaurant Finder
Requires the following packages for json_to_csv_converter.py (run with Python 2):
- csv
- simplejson
Requires the following packages for the other data processing code:
- numpy
- pandas
- surprise
- sklearn
- tqdm
- sqlalchemy
Due to the growing number of users and the rapid expansion of information, data sparsity in traditional collaborative filtering (CF) has become more and more prominent. Traditional content-based filtering (CB) also suffers from a lack of novelty in its recommendations. Therefore, we propose a new hybrid recommendation method that combines collaborative filtering and content-based filtering: we use CF to produce rough recommendations, then use CB to refine them into more accurate ones. We apply our hybrid recommender system to the task of recommending Yelp restaurants.
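To make the two-stage idea concrete, here is a minimal sketch. It assumes a trained CF model with a Surprise-style `predict(user, item).est` interface and a `content_similarity(user, item)` helper returning a cosine similarity in [0, 1]; both helper names and the `alpha` blending weight are placeholders, not the exact code in this repository.

```python
def hybrid_recommend(user_id, candidate_items, cf_model, content_similarity,
                     n_rough=50, n_final=10, alpha=0.5):
    """Two-stage hybrid: CF shortlists candidates, CB re-ranks them."""
    # Stage 1: collaborative filtering produces a rough shortlist.
    rough = sorted(candidate_items,
                   key=lambda item: cf_model.predict(user_id, item).est,
                   reverse=True)[:n_rough]
    # Stage 2: blend the rescaled CF score with content-based similarity
    # and keep the top-n restaurants.
    scored = [(item,
               alpha * cf_model.predict(user_id, item).est / 5.0
               + (1 - alpha) * content_similarity(user_id, item))
              for item in rough]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:n_final]]
```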
We use the public Yelp dataset (https://www.yelp.com/dataset). The Yelp dataset is a subset of Yelp's businesses, reviews, and user data for use in personal, educational, and academic purposes. We use Reviews.csv, Business.csv, and Users.csv, which are 8.7 GB in total. After preprocessing and restricting the data to restaurants, we still retain 3,800,000+ reviews, 55,000+ restaurants, and 290,000+ users.
Provided by Yelp, this script converts .json files to .csv files for subsequent processing. There was a small bug in it, so we include a corrected version. Please note that this code must be run with Python 2.
Performs data cleaning on 'yelp_academic_dataset_business.csv' and saves the preprocessed data as 'business.csv'.
Performs data cleaning on 'yelp_academic_dataset_user.csv' and saves the preprocessed data as 'user.csv'.
Performs data cleaning on 'yelp_academic_dataset_review.csv' and saves the preprocessed data as 'review.csv'.
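For illustration, a simplified version of the business-side cleaning could look like the sketch below; the column names (`business_id`, `categories`) come from the Yelp dataset, but the exact filtering rules in our preprocessing scripts may differ.

```python
import pandas as pd

# Hedged sketch: keep businesses whose categories mention "Restaurants"
# and drop rows missing key fields; the real scripts may apply more rules.
business = pd.read_csv("yelp_academic_dataset_business.csv")
business = business.dropna(subset=["business_id", "categories"])
restaurants = business[
    business["categories"].str.contains("Restaurants", case=False, na=False)
]
restaurants.to_csv("business.csv", index=False)
```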
Runs LDA with PySpark and saves the topic distribution for each review as an .npy file.
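A minimal PySpark LDA sketch of this step is shown below; the number of topics (`k=20`) and the vocabulary size are placeholders, and the real script may tokenize and filter the review text differently.

```python
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, StopWordsRemover, CountVectorizer
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.appName("yelp-lda").getOrCreate()
reviews = spark.read.csv("review.csv", header=True).na.drop(subset=["text"])

# Tokenize review text, remove stop words, and build a term-count matrix.
tokens = Tokenizer(inputCol="text", outputCol="words").transform(reviews)
filtered = StopWordsRemover(inputCol="words", outputCol="filtered").transform(tokens)
cv_model = CountVectorizer(inputCol="filtered", outputCol="features",
                           vocabSize=5000).fit(filtered)
counts = cv_model.transform(filtered)

# Fit LDA and keep the per-review topic distribution.
lda_model = LDA(k=20, maxIter=10, featuresCol="features").fit(counts)
rows = lda_model.transform(counts).select("topicDistribution").collect()
np.save("review_topics.npy",
        np.array([row.topicDistribution.toArray() for row in rows]))
```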
Using the topic distribution that LDA gives for each review, we can derive topic vectors for every user and business. The data will be stored as 'user_topics.csv' and 'business_topics.csv'.
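As a sketch, assuming the .npy file holds one topic distribution per row of 'review.csv' in the same order, the aggregation can be a per-user and per-business average:

```python
import numpy as np
import pandas as pd

reviews = pd.read_csv("review.csv")
topics = np.load("review_topics.npy")
topic_cols = [f"topic_{i}" for i in range(topics.shape[1])]
df = pd.concat([reviews[["user_id", "business_id"]],
                pd.DataFrame(topics, columns=topic_cols)], axis=1)

# Average the per-review topic vectors to get one vector per user / business.
df.groupby("user_id")[topic_cols].mean().to_csv("user_topics.csv")
df.groupby("business_id")[topic_cols].mean().to_csv("business_topics.csv")
```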
We can get the categories of each business from 'business.csv' and then aggregate them to obtain the categories for each user. The data will be stored as 'user_categories.csv' and 'business_categories.csv'.
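A hedged sketch of the category aggregation, assuming the `categories` field is a comma-separated string as in the Yelp dataset:

```python
import pandas as pd

business = pd.read_csv("business.csv")
reviews = pd.read_csv("review.csv")

# Turn the comma-separated category string into a multi-hot vector per business.
cats = business.set_index("business_id")["categories"].str.get_dummies(sep=", ")
cats.to_csv("business_categories.csv")

# Average those vectors over the businesses each user has reviewed.
user_cats = (reviews[["user_id", "business_id"]]
             .join(cats, on="business_id")
             .groupby("user_id")
             .mean(numeric_only=True))
user_cats.to_csv("user_categories.csv")
```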
Uses VADER to score the sentiment of each review and saves the result as 'sentiment.csv'.
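A minimal sketch using NLTK's VADER implementation (the project may instead use the standalone vaderSentiment package):

```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")

reviews = pd.read_csv("review.csv")
sia = SentimentIntensityAnalyzer()
# Use the compound score in [-1, 1] as the sentiment of each review.
reviews["sentiment"] = reviews["text"].astype(str).map(
    lambda t: sia.polarity_scores(t)["compound"])
reviews[["user_id", "business_id", "sentiment"]].to_csv("sentiment.csv", index=False)
```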
Computes the category similarity and topic similarity between all users and businesses using cosine similarity. The data will be stored as 'final_matrix.csv'.
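A sketch of the topic-similarity part is below; category similarity is computed the same way from the *_categories files. With 290,000+ users and 55,000+ restaurants the full matrix is very large, so in practice it may need to be computed in blocks.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

user_topics = pd.read_csv("user_topics.csv", index_col=0)
business_topics = pd.read_csv("business_topics.csv", index_col=0)

# Rows: users, columns: businesses, values: cosine similarity of topic vectors.
topic_sim = pd.DataFrame(
    cosine_similarity(user_topics.values, business_topics.values),
    index=user_topics.index, columns=business_topics.index)
topic_sim.to_csv("final_matrix.csv")
```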
Trains the CF model and saves it.
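A minimal training-and-saving sketch with Surprise is shown below; SVD is one common choice, and the repository's model may use a different algorithm or hyper-parameters.

```python
import pickle
import pandas as pd
from surprise import SVD, Dataset, Reader

reviews = pd.read_csv("review.csv")
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(reviews[["user_id", "business_id", "stars"]], reader)

# Fit on the full training set and pickle the model for the backend.
algo = SVD()
algo.fit(data.build_full_trainset())
with open("model.pickle", "wb") as f:
    pickle.dump(algo, f)
```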
Tests the pure CF method and our hybrid method and shows metrics for both.
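A sketch of the pure-CF evaluation with Surprise; the hybrid method would be scored in a similar way after its CB refinement step.

```python
import pandas as pd
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

reviews = pd.read_csv("review.csv")
data = Dataset.load_from_df(reviews[["user_id", "business_id", "stars"]],
                            Reader(rating_scale=(1, 5)))
trainset, testset = train_test_split(data, test_size=0.2)

# Fit on the training split and report rating-prediction error on the test split.
algo = SVD().fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)
accuracy.mae(predictions)
```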
We built the frontend with Vue.
We used Django to build the backend. You should put business_categories.csv, business_topics.csv, user_categories.csv, user_topics.csv, and model.pickle in this folder to run the backend. All of these files can be generated with the code we provide.
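As a purely illustrative sketch (the view name, response shape, and file paths below are hypothetical, not the repository's actual backend code), the backend might load those artifacts once and serve recommendations like this:

```python
import pickle
import pandas as pd
from django.http import JsonResponse

# Load the artifacts listed above once at startup.
model = pickle.load(open("model.pickle", "rb"))            # trained CF model
user_topics = pd.read_csv("user_topics.csv", index_col=0)  # per-user topic vectors

def recommend(request):
    user_id = request.GET.get("user_id")
    if user_id not in user_topics.index:
        # Unknown user: the frontend falls back to the cold-start page.
        return JsonResponse({"new_user": True, "recommendations": []})
    # ... run the hybrid CF + CB ranking here and return the top-10 restaurants
    return JsonResponse({"new_user": False, "recommendations": []})
```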
We designed four pages in total.
The first page is the main page, where users can type their user ID in the input box.

We store the business and user data as tables in a database and use MySQL to access them. If the user has already signed up for Yelp,
which means we can find his/her information in the database, we show his/her user ID and username in the top left corner.
Then our hybrid algorithm finds ten restaurants to recommend based on the user's history.
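A hedged sketch of the database lookup with SQLAlchemy; the connection string, driver (pymysql), and table/column names are placeholders, not the project's actual schema.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; credentials and database name will differ.
engine = create_engine("mysql+pymysql://user:password@localhost/yelp")

def lookup_user(user_id):
    # Fetch the signed-up user's ID and name to show in the top left corner.
    query = text("SELECT user_id, name FROM users WHERE user_id = :uid")
    with engine.connect() as conn:
        return pd.read_sql(query, conn, params={"uid": user_id})
```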

If the user is new to Yelp, we built a page for him/her to choose some categories he/she is interested in.

We then use a pure content-based filtering algorithm to make the recommendations. That is how we use CB to deal with the cold-start problem.
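A minimal sketch of this cold-start path, assuming 'business_categories.csv' holds one multi-hot category row per business:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

business_cats = pd.read_csv("business_categories.csv", index_col=0)

def cold_start_recommend(chosen_categories, n=10):
    # Build a category vector from the boxes the new user ticked,
    # then rank businesses by cosine similarity to that vector.
    user_vec = pd.Series(0.0, index=business_cats.columns)
    user_vec[user_vec.index.isin(chosen_categories)] = 1.0
    scores = cosine_similarity(user_vec.values.reshape(1, -1),
                               business_cats.values)[0]
    return business_cats.index[scores.argsort()[::-1][:n]].tolist()
```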
