Coffee ReBrewer: An Interactive Web App to Find the Best Coffee Using Aspect-Based Sentiment Analysis (ABSA)
Welcome to Coffee ReBrewer!
For a brief video introduction and demo, please check out the link below: https://www.youtube.com/watch?v=DOXUgA9VJXA
Originally created for a Georgia Tech Master's course project, Coffee ReBrewer uses an NLP technique known as Aspect-Based Sentiment Analysis to discover how Yelp reviewers feel about a restaurant's coffee quality, independent of other factors in their reviews. The results are shown in an interactive web app, which includes a geographic map and a time-series graph with hover-over data.
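To illustrate the core idea, here is a toy sketch of aspect-based sentiment: only sentences that mention the target aspect contribute to its score, so a review can be negative overall yet positive about the coffee. (This lexicon-based sketch is illustrative only; the actual app uses a pre-trained model, not this code.)

```python
# Toy aspect-based sentiment scorer. Scores only sentences that mention
# the target aspect, using a tiny hand-made sentiment lexicon.
POSITIVE = {"great", "amazing", "rich", "smooth", "excellent"}
NEGATIVE = {"bad", "bitter", "slow", "rude", "stale"}

def aspect_sentiment(review: str, aspect: str) -> float:
    """Average sentiment in [-1, 1] over sentences mentioning `aspect`."""
    scores = []
    for sentence in review.lower().split("."):
        if aspect not in sentence:
            continue  # ignore sentences about other aspects
        words = [w.strip(",!?") for w in sentence.split()]
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        if pos + neg:
            scores.append((pos - neg) / (pos + neg))
    return sum(scores) / len(scores) if scores else 0.0

review = ("The service was slow and the staff was rude. "
          "But the coffee was rich and smooth.")
print(aspect_sentiment(review, "coffee"))   # positive, despite the bad service
print(aspect_sentiment(review, "service"))  # negative
```

The same review yields opposite scores for the two aspects, which is exactly the separation the app relies on.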
The project uses the following technologies:
- a pre-trained language model from PyABSA, executed on AWS SageMaker to assign coffee sentiment scores
- a MySQL database hosted on AWS (EC2) for ease of data access and manipulation
- Python's Plotly + Dash web framework for the frontend
Organization: Most of the code lives in the top-level directory, which includes the Python classes and functions for data cleaning, data filtering, sentiment-score analysis, data ingestion, model evaluation, and the web server. The Data folder stores the results of the cleaning/filtering/sentiment-scoring steps, as well as the original Yelp dataset (not in the repo, as it is 8.65 GB). All of this data can be regenerated from the original Yelp dataset by our scripts; it is cached here to speed up the web server and future evaluation experiments. Detailed script usage can be found in the Execution section.
This project includes two versions of our application:
- app_mySQL_version.py is the main version of our application. It connects to our data in a MySQL database hosted on AWS through the mysqlproxy.py file.
- app_local_data.py connects directly to our pre-processed .csv files saved in the /Data folder. This version is a backup meant to keep the app functional if we stop maintaining the MySQL database in the future.
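As a rough sketch of what the local-data fallback does, the app reads the pre-processed CSVs and aggregates sentiment per business. The column names and sample rows below are illustrative assumptions, not the actual schema in /Data:

```python
import csv
import io

# Illustrative sample -- the real files in /Data may use different columns.
SAMPLE = """business_id,date,coffee_sentiment
b1,2021-01-03,0.8
b1,2021-02-10,0.6
b2,2021-01-15,-0.2
"""

def load_scores(fileobj):
    """Read pre-processed sentiment rows and average scores per business."""
    scores = {}
    for row in csv.DictReader(fileobj):
        scores.setdefault(row["business_id"], []).append(
            float(row["coffee_sentiment"]))
    return {biz: sum(vals) / len(vals) for biz, vals in scores.items()}

print(load_scores(io.StringIO(SAMPLE)))  # average coffee sentiment per business
```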
Both versions can be installed as follows:
- Clone the git repository.
- Install the Python requirements, either individually or by using the requirements file (pip install -r requirements.txt)
To get the original Yelp dataset:
- Download the Yelp data from https://www.yelp.com/dataset/download
- Unzip the files and put them under ./Data/yelp_dataset.
To run our application quickly (without reproducing our data processing steps), do the following:
- Open the command prompt or terminal at the top-level of the repository and run the following command:
python3 app_mySQL_version.py
- Open your browser and navigate to the port where the app is running (shown in the terminal).
Reproducing Our Data Processing and Model Evaluation Steps
To run the Data Clean and Data Filter steps after unzipping the Yelp dataset:
- Data Clean:
python3 dataClean.py
Note: this writes the cleaned data into the same folder as the source file, not the ./Data folder.
- Data Filter:
python3 absa_filtered.py
Note: this writes the filtered data to a csv file, along with the sentiment data.
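A minimal sketch of the kind of filtering absa_filtered.py performs: keep only reviews that mention a coffee-related term, so the sentiment model runs on relevant text only. The keyword list and record shape here are assumptions for illustration, not the script's actual logic:

```python
# Keep only reviews that mention a coffee-related keyword.
# (Illustrative sketch; the real filter may use different criteria.)
COFFEE_TERMS = ("coffee", "espresso", "latte", "cappuccino")

def is_coffee_review(text: str) -> bool:
    """True if the review text mentions any coffee-related term."""
    lowered = text.lower()
    return any(term in lowered for term in COFFEE_TERMS)

reviews = [
    {"id": 1, "text": "Best espresso in town!"},
    {"id": 2, "text": "The burgers were greasy."},
]
kept = [r for r in reviews if is_coffee_review(r["text"])]
print([r["id"] for r in kept])  # only the coffee-related review survives
```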
Data Ingestion:
- Currently the location of the MySQL server on AWS is hard-coded.
- You can easily create an empty database and update the database server location in mysqlproxy.py.
- Running python3 mysqlproxy.py will create the tables and ingest the data into the database.
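For readers without a MySQL server handy, the create-tables-then-ingest step can be mimicked with Python's built-in sqlite3 module. The table and column names below are illustrative assumptions; mysqlproxy.py defines the real schema against MySQL:

```python
import sqlite3

# Stand-in for the MySQL ingestion: create a table and bulk-insert rows.
# Schema and data are illustrative, not the app's actual ones.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE coffee_scores (
    business_id TEXT, review_date TEXT, sentiment REAL)""")
rows = [("b1", "2021-01-03", 0.8), ("b2", "2021-01-15", -0.2)]
conn.executemany("INSERT INTO coffee_scores VALUES (?, ?, ?)", rows)
conn.commit()

# Query the ingested data, highest coffee sentiment first.
for row in conn.execute(
        "SELECT business_id, sentiment FROM coffee_scores "
        "ORDER BY sentiment DESC"):
    print(row)
```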
Model Evaluation
- The model evaluation is done in modelTuner.py.
- Run python3 modelTuner.py to print the accuracy results to the terminal.
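A minimal sketch of the accuracy computation that an evaluation script like modelTuner.py reports: the fraction of predicted sentiment labels that match hand-labeled ground truth. The labels and data below are made up for illustration:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical model outputs vs. hand-labeled coffee sentiments.
predicted = ["positive", "negative", "positive", "neutral"]
actual    = ["positive", "negative", "negative", "neutral"]
print(f"accuracy: {accuracy(predicted, actual):.2f}")  # 3 of 4 correct
```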