As part of the Udacity Nanodegree course, this project categorizes text using Natural Language Processing (NLP). The motivation is to classify text as soon as it comes in (in real time).
This exercise showcases data science skills across an end-to-end process, from data cleaning to deployment.
The project has 3 parts:
- ETL (Extract, Transform, Load)
- ML (Machine Learning)
- Deployment
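As a rough illustration of the ETL part, here is a minimal sketch of the three stages. It is not the project's actual code: the sample rows, the `messages` table name, and the `extract`, `transform`, and `load` helpers are all hypothetical, and it uses the standard library (`csv`, `re`, `sqlite3`) in place of nltk and sqlalchemy so it runs without extra installs.

```python
import csv
import io
import re
import sqlite3

# Hypothetical sample input; the real project uses its own data files.
RAW_CSV = """id,message
1,"Hello, World!"
2,"Text arrives in REAL time."
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: lowercase each message and keep only alphanumeric tokens."""
    cleaned = []
    for row in rows:
        tokens = re.findall(r"[a-z0-9]+", row["message"].lower())
        cleaned.append((int(row["id"]), " ".join(tokens)))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table (hypothetical name)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, text TEXT)"
    )
    connection.executemany("INSERT INTO messages (id, text) VALUES (?, ?)", records)
    connection.commit()


# Run the three stages against an in-memory database.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
stored = connection.execute("SELECT id, text FROM messages ORDER BY id").fetchall()
print(stored)  # [(1, 'hello world'), (2, 'text arrives in real time')]
```

Keeping each stage in its own function makes it easier to swap in the real tokenizer or database layer later.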
Python 3.6 or later is required, along with the dependencies below:
- nltk
- sqlalchemy
- pickle (part of the Python standard library, no separate install needed)
- flask
- plotly
- scikit-learn (imported as sklearn; Pipeline, RandomForestClassifier)
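To confirm the environment before starting the app, something like the sketch below may help. It is an illustrative check, not part of the repo: the `check_environment` helper and the module list are assumptions based on the dependency list above, with `sklearn` being the import name of scikit-learn.

```python
import importlib.util
import sys

# Import names assumed from the dependency list above.
REQUIRED_MODULES = ["nltk", "sqlalchemy", "pickle", "flask", "plotly", "sklearn"]
MINIMUM_PYTHON = (3, 6)


def check_environment(modules=REQUIRED_MODULES, minimum=MINIMUM_PYTHON):
    """Return (python_ok, availability) without importing the modules themselves."""
    python_ok = sys.version_info >= minimum
    availability = {
        name: importlib.util.find_spec(name) is not None for name in modules
    }
    return python_ok, availability


python_ok, availability = check_environment()
print("Python version OK:", python_ok)
for name, found in availability.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as MISSING would need to be installed before running the server.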
Clone the repo and unzip the data.
Run python ../app/run.py to start the server.
Once the server is running, go to http://0.0.0.0:3001 in a browser.
All changes are logged within this repo.

