Disaster Response Pipeline Project

Introduction

This Udacity project builds a machine learning pipeline that analyzes real-time disaster messages from Appen (formerly Figure 8). The goal is to classify each message so that it can be directed to the appropriate disaster relief agency.

The ETL pipeline, process_data.py, loads the messages and categories datasets, merges them, cleans the data and stores it in a SQLite database.
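
As a rough illustration of the ETL steps above, the sketch below loads, merges, cleans, and stores the data. It is not the contents of process_data.py. The use of pandas, the column names id and categories, the semicolon-separated "name-value" format of the categories column, and the table name messages are all assumptions made for this example.

```python
import sqlite3

import pandas as pd


def load_and_merge(messages_csv, categories_csv):
    """Load both CSV files and join them on the (assumed) shared `id` column."""
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    return messages.merge(categories, on="id")


def clean(df):
    """Expand the (assumed) `categories` string into one 0/1 column per category."""
    # "water-1;fire-0" becomes two columns holding "water-1" and "fire-0".
    cats = df["categories"].str.split(";", expand=True)
    # Column names come from the text before the dash in the first row.
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        # Keep only the number after the dash.
        cats[col] = cats[col].str.split("-").str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), cats], axis=1)
    return df.drop_duplicates()


def save(df, db_path, table="messages"):
    """Write the cleaned frame to a SQLite file; `table` is a hypothetical name."""
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table, conn, if_exists="replace", index=False)
    finally:
        conn.close()
```

With these helpers, the three file arguments passed to process_data.py would map onto load_and_merge (the two CSV paths) and save (the database path).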

The ML pipeline, train_classifier.py, loads data from the SQLite database, splits the dataset into training and test sets, builds a text-processing and machine learning pipeline, trains and tunes a model with GridSearchCV, reports results on the test set, and exports the final model as a pickle file.
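
The training flow described above could look roughly like the following. This is a simplified sketch, not the code in train_classifier.py: the function names, the tune flag, the TfidfVectorizer feature step, the 80/20 split, and the parameter grid are assumptions chosen to keep the example small.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model(tune=False):
    """Return a plain pipeline, or the same pipeline wrapped in a grid search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=10, random_state=0))),
    ])
    if not tune:
        return pipeline
    # Illustrative grid only; the README does not list the tuned parameters.
    param_grid = {"clf__estimator__n_estimators": [5, 10]}
    return GridSearchCV(pipeline, param_grid=param_grid, cv=2)


def train_and_export(X, Y, model_path, tune=False):
    """Split, fit, score each category on the test set, and pickle the model.

    X is a list of message strings; Y is a 2-D NumPy array of 0/1 labels.
    """
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=42)
    model = build_model(tune).fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    # One accuracy value per output column (category).
    scores = [accuracy_score(Y_test[:, i], Y_pred[:, i])
              for i in range(Y.shape[1])]
    with open(model_path, "wb") as f:
        # After a grid search, export only the best fitted pipeline.
        pickle.dump(model.best_estimator_ if tune else model, f)
    return scores
```

Calling train_and_export with tune=False skips the grid search and fits the pipeline with its fixed parameters, which is much faster on a large dataset.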

The model performs multi-output classification of text data. It takes preprocessed text and first pre-trains a Word2Vec model. It then transforms the text with a combination of CountVectorizer, TfidfTransformer, and Word2Vec transformers, and finally trains a multi-output classifier based on a Random Forest algorithm. The result is returned as a scikit-learn pipeline object, which can be fitted on training data and used to make predictions on new data.
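
One way such a pipeline might be assembled is sketched below. To keep the example dependent on scikit-learn and NumPy only, the Word2Vec branch is replaced by a hypothetical AverageWordVectors transformer that averages vectors from a plain dict; in the project these vectors would come from the pre-trained Word2Vec model. The class name, build_pipeline, the step names, and the Random Forest settings are assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


class AverageWordVectors(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a Word2Vec transformer.

    `vectors` maps token -> 1-D array of length `dim`. Each message becomes
    the mean of the vectors of its known tokens (zeros if none are known).
    """

    def __init__(self, vectors, dim):
        self.vectors = vectors
        self.dim = dim

    def fit(self, X, y=None):
        return self  # nothing to learn; the vectors are pre-trained

    def transform(self, X):
        rows = []
        for text in X:
            found = [self.vectors[t] for t in text.lower().split()
                     if t in self.vectors]
            rows.append(np.mean(found, axis=0) if found else np.zeros(self.dim))
        return np.vstack(rows)


def build_pipeline(vectors, dim):
    """Combine TF-IDF and averaged word vectors, then classify every output."""
    features = FeatureUnion([
        ("tfidf", Pipeline([
            ("counts", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
        ])),
        ("w2v", AverageWordVectors(vectors, dim)),
    ])
    return Pipeline([
        ("features", features),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=10, random_state=0))),
    ])
```

FeatureUnion concatenates the columns produced by each branch, so the classifier sees the TF-IDF features and the averaged word vectors side by side.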

The web app, run.py, is started from the terminal. When a user enters a message into the app, the app returns classification results for all 36 categories.
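
The classification step inside the app can be pictured as a call to the trained model followed by pairing each predicted label with its category name. The helper below is a hypothetical sketch of that step only; classify_message and its arguments are not taken from run.py, and the web framework code is left out.

```python
def classify_message(model, message, category_names):
    """Return {category name: predicted 0/1 label} for a single message.

    `model` is any fitted object with a scikit-learn style predict() that
    returns one row of labels per input message.
    """
    labels = model.predict([message])[0]
    if len(labels) != len(category_names):
        raise ValueError("model output does not match the number of categories")
    return {name: int(label) for name, label in zip(category_names, labels)}
```

In the app, model would presumably be the pipeline unpickled from models/classifier.pkl, and category_names would hold the 36 category column names from the database.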

Instructions:

1. Run the following commands in the project's root directory to set up your database and model.
   - To run the ETL pipeline that cleans the data and stores it in the database:
     python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
   - To run the ML pipeline that trains the classifier and saves it ¹:
     python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
2. Go to the app directory: cd app
3. Run your web app: python run.py
4. Click the PREVIEW button to open the homepage.

¹ The grid search in the main function is commented out because it takes a significant amount of time to execute. The pipeline currently runs with its default parameters.
