Worked on Twitter hate speech data to predict whether a tweet contains racist or sexist remarks.
Machine Learning Methods Used
Used n-grams as features, with TF-IDF and bag-of-words models to convert each tweet into a feature vector.
Used SMOTE to reduce the class imbalance by creating synthetic samples of the hate class.
Applied Random Forest, Naive Bayes, and an artificial neural network (ANN) for prediction.
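The feature extraction and resampling steps above can be sketched roughly as follows. This is a dependency-free illustration, not the project's actual code: the function names (`ngrams`, `bag_of_words`, `tfidf`, `smote`), the toy tweets, the whitespace tokenizer, and the idf variant `ln(N / df) + 1` are all assumptions made for the example. In practice a library implementation would normally be used for both steps.

```python
import math
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max from a lowercased tweet."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def bag_of_words(tweets, n_max=2):
    """Build a sorted vocabulary and one raw count vector per tweet."""
    counts = [Counter(ngrams(t, n_max)) for t in tweets]
    vocab = sorted(set(g for c in counts for g in c))
    vectors = [[c[g] for g in vocab] for c in counts]
    return vocab, vectors

def tfidf(vectors):
    """Reweight counts by idf = ln(N / df) + 1 (one common variant)."""
    n_docs, n_terms = len(vectors), len(vectors[0])
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    return [[v[j] * (math.log(n_docs / df[j]) + 1.0) for j in range(n_terms)]
            for v in vectors]

def smote(minority, n_new, k=1, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy tweets and labels, made up for illustration (1 = hate, 0 = not hate).
tweets = ["good morning everyone", "lovely weather today",
          "good night everyone", "great game today",
          "rude remark here", "another rude remark"]
labels = [0, 0, 0, 0, 1, 1]

vocab, bow = bag_of_words(tweets)
weighted = tfidf(bow)

# Oversample the minority class until both classes have the same size.
minority = [v for v, y in zip(weighted, labels) if y == 1]
n_missing = labels.count(0) - labels.count(1)
synthetic = smote(minority, n_new=n_missing)
```

Note that SMOTE operates on feature vectors, so the synthetic samples are points in TF-IDF space rather than readable tweets. Resampling should be applied to the training split only, so that evaluation data contains real tweets exclusively.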
Deep Learning Methods
Used an Embedding layer followed by an LSTM architecture. Also used a bidirectional LSTM.
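To make the architecture above concrete, here is a minimal forward-pass sketch of an embedding lookup feeding a bidirectional LSTM. It is written in plain Python purely for illustration. The project presumably used a deep learning framework, but which one is not stated here, so none is assumed. All sizes, the random untrained weights, and the helper names (`make_lstm`, `lstm_final_state`, `bilstm_encode`) are hypothetical. Biases, training, and the final classification layer are omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm(input_dim, hidden_dim, rng):
    """Random weights for the four gates: input (i), forget (f),
    candidate cell (g), and output (o). Biases are left out for brevity."""
    size = input_dim + hidden_dim
    return {g: [[rng.uniform(-0.5, 0.5) for _ in range(size)]
                for _ in range(hidden_dim)] for g in "ifgo"}

def lstm_final_state(weights, sequence, hidden_dim):
    """Run one LSTM over a sequence of vectors; return the last hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        z = x + h  # concatenate current input with previous hidden state
        pre = {g: [sum(w * v for w, v in zip(row, z)) for row in weights[g]]
               for g in "ifgo"}
        for j in range(hidden_dim):
            i_gate = sigmoid(pre["i"][j])
            f_gate = sigmoid(pre["f"][j])
            cand = math.tanh(pre["g"][j])
            o_gate = sigmoid(pre["o"][j])
            c[j] = f_gate * c[j] + i_gate * cand
            h[j] = o_gate * math.tanh(c[j])
    return h

def bilstm_encode(token_ids, embedding, fwd, bwd, hidden_dim):
    """Embed the tokens, read them left-to-right and right-to-left with two
    separate LSTMs, and concatenate the two final hidden states."""
    vectors = [embedding[t] for t in token_ids]  # embedding lookup
    h_fwd = lstm_final_state(fwd, vectors, hidden_dim)
    h_bwd = lstm_final_state(bwd, vectors[::-1], hidden_dim)
    return h_fwd + h_bwd

rng = random.Random(0)
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10, 4, 3  # hypothetical sizes
embedding = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
             for _ in range(VOCAB_SIZE)]
fwd = make_lstm(EMBED_DIM, HIDDEN_DIM, rng)
bwd = make_lstm(EMBED_DIM, HIDDEN_DIM, rng)

# A tweet represented as integer token ids (made up for the example).
features = bilstm_encode([3, 1, 4, 1, 5], embedding, fwd, bwd, HIDDEN_DIM)
```

The resulting vector has twice the hidden size because it joins the forward and backward summaries of the tweet. A classifier would typically pass it through a dense layer with a sigmoid output to score the tweet.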
Result
Achieved the best accuracy of 95% and an F1 score of 70.7% using the bidirectional LSTM.
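Accuracy and F1 are reported together because, on imbalanced data, accuracy alone can look strong even when the minority class is detected poorly. The sketch below shows how the two metrics are computed and why they can diverge. The labels are made up for illustration and are not the project's data, and the helper name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)

# Made-up example: 93 normal tweets and 7 hate tweets.
y_true = [0] * 93 + [1] * 7

# A model that never predicts "hate" still scores high on accuracy.
always_normal = [0] * 100
acc_naive, f1_naive = accuracy_and_f1(y_true, always_normal)

# A model that finds 5 of the 7 hate tweets with 2 false alarms.
better = [0] * 91 + [1] * 2 + [1] * 5 + [0] * 2
acc_better, f1_better = accuracy_and_f1(y_true, better)
```

In this toy case the naive model reaches an accuracy of 0.93 with an F1 of 0, while the second model has only slightly higher accuracy but a far better F1, which is the behaviour F1 is meant to expose.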