Kaggle: Real or Not? NLP with Disaster Tweets
Public competition on Kaggle to predict which Tweets are about real disasters and which ones aren't, using Machine Learning models.
As part of our personal development and continuing education, a group of friends and I joined this Kaggle competition to improve our knowledge and gain more experience in the NLP field.
We decided to join this competition as a team to enrich each other's experience and obtain better results through collaboration.
- Data exploration
- Data cleaning with Python, Pandas and Regex
- Spell checking and validation of words
- Tokenization
- Lemmatization
- Vectorization of the data and removal of stop words
- Exploration of different supervised classification models with Sklearn
- Hyperparameter tuning with Grid Search and H2O to improve the accuracy of the models
- Preparation of the submission file
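
The cleaning and tokenization steps above can be sketched roughly as follows. This is a minimal illustration that uses only the standard library, not the exact code from this project: the function names `clean_tweet` and `tokenize`, the specific regular expressions, and the sample tweet are assumptions made for the example. Lemmatization with NLTK is left out because it needs the WordNet data to be downloaded first.

```python
import re

# Patterns are illustrative assumptions, not the project's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols
WHITESPACE_RE = re.compile(r"\s+")              # runs of whitespace


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Non-letters (including '#') become spaces, so hashtag words are kept.
    text = NON_ALPHA_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()


if __name__ == "__main__":
    sample = "Forest fire near La Ronge Sask. Canada http://t.co/abc #wildfire @user!!"
    print(clean_tweet(sample))
    print(tokenize(sample))
```

Keeping the hashtag word while dropping the `#` symbol is a design choice: hashtags such as `#wildfire` often carry the signal the classifier needs.
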
After implementing different ML models, we achieved an accuracy of 0.80232 with a Support Vector Machine (SVC) model. This result can be improved with other methodologies and libraries.
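
For reference, a TF-IDF plus SVC model tuned with Grid Search could look roughly like the sketch below. It is an assumption-laden illustration, not the pipeline that produced the score above: the toy tweets and labels are invented for the example, and the parameter grid, the `cv=2` setting, and the step names `tfidf` and `svc` are arbitrary choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data invented for illustration only (1 = disaster, 0 = not a disaster).
texts = [
    "forest fire spreading near the village",
    "earthquake destroyed several buildings downtown",
    "flood waters rising after heavy storm",
    "residents evacuated as wildfire approaches",
    "i love this new song so much",
    "what a lovely day at the beach",
    "this movie was a total disaster lol",
    "my phone battery died again today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorization (with English stop-word removal) followed by the classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svc", SVC()),
])

# Hypothetical grid; the values actually searched in the project are not documented here.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

predictions = search.predict(["wildfire spreading near the town"])
print(search.best_params_, predictions)
```

Wrapping the vectorizer and the classifier in one `Pipeline` means the vocabulary is refit inside each cross-validation fold, which avoids leaking information from the validation split into the features.
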
Explore and implement libraries like spaCy and word embeddings, or techniques like stemming. We could also potentially improve the accuracy considerably by using Google's libraries for NLP.
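
As a small illustration of stemming, the sketch below uses NLTK's `PorterStemmer`, which reduces words to a root form by rule-based suffix stripping and needs no extra corpus download. The word list is made up for the example, and whether stemming would actually help on this dataset is untested.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Example words chosen for illustration; not taken from the competition data.
words = ["flooding", "fires", "evacuated", "buildings"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word} -> {stem}")
```

Unlike lemmatization, stemming does not guarantee that the result is a dictionary word, so some stems may look truncated.
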
- Python
- Pandas
- Regex
- NLTK
- Sklearn
- H2O
| Esdras Campos | Saúl Romero | César Campuzano |
|---|---|---|
| https://github.com/EsdrasGrau | https://github.com/sromero9485 | https://github.com/cesarcamp |
