WhatsApp chat sentiment analysis:

🖖 Do you want to know who talks the most in your group chat? Who is the most negative or the most positive? This project shows you how to find out.

👣 First steps.

This is the final project for Ironhack's Bootcamp.

🤔 Project:

This project presents an improvement proposal for the WhatsApp application. It identifies who leads a conversation and who is the most negative participant, and it could be extended to help minors who may be suffering harassment on the platform.

😌 What does the project do?

The implementation converts the .txt file exported from WhatsApp into a .csv file. The resulting dataframe looks like this:

Fecha	Time	Person	Mensaje
30/11/2023	19:12:50	Sil	Muchas gracias!
...	...	...	...
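The conversion step can be sketched with the standard `re` and `csv` modules. This is a minimal illustration, not the project's actual code: the export line shape `[30/11/2023, 19:12:50] Sil: Muchas gracias!` is an assumption (WhatsApp exports differ by platform and locale), and `parse_chat` and `write_csv` are hypothetical helper names.

```python
import csv
import io
import re

# Assumed export line shape (not confirmed by the project):
# [30/11/2023, 19:12:50] Sil: Muchas gracias!
LINE_RE = re.compile(
    r"^\[(?P<Fecha>\d{1,2}/\d{1,2}/\d{4}), (?P<Time>\d{1,2}:\d{2}:\d{2})\] "
    r"(?P<Person>[^:]+): (?P<Mensaje>.*)$"
)


def parse_chat(lines):
    """Turn raw export lines into dicts with Fecha, Time, Person, Mensaje."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
        elif rows and line.strip():
            # A line without a timestamp continues the previous message.
            rows[-1]["Mensaje"] += "\n" + line
    return rows


def write_csv(rows, handle):
    """Write the parsed rows as CSV using the four column names."""
    writer = csv.DictWriter(handle, fieldnames=["Fecha", "Time", "Person", "Mensaje"])
    writer.writeheader()
    writer.writerows(rows)


sample = [
    "[30/11/2023, 19:12:50] Sil: Muchas gracias!\n",
    "[30/11/2023, 19:13:05] Ana: De nada\n",
    "nos vemos\n",
]
rows = parse_chat(sample)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Multi-line messages are folded into the previous row, so one row always corresponds to one sent message.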

Then we extract general information from each message so that we can explore more interesting data. The result looks like this:

Fecha	Day	Num_Day	Num_Month	Month	Year	...
30/11/2023	Jueves	30	11	Nov	2023	...
...	...	...	...	...	...	...

The full set of columns is: Fecha (date), Day, Num_Day, Num_Month, Month, Year, Time, Person, Mensaje (message), Letras (letters), Palabras (words).
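As a rough sketch of how these columns could be derived from one parsed row, the example below uses only `datetime`. The function name `enrich` is hypothetical, the Spanish day and month labels are inferred from the sample row above, and counting `Letras` as characters and `Palabras` as whitespace-separated tokens is an assumption.

```python
from datetime import datetime

# Spanish names are hard-coded so the result does not depend on the system locale.
# The exact labels the project uses are an assumption based on the sample row.
DAYS = ["Lunes", "Martes", "Miércoles", "Jueves", "Viernes", "Sábado", "Domingo"]
MONTHS = ["Ene", "Feb", "Mar", "Abr", "May", "Jun",
          "Jul", "Ago", "Sep", "Oct", "Nov", "Dic"]


def enrich(row):
    """Add calendar fields and simple length counts to a parsed chat row."""
    date = datetime.strptime(row["Fecha"], "%d/%m/%Y")
    return {
        "Fecha": row["Fecha"],
        "Day": DAYS[date.weekday()],          # weekday() returns 0 for Monday
        "Num_Day": date.day,
        "Num_Month": date.month,
        "Month": MONTHS[date.month - 1],
        "Year": date.year,
        "Time": row["Time"],
        "Person": row["Person"],
        "Mensaje": row["Mensaje"],
        "Letras": len(row["Mensaje"]),            # assumed: character count
        "Palabras": len(row["Mensaje"].split()),  # assumed: whitespace tokens
    }


enriched = enrich({"Fecha": "30/11/2023", "Time": "19:12:50",
                   "Person": "Sil", "Mensaje": "Muchas gracias!"})
print(enriched["Day"], enriched["Month"], enriched["Year"])  # Jueves Nov 2023
```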

We clean the "Mensaje" column so that the model can predict sentiment. The cleaning steps are:

Removing accent marks.
Expanding abbreviations (for example, "xq" becomes "porque").
Removing special characters (numbers, exclamation marks, question marks, @, #, etc.).
Text lemmatization.
Text stemming.
Removing stopwords.
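Some of the cleaning steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in the project's modules: the abbreviation and stopword tables are tiny placeholders, `clean` and `remove_accents` are hypothetical names, and lemmatization and stemming are left out because they need nltk or spacy models.

```python
import re
import unicodedata

# Tiny illustrative tables; the project's real lists are not shown in this README.
ABBREVIATIONS = {"xq": "porque", "tb": "tambien"}
STOPWORDS = {"de", "la", "el", "que", "y", "a", "en"}


def remove_accents(text):
    """Drop combining marks after NFD decomposition (for example, á becomes a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean(text):
    """Lowercase, strip accents and special characters, expand abbreviations, drop stopwords."""
    text = remove_accents(text.lower())
    # Keep only letters and whitespace: digits, punctuation, @ and # are removed.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tok for tok in tokens if tok not in STOPWORDS)


print(clean("¡No voy a la playa xq está lloviendo! #lluvia 123"))
# no voy playa porque esta lloviendo lluvia
```

Accents are stripped before the character filter so that accented letters are kept as their plain form instead of being deleted.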

After cleaning the "Mensaje" column, we analyse the sentiment of the messages. Sentiment analysis is the process of computationally determining whether a piece of writing is positive, negative or neutral. I use TextBlob, Vader and N-grams:

TextBlob is a Python library used for text data mining and natural language processing.
Vader with SentimentIntensityAnalyzer. Vader is a rule-based sentiment analysis tool designed specifically for social media text. It is a pre-trained model that returns a sentiment score for a given text.
N-grams with nltk. An n-gram is a sequence of n successive elements in a text document, which can include words, numbers, symbols and punctuation. N-gram models are useful in many text analysis applications where word sequences matter, such as sentiment analysis, text classification and text generation.
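To make the n-gram definition concrete, here is a small pure-Python sketch. The helper name `ngrams` is chosen for illustration; `nltk.util.ngrams` produces the same tuples as an iterator, and the project's own `ngrams.py` module may work differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "muchas gracias por todo muchas gracias".split()
bigrams = ngrams(tokens, 2)

# Count how often each pair of consecutive words appears.
print(Counter(bigrams).most_common(1))  # [(('muchas', 'gracias'), 2)]
```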

We keep the "Fecha", "Person", "Mensaje" and "clean_text" columns. We then group the text by days, hours and minutes to analyse its sentiment, and use this information to build the dashboard.
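One way to group messages into time windows is to floor each timestamp to the start of its window and join the cleaned text per window. The sketch below assumes a time value is still available next to "Fecha" (here as a "Time" key) and that the window length divides 24 hours evenly; `window_start` and `group_by_window` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime


def window_start(timestamp, hours):
    """Floor a timestamp to the start of its window of `hours` hours."""
    floored_hour = timestamp.hour - timestamp.hour % hours
    return timestamp.replace(hour=floored_hour, minute=0, second=0, microsecond=0)


def group_by_window(rows, hours=3):
    """Join the clean_text of all messages that fall in the same window."""
    buckets = defaultdict(list)
    for row in rows:
        stamp = datetime.strptime(f"{row['Fecha']} {row['Time']}", "%d/%m/%Y %H:%M:%S")
        buckets[window_start(stamp, hours)].append(row["clean_text"])
    return {start: " ".join(texts) for start, texts in sorted(buckets.items())}


chat = [
    {"Fecha": "30/11/2023", "Time": "19:12:50", "clean_text": "muchas gracias"},
    {"Fecha": "30/11/2023", "Time": "20:01:00", "clean_text": "nada"},
    {"Fecha": "30/11/2023", "Time": "21:30:00", "clean_text": "hasta luego"},
]
grouped = group_by_window(chat, hours=3)
for start, text in grouped.items():
    print(start, "|", text)
```

Passing `hours=24` collapses every message of a day into one block, which is then scored as a single text.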

The dashboard shows:

Message frequency.
User participation.
Most used words.
Sentiment analysis.
Topics analysis.
Social network analysis.
Changes over time analysis.
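The first three metrics in this list reduce to simple counts. The sketch below uses `collections.Counter` on rows shaped like the dataframe described earlier; the function names and the sample rows are made up for illustration, and the dashboard itself may compute these figures differently.

```python
from collections import Counter


def messages_per_day(rows):
    """Message frequency: number of messages sent on each date."""
    return Counter(row["Fecha"] for row in rows)


def participation(rows):
    """User participation: share of all messages sent by each person."""
    counts = Counter(row["Person"] for row in rows)
    total = sum(counts.values())
    return {person: count / total for person, count in counts.items()}


def most_used_words(rows, top=3):
    """Most used words across the cleaned messages."""
    words = Counter()
    for row in rows:
        words.update(row["clean_text"].split())
    return words.most_common(top)


# Made-up rows, for illustration only.
demo = [
    {"Fecha": "30/11/2023", "Person": "Sil", "clean_text": "muchas gracias"},
    {"Fecha": "30/11/2023", "Person": "Ana", "clean_text": "gracias"},
    {"Fecha": "01/12/2023", "Person": "Sil", "clean_text": "hola"},
    {"Fecha": "01/12/2023", "Person": "Sil", "clean_text": "gracias hola"},
]
print(messages_per_day(demo))
print(participation(demo))
print(most_used_words(demo, top=1))
```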

Dashboard link

[!NOTE]

Implement a pre-trained language model or use an embedding for better sentiment prediction.
Return negative messages as an alert to tutors in cases of harassment on this platform.
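The alert idea could start from a simple filter over already-scored messages. This sketch assumes each row carries a hypothetical "score" key in the range -1 to 1, such as Vader's compound score or TextBlob's polarity; the cut-offs of 0.05 and -0.05 are the thresholds commonly used with Vader's compound score, and the sample scores are made up.

```python
def label(score, threshold=0.05):
    """Map a score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def negative_alerts(scored_rows):
    """Return only the messages whose score is labelled negative."""
    return [row for row in scored_rows if label(row["score"]) == "negative"]


# Made-up scores, for illustration only.
scored = [
    {"Person": "Sil", "Mensaje": "Muchas gracias!", "score": 0.6},
    {"Person": "Ana", "Mensaje": "ok", "score": 0.0},
    {"Person": "Leo", "Mensaje": "eres lo peor", "score": -0.7},
]
alerts = negative_alerts(scored)
print([row["Person"] for row in alerts])  # ['Leo']
```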

🤩 Why is this project useful?

This project helps you see how a conversation evolves, who leads it, and who is the most positive or the most negative participant. In cases of cyberbullying, the tutors of the affected minors could take measures to address the situation.

🤖 Additionally:

Used libraries:
- pandas.
- csv.
- re.
- matplotlib.
- nltk.
- unicodedata.
- numpy.
- wordcloud.
- TextBlob.
- vaderSentiment.
- spacy.

🙈 Project structure:

Proyect_final/
├── assets
│   ├── foto.png
│   └── whatsapp.png
├── data
│   ├── chat_sentimientos.csv
│   ├── chat_Vader.csv
│   ├── chat.csv
│   ├── chat.txt
│   ├── conversacion_3H_TextBlod.csv
│   ├── conversacion_D_TextBlod.csv
│   ├── conversacion_M_TextBlod.csv
│   ├── data_chat_clear_text.csv
│   ├── Limpieza_inicial_chat.csv
│   └── pruebas
│       ├── all_sentimientos.csv
│       ├── capi_chat.csv
│       ├── clearM.csv
│       ├── conversacion_1H_sentimiento.csv
│       ├── conversacion_5H_sentimiento.csv
│       ├── conversacion_D_sentimiento.csv
│       ├── conversacion_Dia_sentimiento.csv
│       ├── conversacion_M_sentimiento.csv
│       └── data_capi_chat_clear.csv
├── modules
│   ├── abreviaturas.py
│   ├── acentos.py
│   ├── carasteres_especiales.py
│   ├── conversaciones.py
│   ├── general.py
│   ├── lemmatizacion.py
│   ├── ngrams.py
│   ├── normalize.py
│   ├── Remove_stopwords.py
│   ├── Text_stemming.py
│   ├── textBlod.py
│   └── vader.py
├── notebooks
│   ├── Capi_chat_limpieza.ipynb
│   ├── chat_sentimientos.ipynb
│   ├── conversaciones_chat.ipynb
│   ├── EDA_capi.ipynb
│   └── Limpieza_inicial_chats.ipnb
├── LICENSE
├── main.py
└── README.md
