This repository contains a Jupyter notebook for performing sentiment analysis on Twitter data using three natural language processing models: BERT, LSTM, and GPT. The notebook covers the entire pipeline, from data preprocessing and exploration to model training, evaluation, and visualization of results.
- Data Preprocessing: The notebook preprocesses Twitter data by cleaning and transforming text, removing noise, and preparing it for analysis.
- Exploratory Data Analysis (EDA): Visualizations are provided to understand the distribution of sentiments in the dataset.
- Modeling: Three models (BERT, LSTM, and GPT) are implemented for sentiment classification. The notebook uses the Hugging Face Transformers library for BERT and GPT.
- Training and Evaluation: The models are trained on the dataset, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1-score.
- Confusion Matrix: Visual representations of confusion matrices are included for a detailed understanding of model performance.
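As a rough illustration, the tweet-cleaning step described above might look like the sketch below (the function name and exact regexes are illustrative, not taken from the notebook):

```python
import re

def clean_tweet(text: str) -> str:
    """Minimal tweet-cleaning sketch: strips URLs, @mentions, and
    punctuation noise, then lowercases and collapses whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove @mentions
    text = re.sub(r"#", "", text)                       # keep hashtag words, drop '#'
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)         # drop punctuation/emoji
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("@user Loving the new #NLP course!! https://t.co/abc"))
# → "loving the new nlp course"
```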
- Python
- Jupyter Notebook
- TensorFlow
- Hugging Face Transformers
- Plotly
- NLTK
- Pandas
- NumPy
This notebook explores sentiment analysis on Twitter using the LSTM (Long Short-Term Memory) architecture. The goal is to classify tweets into positive or negative sentiment categories.
The dataset used in this project is sourced from Kaggle: training.1600000.processed.noemoticon.csv
The project involves the following key steps:
- Data Preprocessing: Cleaning and preparing the dataset for sentiment analysis.
- Text Processing: Tokenization, sequence padding, and other text processing steps.
- Model Architecture: Implementing an LSTM model for sentiment analysis.
- Training and Evaluation: Training the model and evaluating its performance on a test set.
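The steps above can be sketched in a few lines of Keras. This is a minimal illustration using `TextVectorization` for tokenization and padding; the notebook's actual vocabulary size, sequence length, and layer sizes may differ, and the toy data here is only a placeholder:

```python
import numpy as np
import tensorflow as tf

texts = ["i love this", "worst day ever", "great movie", "i hate it"]
labels = np.array([1, 0, 1, 0])  # 1 = positive, 0 = negative (toy data)

# Text processing: map words to integer ids and pad to a fixed length
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10_000, output_mode="int", output_sequence_length=30)
vectorizer.adapt(texts)
X = vectorizer(tf.constant(texts))  # shape (4, 30), zero-padded

# Model architecture: embedding -> LSTM -> sigmoid for binary sentiment
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=1, verbose=0)  # real training uses far more data/epochs
```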
To run the notebook:
- Click on the "Open in Colab" badge above or use the following link: Open In Colab.
- Follow the instructions in the notebook cells to execute each step.
The trained model achieved the following performance on the test set:
|          | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| 0        | 0.74      | 0.67   | 0.70     | 1012    |
| 1        | 0.69      | 0.77   | 0.73     | 988     |
| accuracy |           |        | 0.72     | 2000    |
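Reports in this format are typically produced with scikit-learn; a minimal sketch (using made-up predictions, not the notebook's test set):

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]  # toy labels for illustration
y_pred = [0, 1, 0, 1, 1, 0]  # toy model predictions

# Per-class precision/recall/F1 plus overall accuracy
print(classification_report(y_true, y_pred, digits=2))

# Rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))
```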
This notebook explores sentiment analysis on Twitter using the BERT (Bidirectional Encoder Representations from Transformers) architecture. The goal is to classify tweets into positive or negative sentiment categories.
The dataset used in this project is sourced from Kaggle: training.1600000.processed.noemoticon.csv
The project involves the following key steps:
- Data Preprocessing: Cleaning and preparing the dataset for sentiment analysis.
- Text Processing: Tokenization and sequence padding of text data.
- BERT Model: Utilizing the powerful BERT model for sentiment classification.
- Training and Evaluation: Training the model and evaluating its performance on a test set.
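With the Hugging Face Transformers library, the BERT steps above usually look roughly like the sketch below. The `bert-base-uncased` checkpoint, `max_length`, and learning rate are assumptions for illustration; the notebook's settings may differ:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # assumed checkpoint; the notebook's may differ
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Text processing: BERT's tokenizer handles subwords, padding, and truncation
batch = tokenizer(["i love this movie", "worst day ever"],
                  padding=True, truncation=True, max_length=64, return_tensors="tf")

logits = model(**batch).logits            # shape (2, 2): one raw score per class
preds = tf.argmax(logits, axis=-1)        # class index per tweet (after fine-tuning)

# Fine-tuning would follow the usual Keras pattern:
# model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(train_inputs, train_labels, ...)
```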
To run the notebook:
- Click on the "Open in Colab" badge above or use the following link: Open In Colab.
- Follow the instructions in the notebook cells to execute each step.
The trained BERT model achieved the following performance on the test set:
|          | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| 0        | 0.75      | 0.79   | 0.77     | 1012    |
| 1        | 0.77      | 0.73   | 0.75     | 988     |
| accuracy |           |        | 0.76     | 2000    |
Visual representation of model predictions versus true labels:

Feel free to explore the notebook, experiment with different configurations, and contribute to the project!