This project performs sentiment analysis on Twitter data using traditional machine learning classifiers, deep learning models (LSTM, CNN, RNN, ANN), and transformer-based models (BERT, RoBERTa). The workflow covers data loading, cleaning, visualization, feature engineering, model training, evaluation, and comparison.
- Data cleaning and preprocessing
- Exploratory data analysis and visualization (bar plots, histograms, word clouds)
- Feature engineering (TF-IDF, Word2Vec)
- Model training and evaluation:
  - Traditional ML: Logistic Regression, Random Forest, SVM, Naive Bayes, Decision Tree, KNN
  - Deep Learning: LSTM, CNN, RNN, ANN (using TensorFlow/Keras)
  - Transformer-based: BERT, RoBERTa (using HuggingFace Transformers)
- Hyperparameter tuning (GridSearchCV; a minimal sketch follows this list)
- Model comparison (accuracy, F1 score)
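As a rough illustration of the TF-IDF and GridSearchCV steps, here is a minimal sketch; the `tweets.csv` file name, the column names, and the parameter grid are assumptions, not the notebook's exact settings:

```python
# Minimal sketch: TF-IDF features + GridSearchCV over Logistic Regression.
# Assumes a CSV named "tweets.csv" with clean_tweet / sentiment columns.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("tweets.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["clean_tweet"], df["sentiment"],
    test_size=0.2, random_state=42, stratify=df["sentiment"],
)

vectorizer = TfidfVectorizer(max_features=5000)   # cap the vocabulary size
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Search over the regularization strength C with 5-fold cross-validation
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="f1_macro",
    cv=5,
)
grid.fit(X_train_tfidf, y_train)
print(grid.best_params_, grid.best_score_)
```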
- The notebook expects a CSV file containing Twitter data with columns such as `Tweet`, `clean_tweet`, and `sentiment`.
- The dataset is loaded interactively (e.g., via Google Colab's file upload) or by specifying a path, as in the sketch below.
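For example, loading from a path with pandas might look like this (the file name is a placeholder):

```python
# Minimal sketch: load the dataset from a path instead of a Colab upload.
import pandas as pd

df = pd.read_csv("tweets.csv")          # placeholder path; adjust to your data
print(df[["Tweet", "clean_tweet", "sentiment"]].head())
print(df["sentiment"].value_counts())   # quick check of class balance
```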
- Python 3.7+
- Jupyter Notebook or Google Colab
- Key libraries:
  - numpy, pandas, matplotlib, seaborn
  - scikit-learn
  - wordcloud, textblob
  - gensim
  - tensorflow, keras
  - torch, transformers (HuggingFace)
Install dependencies with:

```bash
pip install numpy pandas matplotlib seaborn scikit-learn wordcloud textblob gensim tensorflow torch transformers
```

- Open the notebook `sentiment_analysis.ipynb` in Jupyter or Google Colab.
- Upload your dataset when prompted, or modify the code to load your CSV file directly.
- Run all cells sequentially to:
  - Clean and explore the data (a hypothetical cleaning step is sketched after this list)
  - Visualize sentiment distributions and word clouds
  - Train and evaluate multiple models
  - Compare model performance
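The notebook's exact preprocessing may differ, but a typical tweet-cleaning step looks like the following sketch (the regex rules and the `tweets.csv` path are illustrative assumptions):

```python
# Minimal sketch of a tweet-cleaning step: strip URLs, @mentions,
# '#' signs, and non-letter characters, then lowercase.
import re

import pandas as pd

def clean_tweet(text: str) -> str:
    text = re.sub(r"http\S+|www\.\S+", "", text)   # remove URLs
    text = re.sub(r"@\w+", "", text)               # remove @mentions
    text = text.replace("#", "")                   # keep the hashtag word, drop '#'
    text = re.sub(r"[^A-Za-z\s]", "", text)        # drop digits and punctuation
    return text.lower().strip()

df = pd.read_csv("tweets.csv")                     # placeholder path
df["clean_tweet"] = df["Tweet"].astype(str).apply(clean_tweet)
```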
- Traditional ML: Logistic Regression, Random Forest, SVM, Naive Bayes, Decision Tree, KNN
- Deep Learning: LSTM, CNN, RNN, ANN (TensorFlow/Keras)
- NLP Transformers: BERT, RoBERTa (HuggingFace Transformers)
- TextBlob: Rule-based sentiment analysis (a minimal sketch follows this list)
- Word2Vec: Feature engineering for ML models
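For the rule-based route, TextBlob scores polarity as a float in [-1, 1]; the thresholds below are an illustrative choice, not necessarily the notebook's:

```python
# Minimal sketch: rule-based sentiment with TextBlob.
from textblob import TextBlob

def textblob_sentiment(text: str) -> str:
    polarity = TextBlob(text).sentiment.polarity   # float in [-1, 1]
    if polarity > 0.05:
        return "positive"
    if polarity < -0.05:
        return "negative"
    return "neutral"

print(textblob_sentiment("I love this phone!"))    # -> positive
```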
- The notebook provides accuracy and F1 score comparisons for all models (a minimal comparison sketch follows this list).
- Visualizations and tables summarize model performance.
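A comparison like the notebook's can be tabulated as below; this sketch continues from the TF-IDF example above (reusing `X_train_tfidf`, `X_test_tfidf`, `y_train`, `y_test`), and the pair of classifiers is illustrative:

```python
# Minimal sketch: tabulate accuracy and macro-F1 for several classifiers.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),
}

rows = []
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)              # features from the TF-IDF sketch
    y_pred = model.predict(X_test_tfidf)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_macro": f1_score(y_test, y_pred, average="macro"),
    })

print(pd.DataFrame(rows).sort_values("f1_macro", ascending=False))
```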
- For BERT/RoBERTa, GPU acceleration is recommended (e.g., a Google Colab GPU runtime); a device-selection sketch follows this list.
- You may need to authenticate with HuggingFace for model downloads.
- Modify hyperparameters and model architectures as needed for your experiments.
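As a quick way to confirm GPU use with the HuggingFace `pipeline` API (the checkpoint name is an example; fine-tuning BERT/RoBERTa on your own labels is a separate step):

```python
# Minimal sketch: pre-trained sentiment pipeline, on GPU when available.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1   # 0 = first GPU, -1 = CPU
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example checkpoint
    device=device,
)
print(classifier("I love this phone!"))   # [{'label': 'POSITIVE', 'score': ...}]
```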
This project is for educational and research purposes.