This project applies machine learning and deep learning models to financial sentiment analysis. The goal is to classify financial text as positive, neutral, or negative based on its context. The dataset combines the FiQA and Financial PhraseBank datasets, and features are extracted using TF-IDF and BERT embeddings.
📦 Financial-Sentiment-Analysis
├── 📁 data/ # Dataset files
├── 📁 notebooks/ # Jupyter Notebooks
├── 📄 Financial_Sentiment_Analysis.ipynb # Main notebook
├── 📄 Group9-Report_for_Financial_Sentiment_Analysis.pdf # Final Report
├── 📄 README.md # Project documentation
- Source: Kaggle - Financial Sentiment Analysis Dataset
- Labels: Positive, Neutral, Negative
- Total Samples: 5,842
- No Missing Values
- Imbalanced Classes: Requires handling techniques like resampling
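
Besides resampling, a common way to handle imbalanced classes is to weight each class inversely to its frequency. The sketch below is illustrative only: the toy labels are placeholders (this README does not give the per-class counts), `balanced_class_weights` is a hypothetical helper, and nothing here is confirmed as the approach used in the notebook.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels for illustration only; NOT the real class distribution.
toy_labels = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1

weights = balanced_class_weights(toy_labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

Rarer classes receive larger weights, so their errors count for more during training. scikit-learn applies the same formula when an estimator such as `LogisticRegression` is given `class_weight="balanced"`.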
- Distribution of sentiments
- Sentence length analysis
- Word clouds for each sentiment
- TF-IDF heatmap analysis
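
As a rough sketch of the sentence length analysis, word counts can be summarized per sentiment label. The sentences below are made up for illustration, and `length_stats_by_label` is a hypothetical helper rather than a function from the notebook.

```python
from collections import defaultdict
from statistics import mean, median


def length_stats_by_label(samples):
    """Return per-label word-count statistics for (text, label) pairs."""
    lengths = defaultdict(list)
    for text, label in samples:
        lengths[label].append(len(text.split()))
    return {
        label: {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
        }
        for label, values in lengths.items()
    }


# Made-up example sentences; not taken from the dataset.
toy_samples = [
    ("Quarterly profit rose sharply", "positive"),
    ("Sales increased and margins improved this year", "positive"),
    ("The company will publish its report on Friday", "neutral"),
    ("Shares fell after the profit warning", "negative"),
]

stats = length_stats_by_label(toy_samples)
for label, summary in sorted(stats.items()):
    print(label, summary)
```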
- Text Cleaning (Lowercasing, removing special characters, punctuation, numbers)
- Tokenization
- Stopword Removal
- Lemmatization
- Splitting into Train (70%), Validation (15%), and Test (15%)
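
The preprocessing steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the notebook's code: the stopword list is a tiny placeholder, `clean_text` and `stratified_split` are hypothetical helpers, lemmatization is left out because it needs an external library such as NLTK or spaCy, and the split is stratified by label, which the README does not confirm.

```python
import random
import re
from collections import defaultdict

# Tiny placeholder stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "for", "on", "by"}


def clean_text(text):
    """Lowercase, drop non-letters, tokenize on whitespace, remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # strips digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def stratified_split(labels, train_pct=70, val_pct=15, seed=42):
    """Return (train, val, test) index lists, splitting each class separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = len(indices) * train_pct // 100
        n_val = len(indices) * val_pct // 100
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]  # remainder goes to test
    return train_idx, val_idx, test_idx


tokens = clean_text("Operating profit rose 12% in 2023, the company said.")
print(tokens)

# Toy labels for illustration only; not the real class distribution.
toy_labels = ["neutral"] * 60 + ["positive"] * 20 + ["negative"] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately keeps the label proportions similar across the three sets, which matters when classes are imbalanced. scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument.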
- TF-IDF (Top 5000 words)
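- Sentence embeddings from the Universal Sentence Encoder (labeled "BERT" in the results; note that the Universal Sentence Encoder is a separate model from BERT)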
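
To make the TF-IDF step concrete, the sketch below computes TF-IDF vectors by hand for a vocabulary capped at the most frequent terms. It assumes the smoothed IDF and L2 row normalization that scikit-learn's `TfidfVectorizer` uses by default; whether the notebook uses those defaults is not stated here, and `tfidf_vectors` is a hypothetical helper.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs, max_features=5000):
    """Return (vocabulary, vectors) with smoothed IDF and L2-normalized rows."""
    # Keep only the max_features most frequent terms across the corpus.
    term_counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = sorted(term for term, _ in term_counts.most_common(max_features))
    index = {term: i for i, term in enumerate(vocab)}

    # Document frequency: number of documents containing each term.
    n_docs = len(tokenized_docs)
    doc_freq = Counter(
        tok for doc in tokenized_docs for tok in set(doc) if tok in index
    )
    idf = {
        term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        for term in vocab
    }

    vectors = []
    for doc in tokenized_docs:
        row = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            if term in index:
                row[index[term]] = count * idf[term]
        norm = math.sqrt(sum(v * v for v in row))
        vectors.append([v / norm for v in row] if norm else row)
    return vocab, vectors


# Made-up tokenized documents for illustration.
docs = [
    ["profit", "rose", "sharply"],
    ["profit", "fell"],
    ["shares", "fell", "sharply"],
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print([round(v, 3) for v in vectors[0]])
```

Terms that appear in fewer documents get a higher weight: in the first document, "rose" (one document) outweighs "profit" (two documents).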
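- Logistic Regression: Best overall with BERT embeddings (F1-score 0.61 at 69% accuracy); TF-IDF features gave marginally higher accuracy (70%) but a lower F1-score (0.57)
- Support Vector Machine (SVM): Matched Logistic Regression on accuracy with BERT embeddings (69%) but with a lower F1-score (0.57)
- Naïve Bayes: High precision (0.78) but low recall (0.44) with TF-IDF features, at 63.4% accuracy
- LSTM Model: Struggled with class imbalance, leading to the lowest accuracy (54.96%)
- Improvement Suggestions: Use Bidirectional LSTMs or an attention mechanism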
| Model | Feature | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Logistic Regression | TF-IDF | 70% | 0.65 | 0.56 | 0.57 |
| Logistic Regression | BERT | 69% | 0.64 | 0.59 | 0.61 |
| SVM | BERT | 69% | 0.62 | 0.56 | 0.57 |
| Naïve Bayes | TF-IDF | 63.4% | 0.78 | 0.44 | 0.42 |
| LSTM (Deep Learning) | BERT | 54.96% | 0.34 | 0.35 | 0.29 |
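
The four metrics in the table can be computed from predictions as sketched below. This assumes macro averaging (an unweighted mean over classes), which the table does not state; `macro_metrics` is a hypothetical helper and the labels are toy data. The example uses a classifier that always predicts the majority class, to show how accuracy can look acceptable on imbalanced data while the macro scores collapse.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (no predictions or no true samples) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data: a classifier that always predicts the majority class.
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 10

scores = macro_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```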
git clone https://github.com/your-username/Financial-Sentiment-Analysis.git
cd Financial-Sentiment-Analysis
- Open `Financial_Sentiment_Analysis.ipynb` in Jupyter Notebook or Google Colab.
- Run all cells to:
  - Train machine learning & deep learning models
  - Evaluate performance using confusion matrices & classification reports
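
For readers unfamiliar with confusion matrices, the sketch below builds one from toy predictions using only the standard library. The labels and predictions are made up, and this `confusion_matrix` function is a hypothetical stand-in; the notebook may instead rely on `confusion_matrix` and `classification_report` from `sklearn.metrics`.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


labels = ["negative", "neutral", "positive"]
# Toy values for illustration only.
y_true = ["negative", "neutral", "neutral", "positive", "positive", "neutral"]
y_pred = ["neutral", "neutral", "neutral", "positive", "neutral", "positive"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}", row)
```

The diagonal holds the correct predictions; off-diagonal cells show which sentiments get confused with each other.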
- J. Wang and L. Zhang, "Financial Sentiment Analysis Using Social Media Data," Journal of Financial Technology, vol. 18, no. 4, pp. 74-85, 2021. [Online]. Available: https://doi.org/10.1109/JFT.2021.4567890
- M. Patel, R. Gupta, and S. Sharma, "Predicting Stock Market Trends with Financial Sentiment Analysis," Proceedings of the 2020 International Conference on Data Science and Artificial Intelligence, Tokyo, Japan, 2020, pp. 234-240.
- P. Malo, A. Sinha, P. Korhonen, J. Wallenius, and P. Takala, "Good debt or bad debt: Detecting semantic orientations in economic texts," Journal of the Association for Information Science and Technology, vol. 65, no. 4, pp. 782-796, 2014. [Online]. Available: https://doi.org/10.1002/asi.23062
- Mariam Azeez Temilola - Data Exploration and Analysis
- Abdulhameed Teniola Ajani - Data Preprocessing and Feature Engineering
- Samuel Babalola - Model Training and Evaluation
- Implement Bidirectional LSTM or Transformer-based models (e.g., BERT fine-tuning)
- Explore class balancing techniques to address dataset imbalance
- Build a web app for real-time sentiment classification
📌 Note: This project is part of ALU's assignment on Machine Learning Techniques. Contributions and feedback are welcome! 🎯
