- Project Proposal
- Data Source
- Technologies
- Machine Learning Steps
- Using Machine Learning Model
- Slidedeck
- Team
Using supervised machine learning, we build a model that takes financial headlines as input and classifies the current sentiment in the financial industry. The analysis is particularly interesting in the context of the COVID-19 pandemic: our initial assumption is that a majority of financial headlines from this period carry negative sentiment. Here is a link to our hosting website.
- Using financial headlines in CSV format as input, users get a sentiment analysis with a classification for each headline.
- Using additional Stock News API data (dates and tickers), a stock list CSV file, and our Tableau dashboard template, users can perform additional analysis on the following:
  - Sentiment analysis over a given time period
  - Sentiment analysis by company, sector, and industry
  - Emotional analysis by keywords
    - Keywords are grouped in Tableau by the feeling they convey to create visualizations. Feelings Resource Website
  - Word Blast
- Python Pandas
- PySpark
- Tableau
- Google Colab
- scikit-learn
- JavaScript
- HTML/CSS
- w3schools.com
- cloudtables.com
- Get Test Data to retrieve financial headlines and their sentiment classifications, and save it as a CSV file
- Get API key
- Read the test data in as a DataFrame
- Tokenization - Break financial headlines into a list of words
- CountVectorizer - Generate the term frequency vectors
- Inverse Document Frequency - Down-weight features that appear frequently in the corpus
- Split Data - Train 80%, Test 20%
- Hyperparameter Tuning and Model - Run the Machine Notebook. The notebook runs the test data through both Logistic Regression and Naive Bayes models, tunes hyperparameters with a ParamGridBuilder grid search and CrossValidator, and lets users evaluate the models using area under ROC, accuracy, and F1 score.
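The feature steps above (tokenization, term counts, inverse document frequency, and the 80/20 split) can be sketched in plain Python to show what each stage produces. This is a minimal illustration, not the project's notebook: the sample headlines, labels, and every function name are hypothetical, and the smoothed formula `log((N + 1) / (df + 1))` is the one documented for Spark MLlib's IDF, which the notebook is assumed (not confirmed) to use.

```python
import math
import random
from collections import Counter

# Hypothetical labelled headlines (1 = positive, 0 = negative); not project data.
headlines = [
    ("Stocks rally as earnings beat expectations", 1),
    ("Markets tumble on recession fears", 0),
    ("Tech shares surge after strong earnings", 1),
    ("Oil prices slump as demand fears grow", 0),
    ("Bank profits climb on higher rates", 1),
]

def tokenize(text):
    """Lower-case a headline and split it on whitespace."""
    return text.lower().split()

def term_counts(tokens):
    """Term-frequency vector stored as a {term: count} mapping."""
    return Counter(tokens)

def inverse_document_frequency(docs):
    """Smoothed IDF: log((N + 1) / (df + 1)), df = number of documents containing the term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

tokenized = [tokenize(text) for text, _ in headlines]
idf = inverse_document_frequency(tokenized)

# TF-IDF features: each term count scaled by that term's IDF weight.
features = [
    {term: count * idf[term] for term, count in term_counts(tokens).items()}
    for tokens in tokenized
]
labelled = [(vector, label) for vector, (_, label) in zip(features, headlines)]
train_rows, test_rows = train_test_split(labelled)
```

Terms that occur in several headlines, such as "earnings", receive a smaller weight than terms that occur in only one, which is the down-weighting the IDF step describes.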
- Get financial headlines from the Stock News API or use an alternative dataset
- Read dataset in as DataFrame
- Run the model from above to retrieve sentiment classification for each headline
```python
import numpy as np
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# trainDF is the training split; it needs "features" and "label" columns.
lr = LogisticRegression(maxIter=10)

# Candidate values for regularization strength and elastic-net mixing.
paramGrid_lr = ParamGridBuilder() \
    .addGrid(lr.regParam, np.linspace(0.3, 0.01, 10)) \
    .addGrid(lr.elasticNetParam, np.linspace(0.3, 0.8, 6)) \
    .build()

# 5-fold cross-validation; MulticlassClassificationEvaluator scores by F1 by default.
crossval_lr = CrossValidator(estimator=lr,
                             estimatorParamMaps=paramGrid_lr,
                             evaluator=MulticlassClassificationEvaluator(),
                             numFolds=5)

cvModel_lr = crossval_lr.fit(trainDF)

# Training summary of the best model found by the grid search.
best_model_lr = cvModel_lr.bestModel.summary
best_model_lr.predictions.columns
```

- Get date and ticker information from the Stock News API or use an alternative dataset
- Use Stock List from NASDAQ with key ticker information
- Run the functions notebook to clean and merge the financial sentiment classification, date, and detailed ticker information to perform analysis
- Save merged DataFrame as csv
- Open the CSV in Tableau and use the provided template to view the data visualizations.
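The merge step above can be sketched as a left join of classified headlines onto the stock list by ticker. This is only an illustration under stated assumptions: the project's functions notebook works with DataFrames, whereas this sketch uses the standard library so it runs without extra packages. The tickers, rows, column names, and the helpers `merge_on_ticker` and `to_csv` are all hypothetical.

```python
import csv
import io

# Hypothetical classified headlines; column names are assumptions, not the project's schema.
sentiment_rows = [
    {"date": "2020-03-16", "ticker": "AAA", "headline": "Example Corp closes stores", "sentiment": "negative"},
    {"date": "2020-04-30", "ticker": "BBB", "headline": "Sample Inc revenue jumps", "sentiment": "positive"},
    {"date": "2020-05-01", "ticker": "ZZZ", "headline": "Unlisted firm reports loss", "sentiment": "negative"},
]

# Hypothetical stock list rows with detailed ticker information.
stock_list = [
    {"ticker": "AAA", "sector": "Technology", "industry": "Hardware"},
    {"ticker": "BBB", "sector": "Technology", "industry": "Software"},
]

def merge_on_ticker(headlines, stocks):
    """Left-join headline rows to stock details; unknown tickers get empty fields."""
    by_ticker = {row["ticker"]: row for row in stocks}
    merged = []
    for row in headlines:
        details = by_ticker.get(row["ticker"], {})
        merged.append({
            **row,
            "sector": details.get("sector", ""),
            "industry": details.get("industry", ""),
        })
    return merged

def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

merged_rows = merge_on_ticker(sentiment_rows, stock_list)
csv_text = to_csv(merged_rows)
```

A left join keeps headlines whose ticker is missing from the stock list, so they can still be counted in time-based views even though their sector and industry are blank.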
| Team Member | GitHub username |
|---|---|
| Adriana Icasiano | adriana-icasiano |
| Paul Feliciano | pfeliciano1 |
| Alberto Gonzalez | dalismo |
| Abayomi Olujobi | bay0624 |
| Lovensky Lubin | Lubinl |
