
This project entailed using supervised machine learning on stock news to provide the sentiment on API data.


dalismo/Finance_News_Sentiment


Finance News Sentiment



Project Proposal

A brief description of our project

Using supervised machine learning, we will create an algorithm that takes financial headlines as input and analyzes current sentiment in the financial industry. The analysis is particularly interesting in the context of the Covid pandemic: our initial assumption is that a majority of financial headlines will carry negative sentiment. Here is a link to our hosted website.

Why will our final sentiment analysis be useful to users?

  • Using financial headlines supplied in csv format, users can get a sentiment analysis with a classification for each headline.
  • Using additional Stock News API data (dates and tickers), the stock list csv file, and our Tableau dashboard template, users can perform additional analysis of the following:
    • Sentiment analysis over a given time period
    • Sentiment analysis by company, sector and industry
    • Emotional analysis by keywords
    • Word Blast

Data sources

  1. Stock News API
  2. NASDAQ List - Ticker, Country, Exchange, Financial Sector and Industry details

Technologies

Machine Learning Steps

  1. Get Test Data - Retrieve financial headlines with their sentiment classification and save as csv
    Get API key

  2. Read Test Data in as DataFrame

  3. Tokenization - Break financial headlines into a list of words

  4. CountVectorizer - Generate the term frequency vectors

  5. Inverse Document Frequency - Down-weight features that appear frequently in the corpus

  6. Split Data - Train 80%, Test 20%

  7. Hyperparameter Tuning and Model - Run the Machine Notebook. The notebook runs the test data through both Logistic Regression and Naive Bayes, applies hyperparameter tuning with a parameter grid search (ParamGridBuilder) and CrossValidator, and lets users evaluate the models using area under ROC, accuracy, and F1 score.
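Steps 3 to 6 can be illustrated without Spark. The following is a minimal sketch in plain Python, not the project's notebook code: the sample headlines, labels, and all function names are hypothetical, and the IDF uses the smoothed form log((N + 1) / (df + 1)), which is the form Spark ML documents for its IDF stage.

```python
import math
import random
import re
from collections import Counter

# Hypothetical labelled headlines (1 = positive, 0 = negative); not project data.
headlines = [
    ("Shares rally as earnings beat forecasts", 1),
    ("Company shares slump after profit warning", 0),
    ("Bank shares rise on strong earnings", 1),
    ("Retailer cuts jobs as sales slump", 0),
    ("Tech shares climb to record high", 1),
]


def tokenize(text):
    """Step 3: lower-case a headline and break it into a list of words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequencies(tokens):
    """Step 4: count how often each word occurs in one headline."""
    return Counter(tokens)


def inverse_document_frequencies(tokenized_docs):
    """Step 5: smoothed IDF; words found in many headlines get low weights."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(word for doc in tokenized_docs for word in set(doc))
    return {w: math.log((n_docs + 1) / (df + 1)) for w, df in doc_freq.items()}


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Step 6: shuffle reproducibly, then cut into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


tokenized = [tokenize(text) for text, _ in headlines]
idf = inverse_document_frequencies(tokenized)
# TF-IDF weight of each word in each headline.
tfidf = [
    {word: tf * idf[word] for word, tf in term_frequencies(doc).items()}
    for doc in tokenized
]
train_rows, test_rows = train_test_split(headlines)
```

In this toy corpus "shares" occurs in four of the five headlines, so it receives a lower IDF weight than "rally", which occurs in only one.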

Using Machine Learning Model

  1. Get financial headlines from Stock News API or use alternative dataset
  2. Read dataset in as DataFrame
  3. Run the model from above to retrieve sentiment classification for each headline

```python
import numpy as np
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Base estimator; maxIter caps the number of optimizer iterations.
lr = LogisticRegression(maxIter=10)

# Grid of regularization strengths and elastic-net mixing values to search.
paramGrid_lr = (
    ParamGridBuilder()
    .addGrid(lr.regParam, np.linspace(0.3, 0.01, 10))
    .addGrid(lr.elasticNetParam, np.linspace(0.3, 0.8, 6))
    .build()
)

# 5-fold cross-validation; the evaluator's default metric is F1.
crossval_lr = CrossValidator(
    estimator=lr,
    estimatorParamMaps=paramGrid_lr,
    evaluator=MulticlassClassificationEvaluator(),
    numFolds=5,
)

# trainDF is the 80% training split from the Machine Learning Steps.
cvModel_lr = crossval_lr.fit(trainDF)

# Training summary of the best model found by the grid search.
best_model_lr = cvModel_lr.bestModel.summary
best_model_lr.predictions.columns
```
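To make the accuracy and F1 figures concrete, here is a minimal sketch of how they are computed for a binary classifier. It is plain Python rather than the Spark evaluators, the labels and predictions are made-up sample values, and `evaluate_binary` is a hypothetical helper name. Note that it computes F1 for the positive class only, whereas Spark's MulticlassClassificationEvaluator reports a weighted F1 across classes, so the two numbers can differ.

```python
def evaluate_binary(labels, predictions):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up sample values: 1 = positive headline, 0 = negative headline.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate_binary(labels, predictions)
```

With these sample values there are three true positives, one false positive and one false negative, so accuracy, precision, recall and F1 all come out to 0.75.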

Data Analysis

  1. Get date and ticker information from Stock News API or use alternative dataset
  2. Use Stock List from NASDAQ with key ticker information
  3. Run the functions notebook to clean and merge the financial sentiment classification, date, and detailed ticker information to perform analysis
  4. Save merged DataFrame as csv
  5. Open the csv in Tableau and, using the template provided, view the data visualizations.
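The merge in steps 3 and 4 can be sketched as follows. This is an illustration using only the Python standard library, not the functions notebook itself: the column names (`date`, `ticker`, `sentiment`, `sector`, `industry`), the sample rows, and the helper names are all assumptions made for the example.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical classified headlines (model output joined with API date and ticker).
classified = [
    {"date": "2020-04-01", "ticker": "AAPL", "sentiment": "positive"},
    {"date": "2020-04-01", "ticker": "JPM", "sentiment": "negative"},
    {"date": "2020-04-02", "ticker": "AAPL", "sentiment": "negative"},
    {"date": "2020-04-02", "ticker": "ZZZZ", "sentiment": "positive"},
]

# Hypothetical stock list keyed by ticker; the values are illustrative only.
stock_list = {
    "AAPL": {"sector": "Technology", "industry": "Computer Manufacturing"},
    "JPM": {"sector": "Finance", "industry": "Major Banks"},
}


def merge_on_ticker(rows, stocks):
    """Left join: keep every headline, filling unknown tickers with 'Unknown'."""
    unknown = {"sector": "Unknown", "industry": "Unknown"}
    return [{**row, **stocks.get(row["ticker"], unknown)} for row in rows]


def sentiment_by_sector(rows):
    """Count sentiment labels per sector."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["sector"]][row["sentiment"]] += 1
    return counts


merged = merge_on_ticker(classified, stock_list)
by_sector = sentiment_by_sector(merged)

# Write the merged rows as csv; a file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["date", "ticker", "sentiment", "sector", "industry"]
)
writer.writeheader()
writer.writerows(merged)
csv_text = buffer.getvalue()
```

A left join is used here so that headlines whose ticker is missing from the stock list are still kept and can be spotted under the "Unknown" sector in the dashboard.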

Slidedeck

Financial News Analysis

Team

| Team Member | GitHub username |
| --- | --- |
| Adriana Icasiano | adriana-icasiano |
| Paul Feliciano | pfeliciano1 |
| Alberto Gonzalez | dalismo |
| Abayomi Olujobi | bay0624 |
| Lovensky Lubin | Lubinl |
