
Maddy256/Stock-Sentiment-Analysis


INTRODUCTION :

The aim of this project is to develop a sentiment analysis model that predicts the movement of stock prices by analyzing textual data from sources such as news articles, social media posts, and other financial news and opinion pieces. The resulting sentiment can serve as an indicator of investor and market mood, helping traders make more informed decisions.

OBJECTIVE :

The primary objective of this project is to develop a machine learning model that can analyze sentiment from textual data and predict stock price movements. The project will involve the following steps:

  • Data Collection
  • Data Extraction (Web Scraping)
  • Data Preprocessing
  • Sentiment Analysis
  • Model Training and Evaluation
  • Prediction and Visualization

FLOW OF PROJECT

PRIMARY ANALYSIS :

First, I imported the Yahoo Finance library (yfinance) and created a ticker object for Microsoft (MSFT). Its ‘history’ method is called to retrieve the past 10 years of historical data, which includes daily open, high, low, and close prices, volume, and dividends.
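One way to express this step in code, assuming the third-party `yfinance` package. `yf.Ticker(symbol).history(period="10y")` is yfinance's own call; the wrapper `fetch_history` and the `PRICE_COLUMNS` selection are hypothetical additions for this sketch, not the notebook's code.

```python
# Columns described above. yfinance's history() also returns "Stock Splits",
# which this sketch simply drops.
PRICE_COLUMNS = ("Open", "High", "Low", "Close", "Volume", "Dividends")


def fetch_history(symbol: str = "MSFT", period: str = "10y"):
    """Return a pandas DataFrame of daily prices for `symbol`, indexed by date."""
    # Imported lazily so this module still loads where yfinance is not installed.
    import yfinance as yf

    ticker = yf.Ticker(symbol)
    # period accepts strings such as "1y", "5y", "10y" or "max".
    history = ticker.history(period=period)
    return history[list(PRICE_COLUMNS)]
```

Calling `fetch_history("MSFT")` needs network access; the same call with "NVDA" or "AAPL" covers the other stocks analysed below.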

OUTPUT:

Screenshot_3-7-2024_34447_

Next, I did a short analysis of the closing prices of the stocks whose news I will be scraping: Microsoft (MSFT), Nvidia (NVDA), and Apple (AAPL).

Screenshot_3-7-2024_3457_

DATA COLLECTION:

I used Selenium to scrape news headlines from the dynamic Financial Times website. The first step of this process is creating a Selenium web driver:

Screenshot_3-7-2024_34525_

Then the code for web scraping: Screenshot_3-7-2024_34536_ Screenshot_3-7-2024_34554_ Screenshot_3-7-2024_3469_

Screenshot_3-7-2024_34628_

Issue and the Solution

ISSUE:

News stops appearing after a certain page.

SOLUTION:

I added a print statement so that I can see when the news on page x runs out; at that point the loop has to be stopped manually (with a keyboard interrupt). Otherwise it keeps searching up to page 800, even though those pages contain no data. You can see this in the cell outputs: pages are reported normally while data is available, and once the output repeatedly shows that the page has ended, no more data is available.

Screenshot_3-7-2024_34644_
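The manual interrupt could be replaced by a stop condition inside the loop. A minimal sketch, assuming a hypothetical `fetch_page(page_number)` callback that wraps the Selenium code above and returns the headlines found on one page (an empty list when the page has none):

```python
from typing import Callable, List


def scrape_until_empty(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 800,
    patience: int = 3,
) -> List[str]:
    """Collect headlines page by page; stop after `patience` empty pages in a row."""
    headlines: List[str] = []
    empty_streak = 0
    for page in range(1, max_pages + 1):
        page_headlines = fetch_page(page)
        if page_headlines:
            headlines.extend(page_headlines)
            empty_streak = 0  # data found, reset the counter
        else:
            empty_streak += 1
            if empty_streak >= patience:
                # Several consecutive empty pages: assume the archive has ended.
                break
    return headlines
```

Waiting for a few consecutive empty pages, rather than stopping at the first one, keeps a single page that failed to load from ending the scrape early.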

Cleaning the headline data:

  • Merging all headlines for the same date into one.
  • Deleting headlines whose date is not defined.

Next, I gathered the stock price data from yfinance, labelled each day (1 if that day's price is higher than the previous day's, else 0), and merged it with the headline data on the date. Each row of the final dataset therefore holds the headlines for a particular date and the corresponding label (how the stock price behaved), which can be used for training. Here is the final dataset, with text vs. label:

Screenshot_3-7-2024_3478_
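The cleaning, labelling, and merging steps can be sketched in plain Python. This is an illustration, not the notebook's code (which presumably works on pandas DataFrames); the function names are hypothetical, and `closes` is assumed to map each trading date to that day's closing price.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple


def merge_headlines(rows: List[Tuple[Optional[date], str]]) -> Dict[date, str]:
    """Join all headlines published on the same date; skip rows with no date."""
    by_date: Dict[date, List[str]] = defaultdict(list)
    for day, headline in rows:
        if day is None:  # the date could not be read from the page
            continue
        by_date[day].append(headline)
    return {day: " ".join(items) for day, items in sorted(by_date.items())}


def label_moves(closes: Dict[date, float]) -> Dict[date, int]:
    """Label a day 1 if its close beats the previous trading day's close, else 0.

    The earliest date has no previous close, so it gets no label.
    """
    days = sorted(closes)
    return {today: int(closes[today] > closes[prev]) for prev, today in zip(days, days[1:])}


def build_dataset(
    rows: List[Tuple[Optional[date], str]],
    closes: Dict[date, float],
) -> List[Tuple[date, str, int]]:
    """Inner join on date: keep only days that have both headlines and a label."""
    text = merge_headlines(rows)
    labels = label_moves(closes)
    shared_days = sorted(text.keys() & labels.keys())
    return [(day, text[day], labels[day]) for day in shared_days]
```

With an inner join, headlines from non-trading days (weekends, holidays) drop out; whether the notebook handles them differently is not stated.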

Screenshot_3-7-2024_34721_

Next, I implemented the bag-of-words approach: Screenshot_3-7-2024_34738_

Then I calculated the sentiment score using NLP and the VADER sentiment analyser: Screenshot_3-7-2024_34751_

Screenshot_3-7-2024_3482_
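To show what these two steps compute, here is a self-contained toy version. `bag_of_words` builds a vocabulary and per-headline word counts (scikit-learn's `CountVectorizer` does the same job at scale). `compound_score` mimics how VADER forms its compound score: it sums word valences and squashes the sum into [-1, 1] with x / sqrt(x² + α), α = 15. The tiny lexicon is made up for illustration; VADER's real lexicon is much larger, and its rules also handle negation, intensifiers, capitalisation, and punctuation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted vocabulary and one word-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[word] for word in vocab] for c in counts]


# Hypothetical toy valences, NOT taken from the VADER lexicon.
TOY_LEXICON: Dict[str, float] = {
    "surge": 2.0, "beats": 1.5, "record": 1.0, "falls": -1.5, "lawsuit": -2.0,
}


def compound_score(text: str, alpha: float = 15.0) -> float:
    """Sum word valences, then normalise into [-1, 1] the way VADER does."""
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokenize(text))
    return total / math.sqrt(total * total + alpha)
```

With the real analyser, the score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, available in `nltk.sentiment.vader` (after `nltk.download("vader_lexicon")`) or in the `vaderSentiment` package.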

Training the Model

Our data is now ready for training: each row pairs the compound sentiment score of the day's headlines with the stock label. Using the compound score as the only feature, we train a supervised random forest classifier and then test it, after first splitting the data into training and test sets. Screenshot_3-7-2024_34814_ Screenshot_3-7-2024_34826_
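Because each row is a trading day, a split that shuffles rows (scikit-learn's `train_test_split` shuffles by default) lets the model learn from days that come after the ones it is tested on. One alternative, sketched here and not necessarily what the notebook does, is a chronological split in which the most recent rows form the test set. The helper name is hypothetical, and the rows are assumed to be sorted by date already.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    features: Sequence[float],
    labels: Sequence[int],
    test_fraction: float = 0.2,
) -> Tuple[List[List[float]], List[List[float]], List[int], List[int]]:
    """Split date-sorted rows: the oldest rows train, the newest rows test.

    Each compound score becomes a one-column row, because scikit-learn
    estimators expect a 2-D (n_samples, n_features) input.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n_test = round(len(features) * test_fraction)
    cut = len(features) - n_test
    X = [[score] for score in features]
    y = list(labels)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

The training part can then be fitted with scikit-learn, e.g. `RandomForestClassifier(random_state=42).fit(X_train, y_train)`.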

The Confusion Matrix :

I got an accuracy of 55.17%

Screenshot_3-7-2024_34838_
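Accuracy can be read directly off the confusion matrix as (TP + TN) / (TP + TN + FP + FN). It is also worth comparing it with a trivial baseline that always predicts the most common label. The README does not report the class balance, so this sketch (with hypothetical helper names) computes both figures from the true and predicted labels.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Binary confusion-matrix cells, with 1 = price up and 0 = price down."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tn": pairs[(0, 0)],
        "fp": pairs[(0, 1)],
        "fn": pairs[(1, 0)],
        "tp": pairs[(1, 1)],
    }


def accuracy_vs_baseline(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (model accuracy, accuracy of always predicting the most common true label)."""
    cells = confusion_counts(y_true, y_pred)
    total = sum(cells.values())
    accuracy = (cells["tp"] + cells["tn"]) / total
    majority_share = Counter(y_true).most_common(1)[0][1] / total
    return accuracy, majority_share
```

If the share of up days in the test set is close to 55%, an accuracy of 55.17% would be no better than always predicting an increase.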

NOW THE MODEL IS READY FOR USE :

FEED SENTIMENT - GET THE LABEL OF INCREASE / DECREASE

We will test it by making trades with our portfolio in Tesla (TSLA) stock.

First Step:

Getting the Tesla stock headlines that our model will use to predict increases or decreases in the Tesla stock price. Following the same web-scraping procedure as above, we collect the news and the model predicts the labels: Screenshot_3-7-2024_34852_ Screenshot_3-7-2024_3491_

OUTPUT:

Screenshot_3-7-2024_34914_

Next, I defined a trading strategy:

Screenshot_3-7-2024_34925_

Screenshot_3-7-2024_34941_

Screenshot_3-7-2024_34950_

  • It maintains a position variable to track whether the strategy is currently holding a position or not.
  • The strategy interprets a label shift from 0 to 1 as a buying opportunity, initiating a buy action if the strategy currently holds no position.
  • Upon detecting a buy signal, it records the corresponding closing price as the buy price and stores the date of the buy signal.
  • If a buy signal is detected and the strategy is not in a position, it calculates the number of shares that can be bought with the available investment amount at the closing price and initiates the buy. The initial portfolio value is $10,000.
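The rules above can be sketched as a small backtest. This is an illustration under stated assumptions, not the code in the screenshots: the bullets only describe the buy side, so the sell rule here (sell everything on a 1 → 0 label shift) is an assumption, as are whole-share purchases and trading at the day's close. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass
class Trade:
    """One round trip; the sell fields stay None while the position is open."""
    buy_date: date
    buy_price: float
    shares: int
    sell_date: Optional[date] = None
    sell_price: Optional[float] = None


def backtest(
    days: Sequence[date],
    closes: Sequence[float],
    labels: Sequence[int],
    initial_cash: float = 10_000.0,
) -> Tuple[List[Trade], List[float]]:
    """Buy on a 0 -> 1 label shift when flat; sell on a 1 -> 0 shift (assumed) when holding.

    Returns the trades and the daily portfolio value (cash + shares * close).
    """
    cash, shares = initial_cash, 0
    position = False  # True while the strategy holds stock
    trades: List[Trade] = []
    values: List[float] = []
    for i, (day, close, label) in enumerate(zip(days, closes, labels)):
        prev = labels[i - 1] if i > 0 else label  # no shift on the first day
        if not position and prev == 0 and label == 1:
            # Buy signal: spend as much cash as possible on whole shares.
            shares = int(cash // close)
            if shares > 0:
                cash -= shares * close
                position = True
                trades.append(Trade(day, close, shares))
        elif position and prev == 1 and label == 0:
            # Assumed sell signal: liquidate the whole position at the close.
            cash += shares * close
            trades[-1].sell_date, trades[-1].sell_price = day, close
            shares, position = 0, False
        values.append(cash + shares * close)
    return trades, values
```

The returned `values` list is what a portfolio chart would plot against `days`, for example with matplotlib's `plt.plot(days, values)`.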

Screenshot_3-7-2024_3502_

Screenshot_3-7-2024_35014_

Finally, the graph of the portfolio value: Screenshot_3-7-2024_35026_
