INTRODUCTION :

The aim of this project is to develop a sentiment analysis model that predicts the movement of stock prices by analyzing textual data from various sources such as news articles, social media posts, and other financial news and opinions. The insights from the sentiment analysis can serve as valuable indicators for investor sentiment and market sentiment, aiding in making informed trading decisions.

OBJECTIVE :

The primary objective of this project is to develop a machine learning model that can analyze sentiment from textual data and predict stock price movements. The project will involve the following steps:

Data Collection
Data extraction (Web scraping)
Data Preprocessing
Sentiment Analysis
Model Training and Evaluation
Prediction and Visualization

FLOW OF PROJECT

PRIMARY ANALYSIS :

First, I imported the yfinance (Yahoo Finance) library and created an object for Microsoft (MSFT). The history method is called to retrieve historical data for the past 10 years. The data includes daily open, high, low, and close prices, volume, and dividends.
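As a minimal sketch of this step, the function below wraps the yfinance call. The function name fetch_history and the lazy import are illustrative choices, not taken from the notebook; the "10y" period matches the 10 years mentioned above.

```python
def fetch_history(symbol: str, period: str = "10y"):
    """Return daily price history for `symbol` as a pandas DataFrame."""
    # Imported lazily so this file still loads where yfinance is not installed.
    import yfinance as yf

    ticker = yf.Ticker(symbol)
    # history() returns columns such as Open, High, Low, Close, Volume, Dividends.
    return ticker.history(period=period)


# Example (needs network access):
#   msft = fetch_history("MSFT")
#   print(msft[["Open", "High", "Low", "Close", "Volume", "Dividends"]].tail())
```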

OUTPUT:

Next, I did a small analysis of the closing prices of the stocks whose news data I will be scraping, namely Microsoft (MSFT), Nvidia (NVDA), and Apple (AAPL).

DATA COLLECTION:

I used Selenium to scrape news headlines from the dynamic Financial Times website. The process includes: 1. Creating a web driver with Selenium:

Then the code for web scraping:

Issue and the Solution

ISSUE:

No news is present beyond a certain page.

SOLUTION:

I added a print statement that shows when the news on page x runs out, so that the loop can then be ended manually (with a keyboard interrupt); otherwise it would keep searching through all 800 pages, even those that have no data. You can see what I mean by going through the notebook cells: the output shows data for as long as data is available and prints "page ends" when a page is finished. Once the output shows "page ends" continuously, no more data is available.
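One possible alternative to the manual keyboard interrupt, sketched here rather than taken from the notebook, is to stop automatically after a few consecutive empty pages. The fetch_page callable and the max_empty_pages threshold are hypothetical; in practice fetch_page would wrap the Selenium code that returns the headlines found on one page.

```python
from typing import Callable, List


def scrape_until_empty(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 800,
    max_empty_pages: int = 3,
) -> List[str]:
    """Collect headlines page by page, stopping after a run of empty pages."""
    headlines: List[str] = []
    empty_streak = 0
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if items:
            headlines.extend(items)
            empty_streak = 0  # data found, so reset the counter
        else:
            print(f"page {page} ends")
            empty_streak += 1
            if empty_streak >= max_empty_pages:
                break  # several empty pages in a row: assume no more data
    return headlines
```

Requiring a short run of empty pages, instead of stopping at the first one, guards against a single page that fails to load.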

Cleaning the headlines data:

Merged the headlines for the same date into one.
Deleted the headlines where the date is not defined.

I then gathered the stock price data from yfinance, labeled each day (1 if that day's price is greater than the previous day's, else 0), and merged it with the headlines data by date. The final database therefore holds, for each date, the label (how the stock price behaved) and the corresponding headlines, and can be used for training. Here is the final database, with text vs. label:
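The cleaning, labeling, and merging steps can be sketched in plain Python as follows. The function names are illustrative, and the sketch assumes dates are ISO-formatted strings and that closing prices arrive in chronological order; the notebook itself may well use pandas for the same operations.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def merge_headlines_by_date(rows: List[Tuple[Optional[str], str]]) -> Dict[str, str]:
    """Join all headlines sharing a date; drop rows whose date is missing."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for date, headline in rows:
        if not date:  # date not defined: skip the headline
            continue
        grouped[date].append(headline)
    return {date: " ".join(texts) for date, texts in grouped.items()}


def label_days(closes: List[Tuple[str, float]]) -> Dict[str, int]:
    """Label each day 1 if its close is above the previous day's close, else 0."""
    labels: Dict[str, int] = {}
    # The first day has no previous day, so it receives no label.
    for (_, prev_close), (date, close) in zip(closes, closes[1:]):
        labels[date] = 1 if close > prev_close else 0
    return labels


def build_dataset(
    headlines: Dict[str, str], labels: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Inner join on date: keep only dates that have both text and a label."""
    return [(d, headlines[d], labels[d]) for d in sorted(headlines) if d in labels]
```

Dates without a trading session (weekends, holidays) have no label and drop out of the join.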

Next, I implemented a bag-of-words approach and calculated the sentiment score via NLP with the VADER sentiment analyzer:
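A minimal sketch of the scoring step, assuming an analyzer object with VADER's polarity_scores method. Both the vaderSentiment package and NLTK ship a SentimentIntensityAnalyzer with this method; which one the notebook uses is not stated here, so the analyzer is passed in as a parameter. The helper name compound_scores is illustrative.

```python
from typing import Iterable, List


def compound_scores(texts: Iterable[str], analyzer) -> List[float]:
    """Return VADER's compound score (between -1 and 1) for each text."""
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


# Example, assuming the vaderSentiment package is installed:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   scores = compound_scores(["Shares surge after strong earnings"],
#                            SentimentIntensityAnalyzer())
```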

Training the Model

The data is now ready for training on the compound sentiment score of the headlines and the stock label. Using these, I trained a random forest classifier (supervised) and then tested it, after first splitting the data with a train-test split. I trained with the compound score as the feature.
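The training step could look roughly like the sketch below, which uses scikit-learn. The 80/20 split, the number of trees, and the random seed are assumptions made for illustration, not the notebook's actual settings.

```python
from typing import Sequence


def train_and_evaluate(scores: Sequence[float], labels: Sequence[int], seed: int = 0):
    """Fit a random forest on the compound score and report test metrics."""
    # Lazy imports so this file still loads where scikit-learn is not installed.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split

    X = [[s] for s in scores]  # one feature per sample: the compound score
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=seed  # assumed 80/20 split
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return (
        model,
        accuracy_score(y_test, predictions),
        confusion_matrix(y_test, predictions, labels=[0, 1]),
    )
```

The function returns the fitted model, the test accuracy, and a 2x2 confusion matrix whose rows are the true labels and whose columns are the predicted labels.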

The Confusion Matrix :

I got an accuracy of 55.17%

NOW THE MODEL IS READY FOR USE :

FEED SENTIMENT - GET THE LABEL OF INCREASE / DECREASE

We will test it by trading Tesla (TSLA) stock with our portfolio.

First Step:

Getting the Tesla stock headlines on which our model will work to predict the increase or decrease of Tesla stock: following the same web scraping procedure as above, we get the news and the model predicts the labels.

OUTPUT:

Now I have defined a strategy

It maintains a position variable to track whether the strategy is currently holding a position or not.
The strategy interprets a label shift from 0 to 1 as a buying opportunity, initiating a buy action if the strategy currently holds no position.
Upon detecting a buy signal, it records the corresponding closing price as the buy price and stores the date of the buy signal.
If a buy signal is detected while the strategy holds no position, it calculates the number of shares that can be bought from the available investment amount and the closing price, and initiates the buy.

I have taken my initial portfolio value to be $10,000.
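The buy logic described above can be sketched as a small backtest loop. The function name backtest and the row format are illustrative. The exit rule (sell everything when the label shifts from 1 to 0) is an assumption added so the sketch is complete; it is not stated in the description above.

```python
from typing import List, Tuple


def backtest(rows: List[Tuple[str, float, int]], initial_cash: float = 10_000.0):
    """Simulate the label-shift strategy over (date, close, label) rows."""
    cash = initial_cash
    shares = 0
    in_position = False
    trades = []  # (action, date, price, shares)
    values = []  # (date, portfolio value at the close)
    prev_label = None
    for date, close, label in rows:
        if prev_label == 0 and label == 1 and not in_position:
            # Buy signal: buy as many whole shares as the cash allows.
            shares = int(cash // close)
            if shares > 0:
                cash -= shares * close
                in_position = True
                trades.append(("BUY", date, close, shares))
        elif prev_label == 1 and label == 0 and in_position:
            # ASSUMPTION: exit on a 1 -> 0 shift; not part of the description.
            cash += shares * close
            trades.append(("SELL", date, close, shares))
            shares = 0
            in_position = False
        values.append((date, cash + shares * close))
        prev_label = label
    return trades, values
```

The second return value holds the portfolio value per date, which is the series one would plot to obtain a portfolio graph.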

The graph of the portfolio is shown at the end:

Maddy256/Stock-Sentiment-Analysis
