The aim of this project is to develop a sentiment analysis model that predicts the movement of stock prices by analyzing textual data from various sources such as news articles, social media posts, and other financial news and opinions. The insights from the sentiment analysis can serve as valuable indicators for investor sentiment and market sentiment, aiding in making informed trading decisions.
The primary objective of this project is to develop a machine learning model that can analyze sentiment from textual data and predict stock price movements. The project will involve the following steps:
- Data Collection
- Data extraction (Web scraping)
- Data Preprocessing
- Sentiment Analysis
- Model Training and Evaluation
- Prediction and Visualization
First, I imported the Yahoo Finance library (yfinance) and created a ticker object for Microsoft (MSFT). The history method is called to retrieve the historical data of the past 10 years. The data includes daily open, high, low, and close prices, volume, and dividends.
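A minimal sketch of this step, assuming the yfinance package is installed. The helper name fetch_history and the constant PRICE_COLUMNS are illustrative and not taken from the original notebook; the period string "10y" corresponds to the 10-year window described above.

```python
# Columns of the returned DataFrame that the text above refers to.
PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Volume", "Dividends"]


def fetch_history(symbol, period="10y"):
    """Return daily price history for `symbol` as a pandas DataFrame."""
    # Imported here so the file can be loaded without yfinance installed.
    import yfinance as yf

    ticker = yf.Ticker(symbol)            # e.g. yf.Ticker("MSFT")
    return ticker.history(period=period)  # one row per trading day


# Example call (needs network access):
#   msft = fetch_history("MSFT")
#   print(msft[PRICE_COLUMNS].tail())
```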
Next, I did a brief analysis of the closing prices of the stocks whose news I will be scraping, namely Microsoft (MSFT), Nvidia (NVDA), and Apple (AAPL).
I used Selenium to scrape the dynamic Financial Times website for news headlines. This process includes:
1. Creating a web driver from Selenium:
Then the code for web scraping:
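A minimal sketch of creating the driver, assuming Selenium 4 and a local Chrome installation. The function name make_driver and the headless option are illustrative choices, not taken from the original notebook.

```python
def make_driver(headless=True):
    """Create a Chrome WebDriver for loading JavaScript-rendered pages."""
    # Imported here so the file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        # Run Chrome without opening a visible window.
        options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


# Example call (needs Chrome and network access):
#   driver = make_driver()
#   driver.get("https://www.ft.com")
#   driver.quit()
```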

News is not present after a certain page.
I added a print statement so that I can tell when the news runs out at some page x and then end the loop manually (with a keyboard interrupt); otherwise the loop would keep searching through all 800 pages, even the ones that have no data. You can see what I mean by going through the notebook cells. The output shows headlines for as long as data is available and prints "page ends" at the end of each page. Once the output shows only "page ends" over and over, no more data is available.
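As one possible alternative to stopping the loop by hand, the sketch below ends the scrape automatically after a few consecutive empty pages. It is not the original notebook code: fetch_page is a hypothetical callable that would wrap the Selenium calls and return the list of headlines found on one page, and the threshold max_empty=3 is an assumption.

```python
def scrape_pages(fetch_page, max_pages=800, max_empty=3):
    """Collect headlines page by page.

    Stops early once `max_empty` pages in a row return no headlines.
    """
    headlines = []
    empty_streak = 0
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if items:
            headlines.extend(items)
            empty_streak = 0
        else:
            empty_streak += 1
            print(f"page {page}: page ends")
            if empty_streak >= max_empty:
                break  # no more data, so stop without a keyboard interrupt
    return headlines
```

With a stand-in such as `lambda page: ["headline"] if page <= 5 else []`, the loop requests pages 1 to 8 and then stops.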
- Merged the headlines for the same date into one.
- Deleted the headlines where the date is not defined.
I then gathered the stock price data from yfinance, put a label on each day (1 if that day's price is greater than the previous day's, else 0), and merged it with the headlines data on the date. The final database therefore holds, for each date, the headline text and the corresponding label (how the stock price behaved), and can be used for training. Here is the final database, with text vs. label:
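The cleaning, labelling, and merging steps above can be sketched in plain Python. This is an illustration under assumptions rather than the original code: dates are ISO strings (so they sort chronologically), headlines arrive as (date, headline) pairs, closing prices as a date-to-price dict, and all function names are hypothetical. With pandas, the same steps would typically use groupby, shift, and merge.

```python
from collections import defaultdict


def merge_headlines(rows):
    """Join all headlines that share a date; skip rows with no date."""
    by_date = defaultdict(list)
    for date, headline in rows:
        if date:  # drops None and empty strings
            by_date[date].append(headline)
    return {date: " ".join(items) for date, items in by_date.items()}


def label_days(closes):
    """Label each day 1 if its close is above the previous day's, else 0.

    The first day has no previous day and therefore gets no label.
    """
    dates = sorted(closes)
    return {
        today: int(closes[today] > closes[prev])
        for prev, today in zip(dates, dates[1:])
    }


def build_dataset(rows, closes):
    """Inner-join merged headlines and labels on the date."""
    text = merge_headlines(rows)
    labels = label_days(closes)
    return [(d, text[d], labels[d]) for d in sorted(text) if d in labels]
```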
Now implementing the bag-of-words approach:
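A small dependency-free sketch of a bag-of-words representation follows. It is not the original notebook code (scikit-learn's CountVectorizer is the usual tool for this); the tokenizer and function names here are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    """Map every distinct token in `docs` to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Return the word-count vector of `doc`; unknown words are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector
```

Each headline becomes a fixed-length vector of word counts, so word order is discarded and only word frequency is kept.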
Calculated the sentiment score via NLP with the VADER sentiment analyser:
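A minimal sketch of the scoring step. It assumes the standalone vaderSentiment package (NLTK ships an equivalent analyser in nltk.sentiment.vader); the original notebook may have used either. The function name compound_scores and the optional analyzer parameter are illustrative.

```python
def compound_scores(headlines, analyzer=None):
    """Return the VADER compound score (between -1 and 1) for each headline."""
    if analyzer is None:
        # ASSUMPTION: the vaderSentiment package is installed.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

        analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
    return [analyzer.polarity_scores(h)["compound"] for h in headlines]
```

Passing an analyzer explicitly lets the same object be reused across calls and makes the function easy to exercise with a stand-in object.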

Now our data is ready for training, based on the compound sentiment score of each headline and the stock label. Using these, we train a random forest classifier (supervised) and then test it (the data is first split with a train/test split).
I trained with the compound score as the feature.
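A minimal sketch of the training and evaluation step, assuming scikit-learn is installed. The 80/20 split, the 100 trees, and the fixed seed are assumptions; the source does not state the values that were actually used, and the function name train_and_score is hypothetical.

```python
def train_and_score(scores, labels, seed=42):
    """Fit a random forest on compound scores and return (model, accuracy)."""
    # Imported here so the file can be loaded without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X = [[s] for s in scores]  # one feature per sample: the compound score
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=seed  # ASSUMPTION: 80/20 split
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy
```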

I got an accuracy of 55.17%
FEED SENTIMENT - GET THE LABEL OF INCREASE / DECREASE
We will test the model by making trades with our portfolio on Tesla (TSLA) stock.
Getting the Tesla stock headlines on which our model will work and predict the increase or decrease of Tesla stock:
Following the same web scraping procedure as above, we get the news and the model predicts the labels.

- It maintains a position variable to track whether the strategy is currently holding a position or not.
- The strategy interprets a label shift from 0 to 1 as a buying opportunity, initiating a buy action if the strategy currently holds no position.
- Upon detecting a buy signal, it records the corresponding closing price as the buy price and stores the date of the buy signal.
- If a buy signal is detected and the strategy is not already in a position, it calculates the number of shares that can be bought from the available investment amount and the closing price, and initiates a buy action.
I have taken my initial portfolio value to be $10,000.
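The rules above can be sketched as a small backtest loop. This is an illustration, not the original code. The buy rule follows the description; the sell rule (a label shift from 1 to 0 closes the position) and the restriction to whole shares are assumptions, since the text above does not specify them. The function name backtest is hypothetical.

```python
def backtest(dates, closes, labels, initial_cash=10_000.0):
    """Simulate the strategy and return (final portfolio value, trades)."""
    cash = initial_cash
    shares = 0
    in_position = False  # tracks whether the strategy currently holds stock
    trades = []
    for i in range(1, len(labels)):
        prev, cur = labels[i - 1], labels[i]
        price = closes[i]
        if prev == 0 and cur == 1 and not in_position:
            # Buy signal: spend as much cash as possible on whole shares.
            shares = int(cash // price)
            if shares > 0:
                cash -= shares * price
                in_position = True
                trades.append(("BUY", dates[i], price, shares))
        elif prev == 1 and cur == 0 and in_position:
            # ASSUMPTION: a shift from 1 to 0 is treated as the sell signal.
            cash += shares * price
            trades.append(("SELL", dates[i], price, shares))
            shares = 0
            in_position = False
    # Value any shares still held at the last closing price.
    final_value = cash + (shares * closes[-1] if shares else 0.0)
    return final_value, trades
```

Each trade is recorded as (action, date, price, shares), so the buy price and buy date described above are kept in the BUY entries.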














