This Python script pulls tweets from Twitter using the Twint library. The tweets are cleaned and then fed into a Naive Bayes classifier that was previously trained on a sample tweet dataset.
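The sentiment score for each stock is calculated with a simple averaging algorithm: each tweet classified as "Positive" counts as 1 and each tweet classified as "Negative" counts as -1. The values for the selected tweets are summed and divided by the number of tweets. A positive average indicates overall positive sentiment, and a negative average indicates overall negative sentiment.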
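The averaging step can be sketched as follows. This is a minimal illustration, not the script's actual code: the names `LABEL_VALUES` and `sentiment_score` and the sample labels are assumptions made for the example.

```python
# Label-to-value mapping as described above (names are hypothetical).
LABEL_VALUES = {"Positive": 1, "Negative": -1}


def sentiment_score(labels):
    """Average the +1/-1 values of the classifier labels for one stock."""
    if not labels:
        raise ValueError("at least one classified tweet is required")
    values = [LABEL_VALUES[label] for label in labels]
    return sum(values) / len(values)


# Made-up classifier output for four tweets about one stock.
labels = ["Positive", "Positive", "Negative", "Positive"]
score = sentiment_score(labels)
print(score)  # (1 + 1 - 1 + 1) / 4 = 0.5
```

A score above 0 means most of the sampled tweets were classified as positive; a score below 0 means most were classified as negative.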
The precision of a sentiment score depends on the number of tweets pulled for each stock.
For example, 1000 tweets per stock yields a sentiment score expressed to the thousandths place.
- Reduce run time by checking whether the data already exists before pulling it again.
- Modify classifier to output numerical sentiment scores rather than "Positive" or "Negative".
- Find a more complex algorithm to convert binary values into a decimal sentiment score.
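One possible approach to the first item is to cache pulled tweets on disk and skip the download when a cache file is already present. The sketch below assumes tweets are stored as a JSON list per stock; `load_or_fetch`, `fake_fetch`, the file name, and the sample tweets are all hypothetical, and `fake_fetch` merely stands in for the real Twint query.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return tweets cached at `path`, calling `fetch` only when no cache exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    tweets = fetch()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tweets, f)
    return tweets


# Stand-in for the real Twint query; records how many times it runs.
calls = []


def fake_fetch():
    calls.append(1)
    return ["$AAPL looks strong", "$AAPL is overpriced"]


cache_path = os.path.join(tempfile.mkdtemp(), "AAPL.json")
first = load_or_fetch(cache_path, fake_fetch)   # no cache yet: fetches and writes the file
second = load_or_fetch(cache_path, fake_fetch)  # cache exists: reads the file instead
print(len(calls))  # the fetch ran only once
```

A cache like this never refreshes on its own, so a real version would likely also need a rule for when stored tweets are too old to reuse.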
Twint tutorial from Medium
NLTK Sentiment Analysis tutorial from Digital Ocean