Performed exploratory data analysis on COVID-19 tweets collected from 25-07-2020 that contain the #covid19 hashtag. #6
Open
errpv78 wants to merge 1 commit into ksksksks-dev:main from
Approach
Opening columns.txt to understand each column, its type, and its description.
Loading the dataset and inspecting a sample of rows, its dimensions, and a single example row.
Checking the frequency distribution and summary statistics of integer columns such as user_followers and user_friends.
Checking for null and missing values and cleaning the data.
Exploring the unique values in each column, their frequencies, and the maximum length of each column to better understand how the data is distributed.
Filtering columns to extract the information needed for grouping, e.g. adding a tweet_date column derived from the date column (which holds both date and time) so that the frequency distribution of tweets can be examined day by day.
Plotting tweet lengths (in words) to see how they vary.
Building and training a BERT sentiment analysis model; the link to the separate training file is included below.
Preprocessing the tweet text to filter out unnecessary words and characters, adding a sentiment column to the data frame, and saving the data frame to a new CSV file.
Checking the overall sentiment distribution across tweets.
Subsetting the data frame with conditions on columns to understand the distribution across those columns.
Visualizing sentiments among the most frequent values of a column, and the frequency distribution of sentiments with respect to other columns.
Exploring the hashtags column to understand the different hashtags and their relationship to sentiments and other columns.
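The null-value check and the summary statistics for integer columns in the approach list could look roughly like the sketch below. It uses only the Python standard library (a pandas version would typically use isna().sum() and describe()). The sample rows, the user_location column, and treating "" or None as missing are assumptions for illustration; only user_followers and user_friends come from the PR.

```python
import statistics

# Hypothetical sample rows standing in for the tweets CSV. The column names
# user_followers and user_friends come from the PR; user_location and the use
# of "" / None for missing values are assumptions for illustration.
rows = [
    {"user_followers": "624", "user_friends": "950", "user_location": "India"},
    {"user_followers": "2253", "user_friends": "1677", "user_location": ""},
    {"user_followers": "9275", "user_friends": "9525", "user_location": None},
]


def missing_counts(rows):
    """Count values that are None or blank, per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or str(value).strip() == ""
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


def describe(values):
    """Basic summary statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


followers = [int(row["user_followers"]) for row in rows]
print(missing_counts(rows))
print(describe(followers))
```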
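Deriving a tweet_date column from the combined date-and-time column and counting tweets per day, as described in the approach list, can be sketched as follows. The "YYYY-MM-DD HH:MM:SS" timestamp format and the sample values are assumptions; the real date column may use a different format, in which case the fmt argument would need to change.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; the "YYYY-MM-DD HH:MM:SS" format is an assumption.
dates = [
    "2020-07-25 12:27:21",
    "2020-07-25 18:02:10",
    "2020-07-26 08:15:00",
]


def tweet_date(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop the time part so tweets can be grouped by calendar day."""
    return datetime.strptime(timestamp, fmt).date()


# Frequency distribution of tweets per day.
per_day = Counter(tweet_date(d) for d in dates)
for day, count in sorted(per_day.items()):
    print(day.isoformat(), count)
```

Parsing with strptime rather than slicing the first ten characters makes a malformed timestamp fail loudly instead of silently producing a wrong day.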
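The text preprocessing and tweet-length steps might be approximated like this. Which tokens count as "unnecessary" is not specified in the PR, so removing URLs, @mentions, and punctuation (while keeping '#') is an assumed choice; the plot is omitted and only the length counts that would feed a histogram are computed.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep lower-case ASCII letters, digits, '#' and whitespace; everything else
# becomes a space. This is an assumed cleaning rule, not the PR's exact one.
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, @mentions and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def word_length(text):
    """Number of whitespace-separated words."""
    return len(text.split())


# Hypothetical tweets for illustration only.
tweets = [
    "Stay safe @WHO! Read more: https://t.co/abc #COVID19",
    "Cases rising again...",
]
cleaned = [clean_tweet(t) for t in tweets]
length_counts = Counter(word_length(t) for t in cleaned)
print(cleaned)
print(sorted(length_counts.items()))
```

Because the character filter keeps only ASCII letters, accented or non-Latin text is dropped; for multilingual tweets a Unicode-aware rule would be a better fit.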
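Once the sentiment column exists, the overall sentiment distribution and the sentiment breakdown among a column's most frequent values could be computed as in this sketch. Grouping by user_location and the sample rows are assumptions; any categorical column can be passed in the same way.

```python
from collections import Counter, defaultdict

# Hypothetical rows after the sentiment column has been added. Grouping by
# user_location is an assumption; any categorical column works the same way.
rows = [
    {"user_location": "India", "sentiment": "positive"},
    {"user_location": "India", "sentiment": "negative"},
    {"user_location": "London", "sentiment": "neutral"},
    {"user_location": "India", "sentiment": "positive"},
    {"user_location": "London", "sentiment": "negative"},
    {"user_location": "Texas", "sentiment": "negative"},
]


def proportions(counter):
    """Turn raw counts into shares of the total."""
    total = sum(counter.values())
    return {label: count / total for label, count in counter.items()}


def sentiment_by_top_values(rows, column, top_n):
    """Sentiment counts for the top_n most frequent values of a column."""
    top_values = [v for v, _ in Counter(r[column] for r in rows).most_common(top_n)]
    table = defaultdict(Counter)
    for row in rows:
        if row[column] in top_values:
            table[row[column]][row["sentiment"]] += 1
    return {value: dict(table[value]) for value in top_values}


overall = Counter(row["sentiment"] for row in rows)
print(proportions(overall))
print(sentiment_by_top_values(rows, "user_location", top_n=2))
```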
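For the hashtag exploration, a minimal sketch is to parse each hashtags cell, count tags overall, and count them per sentiment. It assumes the column stores the string form of a Python list (e.g. "['COVID19', 'StaySafe']"), falls back to comma-separated text otherwise, and treats non-string cells (None, or a pandas NaN) as having no hashtags; all of these are assumptions about the CSV format.

```python
import ast
from collections import Counter, defaultdict

# Hypothetical rows. Storing hashtags as the string form of a Python list
# (e.g. "['COVID19', 'StaySafe']") is an assumption about the CSV format.
rows = [
    {"hashtags": "['COVID19', 'StaySafe']", "sentiment": "positive"},
    {"hashtags": "['covid19']", "sentiment": "negative"},
    {"hashtags": None, "sentiment": "neutral"},
]


def parse_hashtags(raw):
    """Return lower-cased hashtags from one cell (empty list if missing)."""
    if not isinstance(raw, str) or not raw.strip():
        return []  # None, NaN or blank cells carry no hashtags
    try:
        tags = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Fall back to a plain comma-separated string.
        tags = raw.split(",")
    if not isinstance(tags, (list, tuple)):
        tags = [tags]
    return [str(tag).strip().lower() for tag in tags if str(tag).strip()]


tag_counts = Counter()
tags_by_sentiment = defaultdict(Counter)
for row in rows:
    tags = parse_hashtags(row["hashtags"])
    tag_counts.update(tags)
    tags_by_sentiment[row["sentiment"]].update(tags)

print(tag_counts.most_common())
print({s: dict(c) for s, c in tags_by_sentiment.items()})
```

Lower-casing the tags merges variants such as "COVID19" and "covid19", which would otherwise be counted as separate hashtags.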