created_utc: the time (in seconds since the Unix epoch) when the comment was posted
ups: number of upvotes on the comment
subreddit_id: id of the subreddit the comment was posted in
link_id: id of the thread the comment belongs to
name: name of the comment
score_hidden: 1 if the comment's score was hidden; 0 otherwise
author_flair_css_class: CSS class for the author flair shown with the comment
author_flair_text: flair text shown with the comment
id: id of the comment (essentially the same as the comment name)
removal_reason: reason the comment was removed (either legal or None)
gilded: number of times the comment was gilded (roughly, premium likes)
downs: number of downvotes on the comment
archived: whether the thread was archived (no new comments or votes allowed)
author: author's Reddit username
score: number of upvotes
retrieved_on: the time (in seconds) when the comment was pulled to create the dataset
body: the comment text itself
distinguished: the type of user who posted the comment, either moderator, admin, or None
edited: whether (1) or not (0) the comment has been edited
controversiality: a Boolean indicating whether (1) or not (0) the comment is controversial, i.e. a popular comment receiving roughly as many upvotes as downvotes
parent_id: the id of the comment that this comment replies to; None if the comment is not a reply
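The raw timestamp fields above (created_utc, retrieved_on) are plain second counts. A minimal sketch of turning them into readable UTC datetimes and measuring how long after posting a comment was retrieved, assuming both are Unix-epoch seconds; the helper names and the example row are hypothetical:

```python
from datetime import datetime, timezone

def to_utc_datetime(epoch_seconds):
    """Convert a Unix timestamp in seconds (e.g. created_utc) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)

def seconds_until_retrieval(comment):
    """Seconds between posting (created_utc) and retrieval (retrieved_on)."""
    return int(comment["retrieved_on"]) - int(comment["created_utc"])

# Hypothetical example row; the values are made up for illustration.
comment = {"created_utc": 1430438400, "retrieved_on": 1432703079}
posted = to_utc_datetime(comment["created_utc"])
print(posted.isoformat())  # 2015-05-01T00:00:00+00:00
print(seconds_until_retrieval(comment))
```

A large retrieval lag means votes had more time to accumulate before the snapshot, which is worth keeping in mind when comparing scores.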
time: time the comment was posted
time_lapse: time since the first comment on the thread
hour_of_comment: hour of the day the comment was posted
weekday: day of the week the comment was posted
is_flair: whether or not there is flair text for the comment
is_flair_css: whether or not there is a CSS class for the comment flair
depth: depth of the comment in the thread
parent_score: score of the parent comment (NaN if the comment doesn't have a parent)
time_since_parent: time since the parent comment was posted
linked_sr: subreddits linked in the comment
linked_urls: URLs linked in the comment
no_of_linked_sr: number of subreddits mentioned in the comment
no_of_linked_urls: number of URLs linked in the comment
subjectivity: number of instances of "I"
is_edited: whether or not the comment has been edited
is_quoted: whether or not the comment contains a quote
no_quoted: number of quotes in the comment
senti_neg: negative sentiment score
senti_neu: neutral sentiment score
senti_pos: positive sentiment score
senti_comp: compound sentiment score
word_counts: number of words in the comment
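Several of the engineered features above can be derived from the raw fields alone. The following is a minimal sketch, not the dataset authors' code: it assumes each comment is a dict with id, parent_id (None for a top-level comment), created_utc (Unix seconds) and score, and it assumes depth 0 for top-level comments, hours in UTC, and Monday=0 weekdays. The regex rules for subreddit links, URLs and standalone "I" are also assumptions; sentiment scores are omitted because the tool used to compute them is not stated.

```python
import re
from datetime import datetime, timezone

def thread_features(comments):
    """Time and structure features for the comments of ONE thread (assumed conventions)."""
    by_id = {c["id"]: c for c in comments}
    first_time = min(c["created_utc"] for c in comments)

    def depth(c):
        d = 0
        # Walk up the reply chain until reaching a comment whose parent is
        # None or missing from this sample.
        while c["parent_id"] in by_id:
            c = by_id[c["parent_id"]]
            d += 1
        return d

    rows = []
    for c in comments:
        posted = datetime.fromtimestamp(c["created_utc"], tz=timezone.utc)
        parent = by_id.get(c["parent_id"])  # None for top-level comments
        rows.append({
            "id": c["id"],
            "time_lapse": c["created_utc"] - first_time,
            "hour_of_comment": posted.hour,   # 0-23, UTC
            "weekday": posted.weekday(),      # Monday=0 ... Sunday=6
            "depth": depth(c),
            "parent_score": parent["score"] if parent else float("nan"),
            "time_since_parent": (c["created_utc"] - parent["created_utc"]
                                  if parent else float("nan")),
        })
    return rows

def text_features(body):
    """Simple regex-based text counts; the exact matching rules are assumptions."""
    linked_sr = re.findall(r"\br/(\w+)", body)        # matches r/name and /r/name
    linked_urls = re.findall(r"https?://\S+", body)
    return {
        "linked_sr": linked_sr,
        "linked_urls": linked_urls,
        "no_of_linked_sr": len(linked_sr),
        "no_of_linked_urls": len(linked_urls),
        "subjectivity": len(re.findall(r"\bI\b", body)),  # standalone "I"
        "word_counts": len(body.split()),
    }

# Hypothetical three-comment reply chain: a <- b <- c, one hour apart.
thread = [
    {"id": "a", "parent_id": None, "created_utc": 1430438400, "score": 10},
    {"id": "b", "parent_id": "a", "created_utc": 1430442000, "score": 3},
    {"id": "c", "parent_id": "b", "created_utc": 1430445600, "score": 1},
]
rows = thread_features(thread)
print(rows[2]["depth"], rows[2]["time_since_parent"])  # 2 3600
print(text_features("I think /r/python helps. See https://example.com"))
```

Computing depth by walking parent links keeps the sketch dependency-free; on a large dump, caching depths per id would avoid re-walking long chains.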
Scraped
url: URL of the thread the comment is on
num_comments: number of comments on the thread the comment is on
over_18: whether or not the thread has been marked as NSFW
link_score: upvotes on the thread the comment is on
selftext: selftext of the thread, if it exists
title: title of the thread
upvote_ratio: the percentage of upvotes among all votes on the thread
link_ups: number of upvotes on the thread
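Because upvote_ratio and link_ups are both scraped, they can be combined to estimate a thread's downvotes. If r = ups / (ups + downs), then total votes = ups / r and downs = ups / r - ups. A minimal sketch, assuming upvote_ratio is stored as a fraction in (0, 1] (divide by 100 first if it is stored as a percentage); the function name is hypothetical, and the result is only approximate if the ratio was rounded:

```python
def estimate_thread_votes(link_ups, upvote_ratio):
    """Estimate (total_votes, downvotes) for a thread.

    Assumes upvote_ratio = ups / (ups + downs), given as a fraction in (0, 1].
    """
    if not 0 < upvote_ratio <= 1:
        raise ValueError("upvote_ratio must be in (0, 1]")
    total = link_ups / upvote_ratio
    return round(total), round(total - link_ups)

print(estimate_thread_votes(300, 0.75))  # (400, 100)
```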
Engineered from scraped data
link_created_time: time the thread was created
time_since_link: time since the thread was created
is_root: whether or not the comment is a root (top-level) comment
is_selftext: whether or not the thread the comment is on has selftext
parent_cos_angle: cosine similarity between the embeddings of the comment and its parent comment
title_cos_angle: cosine similarity between the embeddings of the comment and its thread's title
no_past_comments: number of comments on the thread before this comment was posted
score_till_now: score of the thread at the time this comment was posted
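Two of the features above have a precise computational form. The cosine features are cos θ = (u · v) / (‖u‖ ‖v‖) for two embedding vectors u and v, and no_past_comments can be computed by sorting each thread's timestamps and binary-searching. The sketch below assumes comments are dicts with id, link_id and created_utc; the embedding model behind parent_cos_angle and title_cos_angle is not specified, so the vectors here are toy values. score_till_now is not sketched because the description does not say how the thread's historical score was obtained.

```python
import math
from bisect import bisect_left
from collections import defaultdict

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")  # undefined when either vector is all zeros
    return dot / (norm_u * norm_v)

def count_past_comments(comments):
    """Map each comment id to the number of earlier comments in its thread.

    Comments posted in the same second are NOT counted as earlier than
    each other (an assumption about tie handling).
    """
    times_by_thread = defaultdict(list)
    for c in comments:
        times_by_thread[c["link_id"]].append(c["created_utc"])
    for times in times_by_thread.values():
        times.sort()
    # bisect_left returns how many timestamps are strictly smaller.
    return {
        c["id"]: bisect_left(times_by_thread[c["link_id"]], c["created_utc"])
        for c in comments
    }

# Toy 3-d "embeddings"; real ones would come from an unspecified text model.
print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]), 4))  # 0.7071

# Hypothetical comments spread over two threads.
comments = [
    {"id": "a", "link_id": "thread1", "created_utc": 100},
    {"id": "b", "link_id": "thread1", "created_utc": 160},
    {"id": "c", "link_id": "thread2", "created_utc": 120},
    {"id": "d", "link_id": "thread1", "created_utc": 130},
]
print(count_past_comments(comments))  # {'a': 0, 'b': 2, 'c': 0, 'd': 1}
```

Sorting once per thread makes each lookup O(log n), instead of rescanning the whole thread for every comment.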