Reddit, the so-called “front page of the Internet,” is a social news aggregator where users post URLs to content along with a title, and other users can upvote or downvote the post. Votes determine the ranking of posts, i.e., the order in which they are displayed on the site. Each post also has a threaded comments section where users can discuss it, and comments are subject to the same voting system. Although users can mark each other as friends, the community structure is not defined by the friendship relation. Rather, communities on Reddit are formed via the “subreddit” concept. Users can create their own subreddits, choosing the topic as well as the moderation policy. This has led to a plethora of communities, ranging from video games to news and politics, pornography, and even meta-communities focusing on interactions people have in other subreddits. Reddit has done good things for the world, e.g., donation drives for charities, but it is also not without controversial content. For example, “The Fappening” was a global event involving very questionable behavior from users all over the Web, and on Reddit in particular. In August 2014, nude images were uploaded to subreddits created for the actress Jennifer Lawrence and the model Kate Upton. Over a short period of time, nude images and even videos of many other celebrities were uploaded. The Reddit admins battled with the community, a huge influx of traffic, and legal responsibility, eventually banning the subreddit dedicated to the leak. Unfortunately, the community revolted and created new subreddits faster than the admins could ban them. Eventually things settled down, and the criminals responsible for acquiring the images in the first place were arrested.
- Determine the level of hate speech (as defined by the phrases in the Hatebase dictionary) on Reddit in terms of:
  (a) Reddit submissions
  (b) Reddit comments
  (c) Individual subreddits
  Although it is not required, we suggest that you investigate a stemming library to make matching against the Hatebase dictionary more robust. For example, https://github.com/uttesh/exude can help you catch both singular and plural versions of words.
- Present the level of hate speech in at least three ways. For example:
  (a) The N most hateful subreddits
  (b) The distribution of hateful comments per user (e.g., a CDF)
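
The two tasks above can be sketched in a few lines of Python. This is only an illustrative starting point under several assumptions, not a required approach: the input is assumed to be an iterable of `(subreddit, author, text)` tuples, the lexicon is a placeholder set standing in for the Hatebase dictionary, and `normalize` is a naive plural-stripping stand-in for a real stemming library (it is not exude, and it handles single words only, not multi-word phrases). All function names here are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def normalize(token):
    """Naive plural stripping; a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def count_hits(text, lexicon):
    """Count tokens in text whose normalized form is in the lexicon."""
    stems = {normalize(term) for term in lexicon}
    tokens = TOKEN_RE.findall(text.lower())
    return sum(1 for tok in tokens if normalize(tok) in stems)


def hate_level_by_subreddit(records, lexicon):
    """Fraction of items per subreddit that contain at least one hit."""
    totals = Counter()
    flagged = Counter()
    for subreddit, _author, text in records:
        totals[subreddit] += 1
        if count_hits(text, lexicon) > 0:
            flagged[subreddit] += 1
    return {s: flagged[s] / totals[s] for s in totals}


def top_n(levels, n):
    """The n subreddits with the highest fraction of flagged items."""
    return sorted(levels.items(), key=lambda kv: kv[1], reverse=True)[:n]


def per_user_cdf(records, lexicon):
    """Empirical CDF of flagged items per user.

    Returns a list of (k, fraction of users with at most k flagged items).
    """
    per_user = Counter()
    for _subreddit, author, text in records:
        per_user[author] += 1 if count_hits(text, lexicon) > 0 else 0
    counts = sorted(per_user.values())
    n = len(counts)
    cdf = []
    for i, k in enumerate(counts, start=1):
        if cdf and cdf[-1][0] == k:
            cdf[-1] = (k, i / n)  # keep only the last (highest) fraction per k
        else:
            cdf.append((k, i / n))
    return cdf
```

The same `records` iterable can be built separately from submissions and from comments to cover items (a) and (b) of the first task. Measuring the fraction of flagged items, rather than the raw count, is one possible design choice that keeps large subreddits from dominating the ranking; raw counts or hits per word are equally reasonable alternatives.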