A random forest hate speech classifier, together with a function that runs a CSV file of tweets through the classifier and returns the number of tweets in that file classified as hateful, offensive, or non-problematic.


Credit to Kai_Nylund for sourcing the dataset used in this project! Link to Dataset: https://huggingface.co/datasets/KaiNylund/twitter-year-splits

How to replicate results:

Step 1: Download the test/train dataset linked in the folder

Step 2: Download the datasets labeled as (year)_test from the HuggingFace page.

Step 3: Upload the files to your JupyterLab workspace: the datasets mentioned above and the notebook file. Name the HuggingFace datasets using the format (year)testdata.csv (for example, 2015testdata.csv, 2016testdata.csv, etc.), as illustrated in the sketch below.
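
For reference, here is a minimal sketch (not taken from the notebook) of how files following that naming convention could be read with pandas; the year range and the assumption that each CSV has a "tweet" column are illustrative guesses.

```python
# Minimal sketch: read the renamed per-year test files.
# Assumptions (not from the repository): years 2015-2020 were downloaded
# and each CSV contains a text column named "tweet".
import pandas as pd

YEARS = range(2015, 2021)  # adjust to the years you actually downloaded

test_sets = {}
for year in YEARS:
    path = f"{year}testdata.csv"  # matches the naming convention above
    test_sets[year] = pd.read_csv(path)

# Quick sanity check: number of tweets per yearly test file.
print({year: len(df) for year, df in test_sets.items()})
```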

Step 4: Run through the entire notebook. Read comments to get details on the process.

How to modify code/use different data:

Comments detail every step of the process. To change the classification algorithm, find the existing classifier definitions and replace RandomForest (or LogisticRegression) with the algorithm of your choice. To gather results on different data, follow the code format at the end of the notebook to replicate the steps; a sketch of both modifications follows below.
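
As an illustration only (assumptions, not the repository's exact code), the sketch below shows both modifications: swapping in a different scikit-learn classifier and pointing the label-counting step at a new CSV of tweets. The file names, column names ("tweet", "class"), TF-IDF features, and label encoding (0 = hateful, 1 = offensive, 2 = non-problematic) are hypothetical.

```python
# Illustrative sketch, not the notebook's exact code.
# Hypothetical assumptions: training data in traindata.csv with "tweet" and
# "class" columns, and labels encoded as 0 = hateful, 1 = offensive,
# 2 = non-problematic.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC  # swapped in for RandomForest/LogisticRegression

def count_labels(model, csv_path):
    """Run every tweet in csv_path through the fitted model and tally labels."""
    tweets = pd.read_csv(csv_path)["tweet"].astype(str)
    preds = pd.Series(model.predict(tweets))
    return {
        "hateful": int((preds == 0).sum()),
        "offensive": int((preds == 1).sum()),
        "non_problematic": int((preds == 2).sum()),
    }

# Train once, then gather results for any per-year test file.
train = pd.read_csv("traindata.csv")  # hypothetical file name
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train["tweet"].astype(str), train["class"])
print(count_labels(model, "2015testdata.csv"))
```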

Good luck and happy data collecting!
