Credit to Kai_Nylund for sourcing the dataset used in this project! Link to Dataset: https://huggingface.co/datasets/KaiNylund/twitter-year-splits
How to replicate results:
Step 1: Download the train/test dataset linked in this folder.
Step 2: Download the datasets labeled (year)_test from the Hugging Face page.
Step 3: Upload the files to your JupyterLab instance: the datasets mentioned above and the notebook file. Name the Hugging Face datasets in this format: (year)testdata.csv (for example, 2015testdata.csv, 2016testdata.csv, etc.). A scripted version of Steps 2 and 3 is sketched right after this list.
Step 4: Run through the entire notebook; read the comments for details on each part of the process.
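
If you would rather script Steps 2 and 3 than download and rename the files by hand, here is a minimal sketch. It assumes the Hugging Face datasets library (pip install datasets), that each year's test split is named "(year)_test" on the dataset page, and a 2015-2020 year range; check the dataset page and adjust both before running.

    from datasets import load_dataset

    # Year range is an assumption -- adjust to the years listed on the dataset page.
    for year in range(2015, 2021):
        # Split names like "2015_test" are assumed from the (year)_test labeling.
        test_split = load_dataset("KaiNylund/twitter-year-splits", split=f"{year}_test")
        # Save under the file name the notebook expects, e.g. 2015testdata.csv.
        test_split.to_csv(f"{year}testdata.csv")

Saving directly to the (year)testdata.csv names means the notebook's loading cells should pick the files up without any manual renaming.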
How to modify code/use different data:
Comments detail every step of the process. To change the classification algorithm, find the existing model-training code and replace the RandomForest (or LogisticRegression) classifier with one of your choosing, as sketched below. To gather results with different data, follow the code format at the end of the notebook to replicate the steps.
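
For example, swapping in a different scikit-learn classifier only requires changing which estimator gets fit, since scikit-learn estimators share the fit/predict interface. Here is a minimal sketch with toy data; the vectorizer and variable names are placeholders, not the notebook's actual pipeline:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    # Toy stand-in data -- the notebook supplies the real tweets and labels.
    train_texts = ["example tweet one", "another example tweet"]
    train_labels = [0, 1]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)

    # Original choice in the notebook: RandomForestClassifier (from sklearn.ensemble)
    # or LogisticRegression (from sklearn.linear_model).
    # Drop-in replacement -- any estimator with fit/predict works here:
    clf = LinearSVC()
    clf.fit(X_train, train_labels)
    print(clf.predict(vectorizer.transform(["a new tweet"])))

Any classifier that follows the scikit-learn estimator API can be dropped in this way, and the surrounding result-gathering code should not need to change.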
Good luck and happy data collecting!