Toxic_Spans_Detection

Running the application is very straightforward. Upload the file to Google colab and upload the dataset of toxic spans to your google drive.
Set the variable 'data_source' to the location of the csv file in your Google drive (see mine as an example).

The code will run as-is, but feel free to adjust the variables in General options. The following can be adjusted:

These variables you probably shouldn't change, but can if you want.

use_lemmas - True if you want to use word lemmas rather than the word themselves. remove_extra_whitespace - True if you want to remove remove unnecessary and redundant whitespace that can mess things up remove_punctuation - True if you want to remove all punctuation except apostrophes remove_pronouns - True if you want to remove pronouns. I don't see how pronounons can be toxic and by using lemmas they'll be set to -PRON- anyways remove_stopwords - True if you want to remove stopwords.
make_lowercase - True to make the entire sentence lowercase. Good for processing, although caps can indicate keyboard-rage.

These variables can and should be played with

max_span_length - (int) this is for pre-processing. Some spans are ridiculously long and that's generally a sign of bad annotating. min_span_length - (int) this is for results. how toxic can 2 characters really be? W2V_window - (int) the window size in Word2Vec. W2V_size - (int) the number of values in the vector. Higher increases processing time.

And my personal favorite

classifier: (knn, sgd, random_forest) You can choose between 1 of 3 classifiers. SGD doesn't work well at all, but knn gets nice results. random_forest takes a long time to process. My suggestion is KNN.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Final_Toxic_Spans_Detection.ipynb		Final_Toxic_Spans_Detection.ipynb
README.md		README.md
tsd_trial.csv		tsd_trial.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Toxic_Spans_Detection

These variables you probably shouldn't change, but can if you want.

These variables can and should be played with

And my personal favorite

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Toxic_Spans_Detection

These variables you probably shouldn't change, but can if you want.

These variables can and should be played with

And my personal favorite

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages