mattiaZonelli/spam-filter

Spam classifiers

The goal of the assignment is to write a spam filter using discriminative and generative classifiers. We use the Spambase dataset, which already represents spam/ham messages as a bag-of-words over a dictionary of 48 highly discriminative words and 6 characters. The first 54 features correspond to word/symbol frequencies; we ignore features 55-57; feature 58 is the class label (1 = spam, 0 = ham).
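The column layout described above can be sketched as a small helper. This is a minimal illustration, not the repository's code: the function name `split_spambase` and the file name `spambase.data` in the usage comment are assumptions.

```python
import numpy as np


def split_spambase(data):
    """Split a Spambase-style array into features and labels.

    Hypothetical helper: assumes one row per message and 58 columns,
    laid out as described in the text above.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != 58:
        raise ValueError("expected a 2-D array with 58 columns")
    X = data[:, :54]             # features 1-54: word/symbol frequencies
    y = data[:, 57].astype(int)  # feature 58: class label (1 spam, 0 ham)
    return X, y                  # features 55-57 are dropped


# Usage (assumes a comma-separated copy of the dataset saved locally):
# data = np.loadtxt("spambase.data", delimiter=",")
# X, y = split_spambase(data)
```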

  • Perform SVM classification using linear, polynomial (degree 2), and RBF kernels over the TF-IDF representation. To use angular information only, a kernel transformation was applied.
  • Classify the same data with a Naive Bayes classifier for continuous inputs, modeling each feature with a Gaussian distribution.
  • Perform k-NN classification with k=5.
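The Naive Bayes item in the list above can be sketched as follows. This is one possible implementation under stated assumptions, not necessarily the code in this repository: the class name `GaussianNaiveBayes` and the `var_floor` parameter (a small constant added to each variance to avoid division by zero) are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def __init__(self, var_floor=1e-9):
        # Assumed safeguard: keeps variances strictly positive.
        self.var_floor = var_floor

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors, per-class feature means and variances.
        self.log_prior_ = np.log(
            np.array([np.mean(y == c) for c in self.classes_])
        )
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = (
            np.array([X[y == c].var(axis=0) for c in self.classes_])
            + self.var_floor
        )
        return self

    def joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_pdf = -0.5 * (
            np.log(2.0 * np.pi * self.var_)[None, :, :]
            + diff ** 2 / self.var_[None, :, :]
        )
        # Naive independence assumption: sum log-densities over features.
        return self.log_prior_[None, :] + log_pdf.sum(axis=2)

    def predict(self, X):
        scores = self.joint_log_likelihood(X)
        return self.classes_[np.argmax(scores, axis=1)]
```

Working in log space avoids numerical underflow when the per-feature densities of all 54 features are multiplied together.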

For SVM and k-NN we use the implementations provided by sklearn, while the Naive Bayes algorithm was coded by hand. Before applying the SVM classifiers, we applied TF-IDF to the original dataset.

[image]
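A minimal sketch of how the sklearn-based classifiers could be set up is shown here. Several choices are assumptions rather than facts about this repository: the helper names `build_classifiers` and `evaluate`, the use of cross-validation with `cv=10`, running k-NN on the untransformed features, and L2-normalising the TF-IDF vectors as the way to keep only angular information (on unit-length vectors, the linear, polynomial, and RBF kernels all depend only on the angle between two samples).

```python
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_classifiers():
    """Return the SVM variants and the k-NN classifier, keyed by name."""

    def tfidf_svm(**svc_kwargs):
        # norm="l2" rescales each TF-IDF row to unit length (assumed choice).
        return make_pipeline(TfidfTransformer(norm="l2"), SVC(**svc_kwargs))

    return {
        "svm_linear": tfidf_svm(kernel="linear"),
        "svm_poly2": tfidf_svm(kernel="poly", degree=2),
        "svm_rbf": tfidf_svm(kernel="rbf"),
        "knn_5": KNeighborsClassifier(n_neighbors=5),
    }


def evaluate(X, y, cv=10):
    """Mean cross-validated accuracy per classifier (cv is an assumption)."""
    return {
        name: float(cross_val_score(clf, X, y, cv=cv).mean())
        for name, clf in build_classifiers().items()
    }
```

Wrapping TF-IDF and the SVM in one pipeline means the IDF weights are re-fitted on the training part of each fold only, so no information from the held-out fold leaks into the transform.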

Comparison between Naive Bayes and SVM with linear kernel: [image]

Results for k-NN: [image]

About

Comparison of SVM, Naive Bayes, and k-NN classifiers as spam filters.
