Spam classifiers

The goal of the assignment is to write a spam filter using discriminative and generative classifiers. We use the Spambase dataset, which already represents spam/ham messages as a bag of words over a dictionary of 48 highly discriminative words and 6 characters. The first 54 features are the word/character frequencies; features 55-57 are ignored; feature 58 is the class label (1 = spam, 0 = ham).
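The column layout described above can be sketched as follows. This is a minimal illustration, not the code from this repository: the helper name `split_spambase` and the file name in the comment are assumptions.

```python
import numpy as np

def split_spambase(data):
    """Split a (n_samples, 58) Spambase array into features and labels.

    Hypothetical helper: keeps features 1-54, drops 55-57, and uses
    feature 58 as the class label (1 = spam, 0 = ham).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != 58:
        raise ValueError("expected a 2-D array with 58 columns")
    X = data[:, :54]               # word/character frequencies
    y = data[:, 57].astype(int)    # class label
    return X, y

# Loading the real file is an assumption about its name and format:
# data = np.loadtxt("spambase.data", delimiter=",")
# X, y = split_spambase(data)
```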

Perform SVM classification using linear, polynomial (degree 2), and RBF kernels over the TF-IDF representation. To use angular information only, a kernel transformation is applied so that the kernel depends on the angle between feature vectors rather than on their lengths.
Classify the same data with a Naive Bayes classifier for continuous inputs, modeling each feature with a Gaussian distribution.
Perform k-NN classification with k = 5.
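The Gaussian Naive Bayes step can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation from this repository: the class name `GaussianNaiveBayes` and the `var_floor` parameter (a small constant added to every variance to avoid division by zero) are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def __init__(self, var_floor=1e-9):
        # var_floor is an assumption: it keeps constant features usable.
        self.var_floor = var_floor

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors, per-class feature means and variances.
        self.log_prior_ = np.log(
            np.array([(y == c).mean() for c in self.classes_])
        )
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = (
            np.array([X[y == c].var(axis=0) for c in self.classes_])
            + self.var_floor
        )
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Sum of per-feature Gaussian log-densities for every class.
        log_lik = -0.5 * (
            np.log(2.0 * np.pi * self.var_)[None, :, :]
            + diff ** 2 / self.var_[None, :, :]
        ).sum(axis=2)
        # Pick the class with the highest log-posterior.
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]
```

Working with log-probabilities rather than products of densities avoids numerical underflow when the 54 per-feature likelihoods are combined.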

For SVM and k-NN we use the implementations provided by sklearn, while we coded the Naive Bayes algorithm ourselves. Before applying the SVM classifiers, we computed the TF-IDF representation of the original dataset.
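A possible sklearn setup for the SVM and k-NN experiments is sketched below. Several choices here are assumptions rather than details taken from this repository: using `TfidfTransformer` for the TF-IDF step, obtaining the angular-only behavior by L2-normalizing each row (on unit-length vectors the linear, polynomial, and RBF kernels depend only on the angle between samples), running k-NN on the untransformed features, and scoring with 10-fold cross-validation. The helper names `build_models` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_models():
    """Return the four classifiers keyed by a short name."""
    def tfidf():
        # norm="l2" rescales every row to unit length (assumed way of
        # keeping angular information only).
        return TfidfTransformer(norm="l2")

    return {
        "svm_linear": make_pipeline(tfidf(), SVC(kernel="linear")),
        "svm_poly2": make_pipeline(tfidf(), SVC(kernel="poly", degree=2)),
        "svm_rbf": make_pipeline(tfidf(), SVC(kernel="rbf")),
        "knn_5": KNeighborsClassifier(n_neighbors=5),
    }

def evaluate(X, y, cv=10):
    """Mean cross-validated accuracy for each model (cv is an assumption)."""
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in build_models().items()
    }
```

Wrapping the TF-IDF step in a pipeline means the document frequencies are re-estimated on each training fold, so no information from the held-out fold leaks into the transformation.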

Comparison between Naive Bayes and SVM with linear kernel

Results for k-NN
