Support Vector Machines

Overview

This project implements Support Vector Machines (SVMs) for both linear and non-linear classification. It also builds a spam email classifier using SVMs with text preprocessing.

Algorithm

SVM Classification

SVMs find the optimal separating hyperplane that maximizes the margin between classes. Key components:

  • Linear kernel: For linearly separable data
  • Gaussian (RBF) kernel: K(x1, x2) = exp(-||x1-x2||^2 / (2*sigma^2)) for non-linear boundaries
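  • C parameter: Controls the penalty for misclassification (analogous to 1/lambda in regularized models: a larger C means weaker regularization and a tighter fit to the training data)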
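
As a minimal sketch of the two kernels listed above, the following Python functions evaluate them on plain lists of numbers. This is an illustration only, not the repository's gaussianKernel.m or linearKernel.m; the function names and the input validation are assumptions made for this example.

```python
import math


def linear_kernel(x1, x2):
    """Linear kernel: the dot product of two equal-length vectors."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    return sum(a * b for a, b in zip(x1, x2))


def gaussian_kernel(x1, x2, sigma):
    """Gaussian (RBF) kernel: exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical points have similarity 1; similarity decays with distance,
# and a smaller sigma makes it decay faster.
print(gaussian_kernel([1, 2, 1], [0, 4, -1], sigma=2.0))
print(gaussian_kernel([1, 2, 1], [0, 4, -1], sigma=0.5))
```

The Gaussian kernel acts as a similarity score in (0, 1], so sigma controls how far a training example's influence reaches.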

Spam Classification

Emails are preprocessed (lowercasing, URL normalization, stemming, etc.) and converted to feature vectors. A linear SVM is trained on these features to classify spam vs. non-spam.
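
The preprocessing and feature-extraction steps can be sketched roughly as follows. This is a simplified Python illustration, not the repository's processEmail.m or emailFeatures.m: the placeholder tokens (httpaddr, emailaddr, number), the regular expressions, and the toy vocabulary are assumptions, stemming is left out for brevity, and tokens are matched against the vocabulary directly rather than through word indices.

```python
import re


def preprocess_email(text):
    """Lowercase, normalize URLs/addresses/numbers, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"<[^<>]+>", " ", text)            # strip HTML tags
    text = re.sub(r"https?://\S+", "httpaddr", text)  # normalize URLs
    text = re.sub(r"\S+@\S+", "emailaddr", text)      # normalize email addresses
    text = re.sub(r"\d+", "number", text)             # normalize numbers
    # Split on anything that is not a lowercase letter and drop empty tokens.
    return [tok for tok in re.split(r"[^a-z]+", text) if tok]


def email_features(tokens, vocab):
    """Binary feature vector: entry i is 1 if vocab[i] occurs in the email."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


# Toy vocabulary for illustration only; the real list lives in vocab.txt.
toy_vocab = ["click", "free", "httpaddr"]
tokens = preprocess_email("Click <b>here</b>: http://spam.example/win 50 times")
print(tokens)
print(email_features(tokens, toy_vocab))
```

Normalizing URLs and numbers to a single token lets the classifier learn from the presence of a link or an amount rather than from its exact value.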

Files

| File | Description |
|------|-------------|
| sample6.m | Main script: SVM with linear and Gaussian kernels |
| sample6_spam.m | Main script: spam email classification |
| svmTrain.m | SVM training using SMO algorithm |
| svmPredict.m | SVM prediction |
| gaussianKernel.m | Gaussian (RBF) kernel function |
| linearKernel.m | Linear kernel function |
| dataset3Params.m | Cross-validation for C and sigma selection |
| visualizeBoundary.m | Plots non-linear decision boundary |
| visualizeBoundaryLinear.m | Plots linear decision boundary |
| processEmail.m | Email text preprocessing |
| emailFeatures.m | Converts word indices to feature vector |
| getVocabList.m | Loads the vocabulary list |
| porterStemmer.m | Porter stemming algorithm |
| readFile.m | Reads file contents |
| ex6data[1-3].mat | 2D classification datasets |
| spamTrain.mat, spamTest.mat | Spam classification datasets |
| vocab.txt | Vocabulary list (1899 words) |
| emailSample[1-2].txt | Sample legitimate emails |
| spamSample[1-2].txt | Sample spam emails |
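
The C and sigma selection that dataset3Params.m is described as performing can be pictured as a grid search over candidate values. The sketch below is a generic Python illustration under stated assumptions: train_fn and error_fn are hypothetical callbacks standing in for SVM training and for measuring error on a held-out validation set, and the candidate grid is an example, not necessarily the one the script uses.

```python
from itertools import product


def select_params(train_fn, error_fn, candidates):
    """Return the (C, sigma) pair with the lowest validation error.

    train_fn(C, sigma) -> model  (hypothetical: fits an SVM on the training set)
    error_fn(model)    -> float  (hypothetical: error rate on the validation set)
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best = None
    # Try every combination of C and sigma drawn from the same candidate list.
    for C, sigma in product(candidates, repeat=2):
        err = error_fn(train_fn(C, sigma))
        if best is None or err < best[0]:
            best = (err, C, sigma)
    return best[1], best[2]


# Stand-in callbacks so the sketch runs without an SVM implementation:
# the fake "model" is the parameter pair, and the fake error is lowest
# at C = 1, sigma = 0.1.
grid = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
fake_train = lambda C, sigma: (C, sigma)
fake_error = lambda model: abs(model[0] - 1) + abs(model[1] - 0.1)
print(select_params(fake_train, fake_error, grid))
```

Scoring each pair on data not used for training matters here, because a large C or a small sigma can always drive the training error down.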

Key Results

  • Linear SVM: Correctly separates linearly separable data
  • Gaussian SVM: Achieves non-linear decision boundaries for complex datasets
  • Spam Classifier: 99.85% training accuracy, 98.9% test accuracy
  • Top spam indicators (stemmed tokens): "our", "click", "remov", "guarante", "visit"
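
One common way to obtain a list of top indicators from a linear classifier is to rank vocabulary words by their learned weights. The Python sketch below assumes that approach; the source does not state how the list above was produced, and the weights and vocabulary here are made-up values for illustration only.

```python
def top_indicators(weights, vocab, k=5):
    """Return the k vocabulary words with the largest weights."""
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    # Sort (weight, word) pairs by weight, largest first.
    ranked = sorted(zip(weights, vocab), key=lambda pair: pair[0], reverse=True)
    return [word for _, word in ranked[:k]]


# Hypothetical weights: not taken from the trained model.
example_vocab = ["meeting", "our", "thanks", "click"]
example_weights = [-0.3, 0.5, -0.1, 0.4]
print(top_indicators(example_weights, example_vocab, k=2))
```

With a linear kernel, a large positive weight means the presence of that word pushes the decision toward the spam class.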

Visualization

[Figure: SVM visualization]

Left: Linear SVM with margin boundaries. Center: Non-linear RBF kernel SVM. Right: Gaussian kernel function for different sigma values.

Credit

Exercises from Andrew Ng's Machine Learning course on Coursera, completed by Keivan Hassani Monfared.