Anomaly Detection

Overview

This project implements anomaly detection with Gaussian distribution models. It flags potentially faulty servers from their latency and throughput statistics (a 2D dataset) and applies the same method to an 11-dimensional dataset.

Algorithm

  1. Estimate Gaussian parameters: Compute mean (mu) and variance (sigma^2) for each feature
  2. Compute probabilities: For each data point, compute p(x) = prod(N(x_i; mu_i, sigma_i^2))
  3. Select threshold (epsilon): Using labeled examples in a cross-validation set, choose the epsilon that maximizes the F1 score
  4. Flag anomalies: Points where p(x) < epsilon are flagged as anomalous
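
As a rough illustration of steps 1, 2, and 4, here is a minimal Python sketch (standard library only). It is not the repository's MATLAB code: the function names are hypothetical, the toy data is made up, and dividing the variance by m rather than m - 1 is an assumption.

```python
import math

def estimate_gaussian(X):
    """Per-feature mean and variance of the rows in X.

    Assumption: the variance divides by m (maximum-likelihood estimate).
    """
    m = len(X)
    n = len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

def density(x, mu, sigma2):
    """p(x) as a product of independent univariate Gaussian densities."""
    p = 1.0
    for x_j, mu_j, s2_j in zip(x, mu, sigma2):
        p *= math.exp(-((x_j - mu_j) ** 2) / (2 * s2_j)) / math.sqrt(2 * math.pi * s2_j)
    return p

def flag_anomalies(X, mu, sigma2, epsilon):
    """Indices of the rows whose density falls below epsilon."""
    return [i for i, x in enumerate(X) if density(x, mu, sigma2) < epsilon]

# Toy usage with made-up numbers (not the server datasets).
train = [[1.0, 2.0], [3.0, 6.0]]
mu, sigma2 = estimate_gaussian(train)
print(flag_anomalies([[2.0, 4.0], [10.0, 4.0]], mu, sigma2, epsilon=1e-3))
```

In high dimensions the product of many small densities can underflow, so summing log-densities is a common alternative; the sketch keeps the plain product to mirror the formula in step 2.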

Files

| File | Description |
| --- | --- |
| sample8.m | Main script: anomaly detection on server data |
| estimateGaussian.m | Estimates mu and sigma^2 from data |
| multivariateGaussian.m | Computes multivariate Gaussian probability |
| selectThreshold.m | Selects epsilon using F1 score |
| visualizeFit.m | Visualizes the Gaussian fit with contours |
| ex8data1.mat | 2D server statistics dataset |
| ex8data2.mat | 11D server statistics dataset |
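
The kind of threshold search that selectThreshold.m is described as performing can be sketched in Python as follows. This is an illustration under stated assumptions, not the file's actual contents: the function names, the evenly spaced scan between the smallest and largest density, and the default of 1000 steps are all hypothetical choices.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall); label 1 means anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(y_val, p_val, steps=1000):
    """Scan candidate epsilons and keep the one with the highest F1.

    y_val: cross-validation labels (1 = anomaly, 0 = normal)
    p_val: densities p(x) for the same cross-validation points
    steps: number of candidates to try (an assumption of this sketch)
    """
    best_eps, best_f1 = 0.0, 0.0
    lo, hi = min(p_val), max(p_val)
    step = (hi - lo) / steps
    if step == 0:
        return best_eps, best_f1  # all densities equal, nothing to separate
    for k in range(1, steps + 1):
        eps = lo + k * step
        y_pred = [1 if p < eps else 0 for p in p_val]
        f1 = f1_score(y_val, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy usage with made-up labels and densities.
print(select_threshold([0, 0, 0, 1, 1], [0.5, 0.4, 0.3, 0.01, 0.02]))
```

F1 is used instead of accuracy because anomalies are rare: a detector that never flags anything can still score high accuracy, but its F1 is zero.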

Key Results

  • 2D dataset: Best epsilon = 8.99e-05, F1 score = 0.875
  • 11D dataset: Best epsilon = 1.38e-18, F1 score = 0.615, with 117 anomalies found

Visualization

[Figure: Anomaly Detection Visualization]

Left: Gaussian contours with anomalies circled in red. Right: F1 score vs. epsilon threshold for optimal threshold selection.

Credit

Exercises from Andrew Ng's Machine Learning course on Coursera, completed by Keivan Hassani Monfared.