Prashanna-Raj-Pandit/STAT-562-ML

STAT-562: Machine Learning

Author: Prashanna Raj Pandit

This repo contains all four major labs completed as part of the course “Machine Learning.” Each lab focuses on a different machine learning technique and gives hands-on experience with supervised and unsupervised learning in R. STAT-562 centers on applying classical statistical learning methods to real datasets. Across the four labs, we explore:

  • 🔹 Data preprocessing & feature engineering
  • 🔹 Classification models (k-NN, LDA, QDA, Naive Bayes)
  • 🔹 Unsupervised learning (hierarchical & K-means clustering)
  • 🔹 Ensemble methods (Bagging, Boosting, Random Forest)
  • 🔹 Model evaluation using accuracy, ROC, confusion matrices, RMSE
  • 🔹 Cross-validation & hyperparameter tuning using caret
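As an illustration of the last bullet, here is a minimal sketch of k-fold cross-validation with a tuning grid in caret. It is not taken from the labs: the data (two species from the built-in `iris` set), the seed, the number of folds, and the grid of `k` values are all assumptions chosen only to make the example self-contained.

```r
library(caret)

set.seed(562)  # arbitrary seed so the fold assignment is reproducible

# Stand-in data (assumption, NOT a course dataset): a binary problem
# built from two of the three iris species.
data(iris)
df <- droplevels(subset(iris, Species != "setosa"))

# 5-fold cross-validation; the fold count is an illustrative choice.
ctrl <- trainControl(method = "cv", number = 5)

# Candidate neighbourhood sizes to compare (illustrative grid).
grid <- data.frame(k = seq(3, 15, by = 2))

knn_fit <- train(
  Species ~ .,
  data       = df,
  method     = "knn",
  preProcess = c("center", "scale"),  # k-NN is distance-based, so scale features
  trControl  = ctrl,
  tuneGrid   = grid
)

# Cross-validated accuracy for every k, and the k that caret selected.
print(knn_fit$results[, c("k", "Accuracy")])
print(knn_fit$bestTune)
```

The same `trainControl` object can be passed to `train()` with a different `method` value, which keeps the resampling scheme identical when several models are compared.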

Each project builds practical intuition and technical skills for applying statistical models to real-world data.


This project builds and evaluates multiple machine learning models to predict breast cancer (Cancer vs Control) using routine blood-based metabolic biomarkers and anthropometric measures instead of imaging or genetic tests.

Models compared:

  • Naive Bayes
  • Linear Discriminant Analysis (LDA)
  • k-NN (with tuned k)
  • Random Forest
  • Gradient Boosting
  • Support Vector Machine (SVM)
  • Deep Neural Network (DNN)

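Model performance on breast cancer classification

| Model             | Accuracy | Sensitivity | Specificity | F1 Score | AUC  | TP | TN | FP | FN |
|-------------------|----------|-------------|-------------|----------|------|----|----|----|----|
| Naive Bayes       | 0.70     | 0.77        | 0.60        | 0.74     | 0.70 | 10 | 6  | 4  | 3  |
| LDA               | 0.78     | 0.85        | 0.70        | 0.82     | 0.80 | 11 | 7  | 3  | 2  |
| KNN (k tuned)     | 0.78     | 0.77        | 0.80        | 0.80     | 0.81 | 10 | 8  | 2  | 3  |
| Random Forest     | 0.87     | 0.85        | 0.90        | 0.88     | 0.91 | 11 | 9  | 1  | 2  |
| Gradient Boosting | 0.83     | 0.77        | 0.90        | 0.83     | 0.89 | 10 | 9  | 1  | 3  |
| SVM               | 0.78     | 0.69        | 0.90        | 0.78     | 0.85 | 9  | 9  | 1  | 4  |
| Deep NN           | 0.74     | 0.77        | 0.70        | 0.77     | 0.75 | 10 | 7  | 3  | 3  |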

Table 1. Test-set performance for each model. Accuracy, Sensitivity (TPR), Specificity (TNR), F1, and AUC are shown, along with confusion matrix counts (TP, TN, FP, FN).
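
The threshold-based columns in Table 1 follow directly from the confusion matrix counts. The sketch below uses base R only and a hypothetical helper, `classification_metrics()`, which is not part of the project code; it recomputes the Random Forest row from its TP, TN, FP, and FN values. AUC is left out because it requires the predicted scores, which the table does not report.

```r
# Hypothetical helper (not from the project): derive the threshold-based
# metrics reported in Table 1 from the four confusion-matrix counts.
classification_metrics <- function(tp, tn, fp, fn) {
  sensitivity <- tp / (tp + fn)   # true positive rate (recall)
  specificity <- tn / (tn + fp)   # true negative rate
  precision   <- tp / (tp + fp)   # positive predictive value
  accuracy    <- (tp + tn) / (tp + tn + fp + fn)
  f1          <- 2 * precision * sensitivity / (precision + sensitivity)
  c(accuracy = accuracy, sensitivity = sensitivity,
    specificity = specificity, f1 = f1)
}

# Counts copied from the Random Forest row of Table 1.
rf <- classification_metrics(tp = 11, tn = 9, fp = 1, fn = 2)
print(round(rf, 2))
# accuracy 0.87, sensitivity 0.85, specificity 0.90, f1 0.88
```

Each row's counts sum to 23 test cases (13 Cancer, 10 Control), so a single misclassification shifts accuracy by roughly 0.04. Small gaps between models should be read with that in mind.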

About

Implementation of Machine Learning algorithms using R and Python
