This repository contains my individual contribution to the final group assignment of the CAB420 Machine Learning course at Queensland University of Technology (QUT). While the broader project compared SVM, XGBoost, DCNN, and Vision Transformer models in a team setting, this repository focuses exclusively on the SVM system I built end-to-end.
Only the SVM model and related experiments are uploaded here.
Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder where early diagnosis is critical.
MRI scans reveal structural brain changes years before clinical symptoms, but manual MRI analysis is time-consuming and subjective, which motivates the use of machine learning to assist in automated diagnosis.
Goal: Build an accurate, efficient, and reproducible classical ML pipeline to classify MRI slices into four Alzheimer’s disease stages.
Research Question: How effective is a classical machine learning model (SVM) at classifying Alzheimer’s disease stages from brain MRI images, and how can its performance be optimised through feature engineering and dimensionality reduction?
Dataset: OASIS-1 (Source: Kaggle)
Modality: T1-weighted MRI (2D axial slices)
Image size: 128 × 128 (grayscale)
Classes:
- Non Demented
- Very Mild Demented
- Mild Demented
- Moderate Demented
Data Preparation:
- Explicit class balancing via over/under-sampling
- Deterministic splits using fixed random seed (42) for reproducibility
- Strict shape validation to prevent corrupted inputs
Data Splits (Balanced):
Split | Samples per Class | Total
--- | --- | ---
Training | 4,000 | 16,000
Validation | 1,000 | 4,000
Test | 640 | 2,560
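The balancing and validation steps above can be sketched as follows. This is a minimal illustration, not the project code: the label array is synthetic, the per-class target count is arbitrary, and `sklearn.utils.resample` stands in for whichever over/under-sampling routine was actually used. The fixed seed (42) mirrors the deterministic-split requirement.

```python
import numpy as np
from sklearn.utils import resample

SEED = 42  # fixed seed for deterministic, reproducible sampling

# Hypothetical imbalanced dataset: 500 grayscale 128x128 slices, 4 classes.
rng = np.random.default_rng(SEED)
images = rng.random((500, 128, 128))
labels = rng.integers(0, 4, size=500)

# Strict shape validation: reject anything that is not a 128x128 slice.
assert images.shape[1:] == (128, 128)

# Balance classes by resampling every class to the same count.
TARGET = 100
balanced_idx = np.concatenate([
    resample(np.flatnonzero(labels == c), replace=True,
             n_samples=TARGET, random_state=SEED)
    for c in range(4)
])
X_bal, y_bal = images[balanced_idx], labels[balanced_idx]
```

With `replace=True` the same call handles both minority classes (oversampled) and majority classes (undersampled).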
Further Preprocessing:
- Grayscale conversion
- Image flattening (128 × 128 → 16,384 features)
- Feature standardisation (StandardScaler)
- Label conversion (one-hot → integer)
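The preprocessing steps above can be sketched with scikit-learn's `StandardScaler`. The data here is a random placeholder for the loaded OASIS-1 slices; array names are illustrative. In a real pipeline the scaler should be fit on the training split only and then applied to validation/test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder for N preprocessed grayscale 128x128 slices and one-hot labels.
rng = np.random.default_rng(42)
images = rng.random((8, 128, 128))
one_hot = np.eye(4)[rng.integers(0, 4, size=8)]  # 4 classes

# Flatten each 128x128 slice into a 16,384-dimensional feature vector.
X = images.reshape(len(images), -1)

# Standardise each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert one-hot labels back to integer class indices (0-3).
y = one_hot.argmax(axis=1)
```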
Feature Extraction using HOG
Histogram of Oriented Gradients (HOG) was used to extract structural patterns and texture features relevant to brain MRI.
Why HOG?
- Captures anatomical structure better than raw pixels
- Reduces noise sensitivity
- Well-suited to medical imaging + classical ML
HOG Parameters:
Orientations: 9
Pixels per cell: (8, 8)
Cells per block: (2, 2)
Block norm: L2-Hys
Feature size reduction:
Raw pixels: 16,384
After HOG: 8,100
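The parameter names above match scikit-image's `hog` function, so a minimal sketch with those exact settings looks like the following (scikit-image is an assumption here; OpenCV's `HOGDescriptor` could serve the same role). For a 128 × 128 image, 8 × 8 cells give a 16 × 16 cell grid, 2 × 2 blocks slide over 15 × 15 positions, and 15 × 15 × 2 × 2 × 9 = 8,100 features, matching the size reported above.

```python
import numpy as np
from skimage.feature import hog

# Placeholder 128x128 grayscale slice; in the pipeline this is a preprocessed MRI image.
image = np.random.default_rng(0).random((128, 128))

features = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm='L2-Hys',
    feature_vector=True,
)
# 15 x 15 block positions x (2 x 2 cells) x 9 orientations = 8,100 features
```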
Dimensionality Reduction using PCA
PCA was applied to improve efficiency and generalisation:
- PCA retaining 95% of the variance
- Reduced feature space: 8,100 → 1,616

This step cut computational cost with negligible loss of accuracy.
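Retaining a variance fraction rather than a fixed component count is expressed in scikit-learn by passing a float to `n_components`. A minimal sketch with placeholder data (the sample count and random features are illustrative; on the real HOG features this selected 1,616 components):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder HOG feature matrix: 200 samples x 8,100 features.
rng = np.random.default_rng(42)
X_hog = rng.random((200, 8100))

# Keep the smallest number of components explaining >= 95% of the variance.
pca = PCA(n_components=0.95, random_state=42)
X_reduced = pca.fit_transform(X_hog)
```

As with the scaler, `fit_transform` belongs on the training split; validation/test data should only go through `transform`.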
Model & Hyperparameter Tuning
Classifier: Support Vector Machine (SVM)
Kernel: RBF
Optimisation: GridSearchCV (5-fold cross-validation)
Best Parameters:
kernel = 'rbf'
C = 1
gamma = 'scale'
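The tuning setup above maps directly onto `GridSearchCV` with an RBF `SVC`. The sketch below uses a small synthetic 4-class problem in place of the PCA-reduced features, and the candidate grid values are illustrative assumptions; only the winning parameters (`kernel='rbf'`, `C=1`, `gamma='scale'`) come from the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the PCA-reduced feature matrix.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=4, random_state=42)

# Candidate grid (illustrative); searched with 5-fold cross-validation.
param_grid = {
    'kernel': ['rbf'],
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto'],
}
search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
```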
Model | Accuracy | Training Time | Evaluation Time
--- | --- | --- | ---
Baseline (flattened) | 99.53% | 908.07 s | 621.83 s
Baseline + HOG | 98.75% | 1163.37 s | 257.62 s
Baseline + HOG + PCA | 98.95% | 213.12 s | 48.18 s
Best Model: Baseline + HOG + PCA
- ~0.6% accuracy drop vs baseline
- ~4× faster training (908.07 s → 213.12 s, a >75% reduction)
- ~13× faster evaluation (621.83 s → 48.18 s)
- Strong generalisation (5-fold CV: 98.24% ± 0.31%) and no overfitting
Training accuracy: 99.81%
Validation accuracy: 98.85%
5-fold CV accuracy: 98.24% ± 0.31%
Consistent precision/recall across all classes
Low variance across folds indicates strong robustness and insensitivity to the particular train/validation split.
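The cross-validated figure above (mean ± standard deviation over 5 folds) is the standard output of `cross_val_score`; a minimal sketch with placeholder data and the tuned parameters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the final feature matrix.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=4, random_state=42)

# 5-fold CV with the tuned SVM; report mean accuracy and spread across folds.
scores = cross_val_score(SVC(kernel='rbf', C=1, gamma='scale'), X, y, cv=5)
mean_acc, std_acc = scores.mean(), scores.std()
```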
Classical ML models like SVMs remain highly competitive when paired with strong feature engineering.
Furthermore, HOG + PCA significantly improves efficiency with minimal performance loss.
Tech Stack:
- Python
- NumPy, scikit-learn
- OpenCV
- Matplotlib, Seaborn
Note
This project was completed for an academic course.
Code is shared for learning, experimentation, and portfolio purposes.