Decision Tree ROC Analysis

ROC curve comparison and hyperparameter tuning for Decision Tree classifiers on two OpenML datasets.

Overview

This project trains Decision Tree classifiers on two OpenML binary-classification datasets (IDs 4534 and 44), each with both the entropy and gini splitting criteria. For every configuration it generates an ROC curve from 10-fold cross-validated probability predictions, computes the AUC score, and runs a GridSearchCV over min_samples_leaf to find the best value of that hyperparameter. Four ROC-curve plots are saved as PNG files.
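
As a rough illustration of the cross-validated ROC step, the sketch below pools out-of-fold probabilities with `cross_val_predict` and derives the curve and AUC from them. It is not taken from plot_roc_curves.py: the synthetic data from `make_classification` stands in for the OpenML datasets so the example runs offline, and `random_state=0` and the `results` dictionary are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data replaces OpenML datasets 4534 and 44.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

results = {}
for criterion in ("entropy", "gini"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # Each sample is scored by a model that never saw it during training;
    # column 1 holds the predicted probability of the positive class.
    proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(y, proba)
    auc = roc_auc_score(y, proba)
    results[criterion] = (fpr, tpr, auc)
    print(f"{criterion}: AUC = {auc:.3f}")
```

The `fpr` and `tpr` arrays are what would be passed to matplotlib to draw each curve. An unpruned tree tends to emit probabilities of exactly 0 or 1, so the resulting curve can have very few points.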

Requirements

  • Python 3.8+
  • scikit-learn >= 1.0
  • matplotlib >= 3.5
  • numpy >= 1.21

Installation

pip install -r requirements.txt

Project Structure

decision-tree-roc-analysis/
├── plot_roc_curves.py      # ROC curve plotting and tuning script
├── requirements.txt        # Python dependencies
├── .gitignore
└── README.md

Usage

python plot_roc_curves.py

Steps performed:

  1. Downloads OpenML datasets 4534 and 44.
  2. Trains 4 Decision Tree configurations (2 datasets x 2 criteria: entropy, gini).
  3. Computes 10-fold cross-validated ROC curves and AUC scores.
  4. Saves each ROC plot as roc_curve_{dataset_id}_{criterion}.png.
  5. Runs GridSearchCV on min_samples_leaf and prints the best parameters for each configuration.
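
The tuning in step 5 might look roughly like the following sketch. The candidate values in `param_grid`, the `roc_auc` scoring metric, and the 10-fold setting for the search are assumptions, since this README does not state them; synthetic data again replaces the OpenML downloads.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data replaces OpenML datasets 4534 and 44.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Assumption: illustrative candidate values, not the script's actual grid.
param_grid = {"min_samples_leaf": [1, 5, 10, 20, 50]}

best_params = {}
for criterion in ("entropy", "gini"):
    search = GridSearchCV(
        DecisionTreeClassifier(criterion=criterion, random_state=0),
        param_grid,
        scoring="roc_auc",  # assumed metric, chosen to match the AUC focus
        cv=10,
    )
    search.fit(X, y)
    best_params[criterion] = search.best_params_
    print(f"{criterion}: best params = {search.best_params_}, "
          f"mean CV AUC = {search.best_score_:.3f}")
```

Larger values of min_samples_leaf force each leaf to cover more samples, which smooths the predicted probabilities and often raises AUC compared with a fully grown tree.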

Results

Four PNG plots are produced in the working directory:

| File | Description |
| --- | --- |
| roc_curve_4534_entropy.png | Dataset 4534, entropy criterion |
| roc_curve_4534_gini.png | Dataset 4534, gini criterion |
| roc_curve_44_entropy.png | Dataset 44, entropy criterion |
| roc_curve_44_gini.png | Dataset 44, gini criterion |

AUC values and best GridSearchCV parameters are printed to stdout.

Author

Biswajeet Sahoo

License

MIT License