# Decision Tree ROC Analysis

ROC curve comparison and hyperparameter tuning for Decision Tree classifiers on two OpenML datasets.

## Overview
This project trains Decision Tree classifiers on two OpenML binary-classification datasets (IDs 4534 and 44), each with both the entropy and gini splitting criteria. For every configuration it generates an ROC curve via 10-fold cross-validated probability predictions, computes the AUC score, and runs a GridSearchCV over min_samples_leaf to find the best hyperparameter. Four ROC-curve plots are saved as PNG files.
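The core of the pipeline described above can be sketched as follows. This is an illustrative sketch on synthetic data, not the script itself: the real script loads OpenML datasets 4534 and 44 (e.g. via `fetch_openml(data_id=...)`), whereas here `make_classification` stands in so the snippet is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, auc

# Stand-in binary-classification data; the script uses OpenML data instead.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)

# 10-fold cross-validated probability predictions for the positive class.
proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]

# ROC curve and AUC from the out-of-fold probabilities.
fpr, tpr, _ = roc_curve(y, proba)
print(f"AUC: {auc(fpr, tpr):.3f}")
```

Swapping `criterion="entropy"` for `"gini"` gives the second configuration per dataset.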
## Requirements

- Python 3.8+
- scikit-learn >= 1.0
- matplotlib >= 3.5
- numpy >= 1.21
## Installation

```bash
pip install -r requirements.txt
```

## Project Structure

```
decision-tree-roc-analysis/
├── plot_roc_curves.py   # ROC curve plotting and tuning script
├── requirements.txt     # Python dependencies
├── .gitignore
└── README.md
```
## Usage

```bash
python plot_roc_curves.py
```

Steps performed:
- Downloads OpenML datasets 4534 and 44.
- Trains 4 Decision Tree configurations (2 datasets x 2 criteria: entropy, gini).
- Computes 10-fold cross-validated ROC curves and AUC scores.
- Saves each ROC plot as `roc_curve_{dataset_id}_{criterion}.png`.
- Runs GridSearchCV on `min_samples_leaf` and prints the best parameters for each configuration.
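The grid-search step might look like the sketch below. The candidate values for `min_samples_leaf` are assumptions for illustration (see `plot_roc_curves.py` for the actual grid), and synthetic data again stands in for the OpenML datasets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Assumed candidate values; the script's actual grid may differ.
grid = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"min_samples_leaf": [1, 2, 5, 10, 20]},
    scoring="roc_auc",  # rank candidates by cross-validated AUC
    cv=10,
)
grid.fit(X, y)
print("Best params:", grid.best_params_)
```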
## Output

Four PNG plots are produced in the working directory:

| File | Description |
|---|---|
| `roc_curve_4534_entropy.png` | Dataset 4534, entropy criterion |
| `roc_curve_4534_gini.png` | Dataset 4534, gini criterion |
| `roc_curve_44_entropy.png` | Dataset 44, entropy criterion |
| `roc_curve_44_gini.png` | Dataset 44, gini criterion |
AUC values and best GridSearchCV parameters are printed to stdout.
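A minimal sketch of how each plot is written to disk, with dummy ROC points standing in for the cross-validated `fpr`/`tpr` arrays (the filename pattern matches the table above; the plot styling is an assumption):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

# Dummy ROC points for illustration.
fpr = np.array([0.0, 0.1, 0.4, 1.0])
tpr = np.array([0.0, 0.6, 0.9, 1.0])
dataset_id, criterion = 44, "gini"

plt.figure()
plt.plot(fpr, tpr, label="ROC")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title(f"ROC Curve (dataset {dataset_id}, {criterion})")
plt.legend()
plt.savefig(f"roc_curve_{dataset_id}_{criterion}.png")
```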
## Author

Biswajeet Sahoo
## License

MIT License