EPSS Dominates: ML Vulnerability Prioritization Matches EPSS Using Only Public Data While CVSS Fails
YES — ML (AUC 0.903) outperforms CVSS (0.662) by +24pp. CVSS is a weak exploitability predictor. However, EPSS (0.912) slightly outperforms our ML model (0.903) — EPSS is already ML-based and trained on richer data.
Blog post: CVSS Gets It Wrong — EPSS and ML Predict Exploited Vulnerabilities 24pp Better
| Model | AUC-ROC | F1 | vs CVSS |
|---|---|---|---|
| Logistic Regression | 0.903 | 0.106 | +24.1pp |
| Random Forest | 0.864 | 0.000 | +20.2pp |
| XGBoost | 0.825 | 0.018 | +16.3pp |
| Best CVSS Threshold (≥9.0) | 0.662 | 0.021 | baseline |
| Best EPSS Threshold (≥0.01) | 0.912 | 0.054 | +25.1pp |
| Random (majority class) | N/A | 0.000 | — |
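The AUC figures in the table compare ranked scores, not thresholded labels: a raw score (CVSS, EPSS, or a model probability) can be fed to AUC-ROC directly. The sketch below illustrates that comparison on synthetic data; the feature names and the generated labels are hypothetical stand-ins, not the repo's real NVD/ExploitDB pipeline (see `reproduce.sh` for that).

```python
# Minimal sketch of the AUC comparison, on synthetic data.
# Real pipeline: NVD metadata + ExploitDB labels (reproduce.sh).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000
epss_pct = rng.uniform(0, 1, n)   # hypothetical EPSS percentile
cvss = rng.uniform(0, 10, n)      # hypothetical CVSS base score

# Exploitation correlates strongly with EPSS, weakly with CVSS
logit = 4.0 * epss_pct + 0.1 * cvss - 4.0
y = rng.uniform(0, 1, n) < 1.0 / (1.0 + np.exp(-logit))

# A score used as a ranking yields AUC directly -- no threshold needed
auc_cvss = roc_auc_score(y, cvss)
auc_epss = roc_auc_score(y, epss_pct)

# A simple model combining both scores
X = np.column_stack([cvss, epss_pct])
clf = LogisticRegression().fit(X, y)
auc_ml = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"CVSS {auc_cvss:.3f}  EPSS {auc_epss:.3f}  ML {auc_ml:.3f}")
```

On this toy data the ordering mirrors the table: CVSS ranks only slightly better than chance, while EPSS-driven scores dominate.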
```bash
git clone https://github.com/rexcoleman/vuln-prioritization-ml
cd vuln-prioritization-ml
conda env create -f environment.yml
conda activate vuln-prioritize
bash reproduce.sh
```

```
FINDINGS.md              # Research findings with pre-registered hypotheses and full results
HYPOTHESIS_REGISTRY.md   # Hypothesis predictions, results, and verdicts
reproduce.sh             # One-command reproduction of all experiments
governance.yaml          # govML governance configuration
CITATION.cff             # Citation metadata
LICENSE                  # MIT License
pyproject.toml           # Python project configuration
environment.yml          # Conda environment specification
scripts/                 # Experiment and analysis scripts
tests/                   # Test suite
outputs/                 # Experiment outputs and results
figures/                 # Generated figures and visualizations
data/                    # Data files and datasets
docs/                    # Documentation and decision records
```
See FINDINGS.md for detailed methodology, pre-registered hypotheses, and full experimental results with multi-seed validation.
- Ground truth lag: ExploitDB labels for 2024+ CVEs are incomplete — many exploited vulnerabilities haven't been added yet. This depresses test-set performance for all models.
- No proprietary data: EPSS has access to threat intelligence feeds, social media, and exploit activity that our model doesn't. Apples-to-oranges comparison on data, fair comparison on methodology.
- No TF-IDF features in final model: The structured features alone achieved 0.903 AUC. Adding TF-IDF is a stretch goal that may improve performance.
- Single seed for Key Results table: RQ1/RQ3 main results show seed=42 LogReg (AUC 0.903). Learning curves and complexity sweeps are fully 5-seed validated, confirming this value has near-zero variance (0.903 ± 0.000 at full data).
- Fixed train/test split: All seeds use the same temporal split boundary (pre-2024 / 2024+). Variance estimates reflect model randomness, not split sensitivity. A true cross-validation would require multiple temporal boundaries.
- Complexity sweeps are deterministic for XGBoost/LogReg: Both models produce identical results across seeds given the same data, so the 5-seed sweep confirms reproducibility but does not capture split-dependent uncertainty. RF is the only model with genuine multi-seed variance in the complexity analysis.
- EPSS circularity: The model's top feature is EPSS percentile, which is itself an ML prediction. Without EPSS, AUC drops to ~0.68. The model is largely learning to weight EPSS. This is an honest negative result, not a flaw — it quantifies EPSS's contribution and demonstrates that public metadata alone provides modest but real signal above CVSS.
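The EPSS-circularity point above comes from an ablation: retrain the same model with the EPSS-percentile column removed and compare AUC. The sketch below shows the pattern on synthetic data; all feature names (`epss_pct`, `cvss`, `has_poc`) and the generated labels are hypothetical stand-ins for the repo's real feature set.

```python
# Sketch of the EPSS ablation on synthetic data: drop the EPSS column,
# retrain, and compare AUC. Features here are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 4000
epss_pct = rng.uniform(0, 1, n)      # hypothetical EPSS percentile
cvss = rng.uniform(0, 10, n)         # hypothetical CVSS base score
has_poc = rng.integers(0, 2, n)      # hypothetical "PoC available" flag

# EPSS carries most of the signal; the rest is modest but real
logit = 5.0 * epss_pct + 0.1 * cvss + 0.5 * has_poc - 4.5
y = rng.uniform(0, 1, n) < 1.0 / (1.0 + np.exp(-logit))

X_full = np.column_stack([epss_pct, cvss, has_poc])
X_ablated = X_full[:, 1:]            # remove the EPSS column

auc_full = roc_auc_score(
    y, LogisticRegression().fit(X_full, y).predict_proba(X_full)[:, 1])
auc_ablated = roc_auc_score(
    y, LogisticRegression().fit(X_ablated, y).predict_proba(X_ablated)[:, 1])
print(f"full {auc_full:.3f}  without EPSS {auc_ablated:.3f}")
```

The gap between the two AUC values is what the bullet above reports for the real data (0.903 vs ~0.68): it quantifies how much of the model's performance is inherited from EPSS.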
If you use this work, please cite using the metadata in CITATION.cff.
MIT License © 2026 Rex Coleman
Governed by govML v3.3
