
rexcoleman/vuln-prioritization-ml


EPSS Dominates: ML Vulnerability Prioritization Matches EPSS Using Only Public Data While CVSS Fails

Yes: at predicting which CVEs get exploited, ML (AUC 0.903) outperforms CVSS (AUC 0.662) by 24 percentage points, which makes CVSS a weak exploitability predictor. However, EPSS (AUC 0.912) slightly outperforms our ML model (0.903); EPSS is itself ML-based and trained on richer data.

Blog post: CVSS Gets It Wrong — EPSS and ML Predict Exploited Vulnerabilities 24pp Better



Key Results

Model                          AUC-ROC   F1      vs CVSS (AUC)
Logistic Regression            0.903     0.106   +24.1pp
Random Forest                  0.864     0.000   +20.2pp
XGBoost                        0.825     0.018   +16.3pp
Best CVSS Threshold (≥9.0)     0.662     0.021   baseline
Best EPSS Threshold (≥0.01)    0.912     0.054   +25.1pp
Random (majority class)        N/A       0.000   N/A
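The threshold rows score a single public field rather than a trained model. A minimal plain-Python sketch of how such a baseline could be scored follows. It assumes (the table does not say) that AUC-ROC is computed on the raw continuous score, that F1 is computed on the binary "score ≥ threshold" prediction, and that the "vs CVSS" column is the AUC difference times 100. The toy labels and CVSS values are made up for illustration.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 of the rule 'flag as likely exploited if score >= threshold' (assumed rule)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def delta_pp(model_auc: float, baseline_auc: float) -> float:
    """AUC difference in percentage points, rounded to one decimal place."""
    return round((model_auc - baseline_auc) * 100, 1)


# Made-up example: 1 = exploited, 0 = not exploited.
labels = [0, 0, 0, 1, 0, 1]
cvss = [5.0, 9.8, 7.5, 9.1, 4.3, 6.1]
print(auc_roc(labels, cvss))               # 0.625
print(f1_at_threshold(labels, cvss, 9.0))  # 0.5
print(delta_pp(0.903, 0.662))              # 24.1
```

Computing AUC on the raw score rather than on the thresholded 0/1 prediction keeps the ranking information; F1 necessarily needs a cut-off, which is why both a threshold and an AUC appear in the same row.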


Quick Start

git clone https://github.com/rexcoleman/vuln-prioritization-ml
cd vuln-prioritization-ml
conda env create -f environment.yml
conda activate vuln-prioritize
bash reproduce.sh

Project Structure

FINDINGS.md # Research findings with pre-registered hypotheses and full results
HYPOTHESIS_REGISTRY.md # Hypothesis predictions, results, and verdicts
reproduce.sh # One-command reproduction of all experiments
governance.yaml # govML governance configuration
CITATION.cff # Citation metadata
LICENSE # MIT License
pyproject.toml # Python project configuration
environment.yml # Conda environment specification
scripts/ # Experiment and analysis scripts
tests/ # Test suite
outputs/ # Experiment outputs and results
figures/ # Generated figures and visualizations
data/ # Data files and datasets
docs/ # Documentation and decision records

Methodology

See FINDINGS.md for detailed methodology, pre-registered hypotheses, and full experimental results with multi-seed validation.
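The Limitations section notes that every run uses one temporal split: pre-2024 CVEs for training and 2024+ CVEs for testing. A minimal sketch of that kind of split follows, assuming each CVE carries a publication date and the boundary is 1 January 2024. The `CveRecord` fields and the `TOY-…` identifiers are hypothetical, not the repository's schema or real CVE IDs.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CveRecord:
    cve_id: str
    published: date
    exploited: int  # hypothetical label: 1 if a public exploit is recorded, else 0


BOUNDARY = date(2024, 1, 1)  # assumed boundary: pre-2024 -> train, 2024+ -> test


def temporal_split(
    records: Sequence[CveRecord], boundary: date = BOUNDARY
) -> Tuple[List[CveRecord], List[CveRecord]]:
    """Split by publication date so the model never trains on CVEs newer than its test set."""
    train = [r for r in records if r.published < boundary]
    test = [r for r in records if r.published >= boundary]
    return train, test


def positive_rate(records: Sequence[CveRecord]) -> float:
    """Share of records labeled exploited; a lagging label source lowers this for recent CVEs."""
    return sum(r.exploited for r in records) / len(records) if records else 0.0


records = [
    CveRecord("TOY-2022-1", date(2022, 3, 1), 1),
    CveRecord("TOY-2023-2", date(2023, 12, 31), 0),
    CveRecord("TOY-2024-3", date(2024, 1, 1), 0),
    CveRecord("TOY-2024-4", date(2024, 6, 15), 0),
]
train, test = temporal_split(records)
print([r.cve_id for r in train])  # ['TOY-2022-1', 'TOY-2023-2']
print([r.cve_id for r in test])   # ['TOY-2024-3', 'TOY-2024-4']
print(positive_rate(train), positive_rate(test))  # 0.5 0.0
```

Splitting by time rather than at random mimics deployment, where a model scores CVEs published after its training data. Comparing `positive_rate` across the two halves is one quick way to see the ground-truth lag described under Limitations.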

Limitations

  • Ground truth lag: ExploitDB labels for 2024+ CVEs are incomplete, because many exploited vulnerabilities have not been added yet. This depresses test-set performance for all models.
  • No proprietary data: EPSS has access to threat intelligence feeds, social media, and exploit activity data that our model does not. The comparison is therefore apples-to-oranges on data but fair on methodology.
  • No TF-IDF features in final model: The structured features alone achieved 0.903 AUC. Adding TF-IDF is a stretch goal that may improve performance.
  • Single seed for Key Results table: RQ1/RQ3 main results show seed=42 LogReg (AUC 0.903). Learning curves and complexity sweeps are fully 5-seed validated, confirming this value has near-zero variance (0.903 +/- 0.000 at full data).
  • Fixed train/test split: All seeds use the same temporal split boundary (pre-2024 / 2024+). Variance estimates reflect model randomness, not split sensitivity. A true cross-validation would require multiple temporal boundaries.
  • Complexity sweeps are deterministic for XGBoost/LogReg: Both models produce identical results across seeds given the same data, so the 5-seed sweep confirms reproducibility but does not capture split-dependent uncertainty. RF is the only model with genuine multi-seed variance in the complexity analysis.
  • EPSS circularity: The model's top feature is EPSS percentile, which is itself an ML prediction. Without EPSS, AUC drops to ~0.68. The model is largely learning to weight EPSS. This is an honest negative result, not a flaw — it quantifies EPSS's contribution and demonstrates that public metadata alone provides modest but real signal above CVSS.
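The "without EPSS" figure comes from a drop-one-feature ablation: fit the model with and without a feature and compare AUC. A minimal plain-Python sketch of that procedure follows, under loud assumptions: the two features (`epss_percentile`, a CVSS score scaled to [0, 1]) and all seven rows are made up; an unregularized batch-gradient-descent logistic regression stands in for the project's LogReg; and AUC is computed on the training rows for brevity, whereas the project evaluates on the 2024+ test split.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logreg(X: Matrix, y: Sequence[int], lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def logits(X: Matrix, w: List[float], b: float) -> List[float]:
    # The sigmoid is monotone, so ranking by logit yields the same AUC as ranking by probability.
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def drop_feature(X: Matrix, index: int) -> Matrix:
    return [[v for j, v in enumerate(row) if j != index] for row in X]


def ablation_auc(X: Matrix, y: Sequence[int], index: int) -> Tuple[float, float]:
    """Return (AUC with all features, AUC with feature `index` removed), refitting each time."""
    w, b = fit_logreg(X, y)
    X_drop = drop_feature(X, index)
    w_d, b_d = fit_logreg(X_drop, y)
    return auc_roc(y, logits(X, w, b)), auc_roc(y, logits(X_drop, w_d, b_d))


# Toy rows: [epss_percentile, cvss_scaled]; values are invented so EPSS separates the classes.
X = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4], [0.1, 0.6], [0.2, 0.4], [0.3, 0.5], [0.25, 0.55]]
y = [1, 1, 1, 0, 0, 0, 0]
full_auc, no_epss_auc = ablation_auc(X, y, index=0)
print(f"with EPSS: {full_auc:.3f}  without EPSS: {no_epss_auc:.3f}")
```

Refitting after the drop, rather than zeroing the trained weight, lets the remaining features compensate as much as they can, so the gap measures what the dropped feature uniquely contributes.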

Citation

If you use this work, please cite using the metadata in CITATION.cff.

License

MIT 2026 Rex Coleman


Governed by govML v3.3

About

ML-driven vulnerability prioritization: predicting which CVEs get exploited. 4 research questions, SHAP explainability, 11 architectural decision records. govML-governed.
