# 💡 Pearsonify
## Probabilistic Classification with Conformalized Intervals

**Pearsonify** is a lightweight 🐍 Python package for generating **classification intervals** around predicted probabilities in binary classification tasks.

It uses **Pearson residuals** and **principles of conformal prediction** to quantify uncertainty without making strong distributional assumptions.

### 🚀 Why Pearsonify?

* 📊 **Intuitive Classification Intervals**: Get reliable intervals for binary classification predictions.
* 🧠 **Statistically Grounded**: Uses Pearson residuals, a well-established metric from classical statistics.
* ⚡ **Model-Agnostic**: Works with any model that provides probability estimates.
* 🛠️ **Lightweight**: Minimal dependencies, easy to integrate into existing projects.
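"Model-agnostic" here means any estimator exposing `predict_proba` should qualify. As a quick illustration (using scikit-learn only; the loop and classifier choices are our own, not part of Pearsonify's API), two very different models both yield the per-sample probabilities such a wrapper needs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Both estimators expose predict_proba, so either could serve as the
# `estimator` passed to a probability-interval wrapper like Pearsonify.
for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    proba = clf.fit(X, y).predict_proba(X)[:, 1]  # P(y = 1) for each sample
    print(type(clf).__name__, proba[:3].round(3))
```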

### 📦 How to install?

Use `pip` to install the package from GitHub:

```bash
pip install git+https://github.com/xRiskLab/pearsonify.git
```

### 💻 How to use?

```python
import numpy as np
from pearsonify import Pearsonify
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic classification data
np.random.seed(42)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42
)

# Split data into train, calibration, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Initialize Pearsonify with an SVC model
clf = SVC(probability=True, random_state=42)
model = Pearsonify(estimator=clf, alpha=0.05)

# Fit the model on training and calibration sets
model.fit(X_train, y_train, X_cal, y_cal)

# Generate prediction intervals for the test set
y_test_pred_proba, lower_bounds, upper_bounds = model.predict_intervals(X_test)

# Calculate coverage
coverage = model.evaluate_coverage(y_test, lower_bounds, upper_bounds)
print(f"Coverage: {coverage:.2%}")

# Plot the intervals
model.plot_intervals(y_test_pred_proba, lower_bounds, upper_bounds)
```
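To build intuition for what happens under the hood, the interval construction can be sketched roughly as follows. This is a minimal, hypothetical reimplementation of the idea the README describes (Pearson residuals used as conformal scores), not Pearsonify's actual internals; the function names and the exact quantile rule are our assumptions:

```python
import numpy as np

def pearson_residuals(y, p, eps=1e-12):
    """Absolute Pearson residuals |y - p| / sqrt(p(1 - p)) for binary y."""
    p = np.clip(p, eps, 1 - eps)  # guard against division by zero at p = 0 or 1
    return np.abs(y - p) / np.sqrt(p * (1 - p))

def conformal_interval(p_test, y_cal, p_cal, alpha=0.05):
    """Widen each test probability by the calibration quantile of the residuals."""
    scores = pearson_residuals(y_cal, p_cal)
    n = len(scores)
    # Finite-sample conformal quantile: ceil((n + 1)(1 - alpha)) / n
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)
    half_width = q * np.sqrt(p_test * (1 - p_test))
    return np.clip(p_test - half_width, 0.0, 1.0), np.clip(p_test + half_width, 0.0, 1.0)

# Toy check: simulate a well-calibrated classifier and measure empirical coverage.
rng = np.random.default_rng(42)
p_cal = rng.uniform(0.05, 0.95, 2000)
y_cal = rng.binomial(1, p_cal)
p_test = rng.uniform(0.05, 0.95, 2000)
y_test = rng.binomial(1, p_test)

lo, hi = conformal_interval(p_test, y_cal, p_cal, alpha=0.05)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"Empirical coverage: {coverage:.2%}")
```

A label `y` falls inside its interval exactly when its Pearson residual is below the calibration quantile `q`, so by exchangeability the empirical coverage lands near the nominal `1 - alpha` level.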

Running `example.py` will generate the following plot:



The plot shows predicted probabilities with 95% prediction intervals (`alpha=0.05`), sorted by prediction score.
### 📖 References

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). *Applied Logistic Regression*. John Wiley & Sons.

Tibshirani, R. (2023). Conformal Prediction. Lecture notes, *Advanced Topics in Statistical Learning*, Spring 2023.
### 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.