Commit 69ed031

committed v2.0.5
1 parent 1a1a425 commit 69ed031

13 files changed

Lines changed: 1828 additions & 77 deletions

.gitignore

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,6 +1,7 @@
 .DS_Store
 **/.DS_Store/
 .pytest_cache/
+.ruff_cache/
 **/.pytest_cache/
 __pycache__/
 **/__pycache__/
@@ -12,4 +13,5 @@ __pycache__/
 
 # Hide egg-info
 *.egg-info/
-dist/
+dist/
+.env
```

README.md

Lines changed: 92 additions & 16 deletions
```diff
@@ -1,7 +1,7 @@
 # Fisher Scoring with Python
 
 **Author:** [xRiskLab](https://github.com/xRiskLab)<br>
-**Version:** v2.0.4<br>
+**Version:** v2.0.5<br>
 **License:** [MIT License](https://opensource.org/licenses/MIT) (2025)
 
 ![Title](https://github.com/xRiskLab/fisher-scoring/raw/main/docs/images/title.png)
```
````diff
@@ -10,15 +10,27 @@ This repository contains optimized Python implementations of the Fisher Scoring
 
 ```python
 %pip install fisher-scoring
-from fisher_scoring import LogisticRegression
+from fisher_scoring import LogisticRegression, RobustLogisticRegression, PoissonRegression
 
-# Initialize and fit model
+# Binary Classification
 model = LogisticRegression()
 model.fit(X_train, y_train)
-
-# Make predictions
 predictions = model.predict(X_test)
-probabilities = model.predict_proba(X_test)
+model.display_summary()  # Rich formatted output
+
+# Robust Classification (outlier-resistant)
+robust_model = RobustLogisticRegression(epsilon_contamination=0.05)
+robust_model.fit(X_train_contaminated, y_train_contaminated)
+robust_model.display_summary()  # Rich formatted output with robustness metrics
+
+# Count Data with Rate Modeling
+import numpy as np
+exposure_times = np.random.uniform(0.5, 3.0, len(y_train))
+offset = np.log(exposure_times)  # Log exposure for rate modeling
+
+poisson_model = PoissonRegression(offset=offset, information="empirical")
+poisson_model.fit(X_train, y_train)
+poisson_model.display_summary()  # Rich formatted output
 ```
 
 ## Overview
````
```diff
@@ -27,12 +39,13 @@ probabilities = model.predict_proba(X_test)
 
 This repository contains a Python package with scikit-learn compatible implementations of the Fisher Scoring algorithm for various modeling problems.
 
-The packages provides implementations of logistic regression (MLE for binary, multiclass, and binary imbalanced) for proportions (risk or prevalence) and Poisson and Negative Binomial regression for log-linear regression for incidence rates.
+The package provides implementations of logistic regression (MLE for binary, multiclass, and binary imbalanced) for proportions (risk or prevalence), robust logistic regression for outlier-resistant classification, and Poisson and Negative Binomial regression for log-linear modeling of incidence rates.
 
 1. Binary classification problems: **Logistic Regression**.
-2. Multi-class classification problems: **Multinomial Logistic Regression**.
-3. Imbalanced classification problems: **Focal Loss Logistic Regression**.
-4. Count modeling problems: **Poisson Regression** and **Negative Binomial Regression**.
+2. Robust binary classification problems: **Robust Logistic Regression**.
+3. Multi-class classification problems: **Multinomial Logistic Regression**.
+4. Imbalanced classification problems: **Focal Loss Logistic Regression**.
+5. Count modeling problems: **Poisson Regression** and **Negative Binomial Regression**.
 
 ### Fisher Scoring Algorithm
 
```
```diff
@@ -81,6 +94,36 @@ The `LogisticRegression` class is a custom implementation of logistic regression
 - `summary()`: Get a summary of model parameters, standard errors, p-values, and confidence intervals.
 - `display_summary()`: Display a summary of model parameters, standard errors, p-values, and confidence intervals.
 
+### Robust Logistic Regression
+
+The `RobustLogisticRegression` class implements robust logistic regression using the Fisher scoring algorithm with epsilon-contamination for outlier resistance. This method down-weights observations that are unlikely under the main model, providing robustness against data contamination and outliers.
+
+**Parameters:**
+- `epsilon_contamination`: Contamination level (0 ≤ ε ≤ 1). Higher values provide more robustness but may reduce efficiency (default: 0.05).
+- `contamination_prob`: Probability for contamination distribution (default: 0.5).
+- `tol`: Convergence tolerance for parameter updates.
+- `max_iter`: Maximum number of iterations for the algorithm.
+- `information`: Type of information matrix to use ('expected' or 'empirical').
+- `use_bias`: Include a bias term in the model.
+- `significance`: Significance level for computing confidence intervals.
+
+**Methods:**
+- `fit(X, y)`: Fit the robust model to the data with automatic outlier down-weighting.
+- `predict(X)`: Predict target labels for input data.
+- `predict_proba(X)`: Predict class probabilities for input data.
+- `predict_ci(X)`: Predict class probabilities with confidence intervals.
+- `get_params()`: Get model parameters.
+- `set_params(**params)`: Set model parameters.
+- `summary()`: Get a summary of model parameters, standard errors, p-values, confidence intervals, and robust weights.
+- `display_summary()`: Display a comprehensive summary including robustness metrics (epsilon contamination, average/minimum robust weights).
+
+**Key Features:**
+- **Outlier Resistance**: Automatic down-weighting of observations unlikely under the main model.
+- **Robust Weights**: Access to individual observation weights showing outlier identification.
+- **Fisher Scoring Framework**: Consistent with other models using both expected and empirical information matrices.
+- **Statistical Inference**: Complete inference statistics with robust standard errors and confidence intervals.
+- **Rich Output**: Beautiful formatted summaries with robust-specific metrics and diagnostics.
+
 ### Multinomial Logistic Regression
 
 The `MultinomialLogisticRegression` class implements the Fisher Scoring algorithm for multinomial logistic regression, suitable for multi-class classification tasks.
```
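The epsilon-contamination idea in the new README section can be sketched in a few lines. This is one plausible reading of the description, not the package's actual code: each observation's likelihood is treated as a mixture of the fitted Bernoulli likelihood and a flat contamination component, and the robust weight is the posterior probability that the observation came from the main model. The helper name `robust_weights` and the exact weighting formula are assumptions for illustration.

```python
import numpy as np

def robust_weights(p, y, epsilon_contamination=0.05, contamination_prob=0.5):
    """Illustrative epsilon-contamination weights for binary observations."""
    lik_model = np.where(y == 1, p, 1 - p)      # Bernoulli likelihood under the fit
    lik_contam = np.where(y == 1, contamination_prob, 1 - contamination_prob)
    mixture = (1 - epsilon_contamination) * lik_model + epsilon_contamination * lik_contam
    # Posterior probability the point was generated by the main model
    return (1 - epsilon_contamination) * lik_model / mixture

# A well-explained point keeps a weight near 1; a gross outlier is down-weighted.
w_inlier = robust_weights(np.array([0.95]), np.array([1]))   # y=1 with p=0.95
w_outlier = robust_weights(np.array([0.99]), np.array([0]))  # y=0 despite p=0.99
```

Down-weighting each observation's score contribution by such a weight inside the Fisher scoring loop is what makes contaminated points nearly invisible to the fit.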
```diff
@@ -127,34 +170,58 @@ The `FocalLossRegression` class implements the Fisher Scoring algorithm with foc
 
 ### Poisson Regression
 
-The `PoissonRegression` class implements the Fisher Scoring algorithm for Poisson regression, suitable for modeling count data.
+The `PoissonRegression` class implements the Fisher Scoring algorithm for Poisson regression, suitable for modeling count data and incidence rates. Features robust matrix operations with automatic fallback to pseudo-inverse for numerical stability.
 
 **Parameters:**
 - `max_iter`: Maximum number of iterations for optimization.
 - `epsilon`: Convergence tolerance.
 - `use_bias`: Whether to include an intercept term.
+- `offset`: Offset term for rate modeling (e.g., log exposure times).
+- `significance`: Significance level for confidence intervals.
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 
 **Methods:**
 - `fit(X, y)`: Fit the model to the data.
-- `predict(X)`: Predict mean values for the Poisson model.
+- `predict(X, offset=None)`: Predict mean values with optional custom offset.
 - `calculate_st_errors(X)`: Calculate standard errors for the coefficients.
+- `summary()`: Get comprehensive model statistics including coefficients, standard errors, p-values, and confidence intervals.
+- `display_summary()`: Display beautiful formatted summary with Rich styling.
+
+**Key Features:**
+- **Offset Support**: Full support for rate modeling with log exposure times.
+- **Information Matrix Choice**: Both expected and empirical Fisher information matrices supported.
+- **Robust Implementation**: Safe matrix inversion with automatic pseudo-inverse fallback.
+- **Statistical Summaries**: Complete inference statistics with Wald tests and confidence intervals.
+- **Validated Accuracy**: Mathematical correctness verified against statsmodels with machine precision accuracy.
 
 ### Negative Binomial Regression
 
-The `NegativeBinomialRegression` class implements the Fisher Scoring algorithm for Negative Binomial regression, suitable for overdispersed count data.
+The `NegativeBinomialRegression` class implements the Fisher Scoring algorithm for Negative Binomial regression, suitable for overdispersed count data. Features enhanced robustness with comprehensive statistical inference and fixed critical implementation bugs.
 
 **Parameters:**
 - `max_iter`: Maximum number of iterations for optimization.
 - `epsilon`: Convergence tolerance.
 - `use_bias`: Whether to include an intercept term.
-- `alpha`: Fixed dispersion parameter (overdispersion adjustment for Negative Binomial).
+- `alpha`: Fixed dispersion parameter (overdispersion adjustment).
 - `phi`: Constant scale parameter.
 - `offset`: Offset term for the linear predictor.
+- `significance`: Significance level for confidence intervals.
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 
 **Methods:**
 - `fit(X, y)`: Fit the model to the data.
-- `predict(X)`: Predict mean values for the Negative Binomial model.
-- `calculate_st_errors(X)`: Calculate standard errors for the coefficients.
+- `predict(X, offset=None)`: Predict mean values with proper offset handling.
+- `calculate_st_errors(X)`: Calculate standard errors with corrected implementation.
+- `summary()`: Get comprehensive model statistics including coefficients, standard errors, p-values, and confidence intervals.
+- `display_summary()`: Display beautiful formatted summary with Rich styling.
+
+**Key Improvements:**
+- **Fisher Scoring Conversion**: Converted from IWLS to proper Fisher scoring for consistency.
+- **Information Matrix Choice**: Both expected and empirical Fisher information matrices supported (empirical recommended for numerical stability).
+- **Bug Fixes**: Fixed missing offset in prediction and standard error calculations.
+- **Robust Implementation**: Safe matrix inversion with automatic pseudo-inverse fallback.
+- **Statistical Summaries**: Complete inference statistics with Wald tests and confidence intervals.
+- **Enhanced Reliability**: Comprehensive testing ensures mathematical correctness.
 
 ## Utilities
 
```
```diff
@@ -176,6 +243,15 @@ The package includes a utility function for visualizing observed vs predicted pr
 
 ## Change Log
 
+- **v2.0.5**
+  - **New**: Added `RobustLogisticRegression` class with epsilon-contamination for outlier-resistant classification.
+  - **Enhanced**: Poisson and Negative Binomial regression with empirical Fisher information matrix support.
+  - **Enhanced**: Converted Negative Binomial from IWLS to proper Fisher scoring for consistency.
+  - **Added**: Comprehensive offset support for Poisson regression rate modeling.
+  - **Fixed**: Critical bugs in Negative Binomial prediction and standard error calculations.
+  - **Added**: `summary()` and `display_summary()` methods with rich statistical output.
+  - **Validated**: Mathematical correctness verified against statsmodels with machine precision accuracy.
+
 - **v2.0.4**
   - Added a beta version of Poisson and Negative Binomial regression using Fisher Scoring.
   - Changed naming conventions for simplicity and consistency.
```

pyproject.toml

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 [project]
 name = "fisher-scoring"
-version = "2.0.4"
-description = "A Python implementation of the Fisher Scoring algorithm for classification and incidence rate tasks."
+version = "2.0.5"
+description = "A Python implementation of the Fisher Scoring algorithm for proportion and incidence rate modeling."
 authors = [
     { name = "xRiskLab", email = "contact@xrisklab.ai" }
 ]
```

src/fisher_scoring/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,7 @@
 from .fisher_scoring_logistic import LogisticRegression
 from .fisher_scoring_multinomial import MultinomialLogisticRegression
 from .fisher_scoring_poisson import NegativeBinomialRegression, PoissonRegression
+from .fisher_scoring_robust import RobustLogisticRegression
 
 # Set up logging
 logging.basicConfig(level=logging.WARNING)
@@ -41,6 +42,7 @@ def __init__(self, *args, **kwargs):
     "FocalLossRegression",
     "PoissonRegression",
     "NegativeBinomialRegression",
+    "RobustLogisticRegression",
 ]
 
 # Add dynamic version retrieval
```

src/fisher_scoring/fisher_scoring_focal.py

Lines changed: 8 additions & 3 deletions
```diff
@@ -133,7 +133,7 @@ def invert_matrix(matrix: np.ndarray) -> np.ndarray:
         print("WARNING: Singular matrix. Using pseudo-inverse.")
         return np.linalg.pinv(matrix)
 
-    def fit(self, X: np.ndarray, y: np.ndarray) -> "FisherScoringFocalRegression":
+    def fit(self, X: np.ndarray, y: np.ndarray) -> FocalLossRegression:
         """
         Fit the focal logistic regression model using Fisher scoring.
         """
@@ -168,7 +168,7 @@ def fit(self, X: np.ndarray, y: np.ndarray) -> "FisherScoringFocalRegression":
             # Expected Fisher Information matrix
             W_diag = (p * (1 - p) * pt).ravel()
             information_matrix = (X.T * W_diag) @ X
-        else:
+        elif self.information == "empirical":
             # Empirical Fisher Information matrix
             score_vector = (y - p).reshape(X.shape[0], 1, 1)
             X_vector = X.reshape(X.shape[0], -1, 1)
@@ -180,6 +180,11 @@ def fit(self, X: np.ndarray, y: np.ndarray) -> "FisherScoringFocalRegression":
                 * pt.reshape(-1, 1, 1),
                 axis=0,
             )
+        else:
+            raise ValueError(
+                f"Unknown Fisher Information type: {self.information}. Use 'expected' or 'empirical'."
+            )
+
         self.information_matrix["iteration"].append(iteration)
         self.information_matrix["information"].append(information_matrix)
 
@@ -333,7 +338,7 @@ def display_summary(self, style="default") -> None:
         summary_dict = self.summary()
 
         total_iterations = len(self.information_matrix["iteration"])
-        table = Table(title="Fisher Scoring Focal Logistic Regression Summary")
+        table = Table(title="Fisher Scoring Focal Loss Logistic Regression Summary")
 
         table.add_column(
             "Parameter",
```

src/fisher_scoring/fisher_scoring_logistic.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -147,7 +147,7 @@ def fit(
             # Expected Fisher Information matrix
             W_diag = (p * (1 - p)).ravel()
             information_matrix = (X.T * W_diag) @ X
-        else:
+        elif self.information == "empirical":
             # Empirical Fisher Information matrix
             score_vector = (y - p).reshape(X.shape[0], 1, 1)
             X_vector = X.reshape(X.shape[0], -1, 1)
@@ -158,6 +158,10 @@
                 @ X_vector.transpose(0, 2, 1),
                 axis=0,
             )
+        else:
+            raise ValueError(
+                f"Unknown Fisher Information type: {self.information}. Use 'expected' or 'empirical'."
+            )
 
         self.information_matrix["iteration"].append(iteration)
         self.information_matrix["information"].append(information_matrix)
```
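The change in this hunk replaces a bare `else:` with an explicit `elif` for the empirical branch and a `ValueError` for anything else, so a typo like `information="observed"` now fails loudly instead of silently computing the empirical matrix. A standalone sketch of the two variants for plain logistic regression, with a hypothetical helper name and the score outer products written in flattened form rather than the diff's batched 3-D arrays:

```python
import numpy as np

def information_matrix(X, y, p, kind="expected"):
    """Expected vs empirical Fisher information for logistic regression."""
    if kind == "expected":
        W_diag = (p * (1 - p)).ravel()          # Bernoulli variances
        return (X.T * W_diag) @ X               # X'WX, as in the 'expected' branch
    elif kind == "empirical":
        scores = (y - p)[:, None] * X           # per-observation score vectors
        return scores.T @ scores                # sum of score outer products
    else:
        raise ValueError(
            f"Unknown Fisher Information type: {kind}. Use 'expected' or 'empirical'."
        )

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
p = 1.0 / (1.0 + np.exp(-X @ np.array([0.5, -0.5, 1.0])))
y = rng.binomial(1, p)
I_expected = information_matrix(X, y, p, "expected")
I_empirical = information_matrix(X, y, p, "empirical")
```

At the true parameters the two matrices agree in expectation; away from them the empirical version reflects the observed scores, which is why the choice is exposed as a parameter.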

src/fisher_scoring/fisher_scoring_multinomial.py

Lines changed: 5 additions & 4 deletions
```diff
@@ -13,9 +13,6 @@
 algorithm that is used to estimate the parameters of a multinomial logistic
 regression model.
 
-The algorithm is based on the Newton-Raphson method and uses the expected or
-empirical Fisher information matrix to update the model parameters.
-
 Additionally we provide a method to compute the standard errors, Wald statistic,
 p-values, and confidence intervals for each class.
 
@@ -142,7 +139,7 @@ def fit(
             # Expected Fisher Information matrix
             W_diag = (p * (1 - p)).sum(axis=1)
             expected_I = (X.T * W_diag) @ X
-        else:
+        elif self.information == "empirical":
             # Empirical Fisher Information matrix
             score_vector = (y_one_hot - p).reshape(X.shape[0], -1, 1)
             X_vector = X.reshape(X.shape[0], -1, 1)
@@ -153,6 +150,10 @@
                 @ X_vector.transpose(0, 2, 1),
                 axis=0,
             )
+        else:
+            raise ValueError(
+                f"Unknown Fisher Information type: {self.information}. Use 'expected' or 'empirical'."
+            )
 
         # Select information matrix based on expected or empirical
         information_matrix = (
```
