 This repository contains a Python package with scikit-learn compatible implementations of the Fisher Scoring algorithm for various modeling problems.
-The package provides implementations of logistic regression (MLE for binary, multiclass, and binary imbalanced) for proportions (risk or prevalence) and Poisson and Negative Binomial regression for log-linear modeling of incidence rates.
+The package provides implementations of logistic regression (MLE for binary, multiclass, and binary imbalanced) for proportions (risk or prevalence), robust logistic regression for outlier-resistant classification, and Poisson and Negative Binomial regression for log-linear modeling of incidence rates.
 4. Imbalanced classification problems: **Focal Loss Logistic Regression**.
+5. Count modeling problems: **Poisson Regression** and **Negative Binomial Regression**.
 ### Fisher Scoring Algorithm
@@ -81,6 +94,36 @@ The `LogisticRegression` class is a custom implementation of logistic regression
 - `summary()`: Get a summary of model parameters, standard errors, p-values, and confidence intervals.
 - `display_summary()`: Display a summary of model parameters, standard errors, p-values, and confidence intervals.
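The quantities that `summary()` is described as reporting (standard errors, p-values, confidence intervals) are conventionally obtained from the inverse information matrix through Wald statistics. The following is a minimal sketch of that calculation, not the package's own code: the helper name `wald_summary` and its arguments are hypothetical, and the toy inputs are for illustration only.

```python
import math
from statistics import NormalDist

import numpy as np


def wald_summary(beta, information, significance=0.05):
    """Wald standard errors, z statistics, two-sided p-values and CIs.

    Hypothetical helper: `beta` is a fitted coefficient vector and
    `information` is the information matrix evaluated at `beta`.
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.linalg.inv(information)      # asymptotic covariance of beta
    se = np.sqrt(np.diag(cov))            # standard errors
    z = beta / se                         # Wald z statistics
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    # Normal critical value for the requested significance level
    crit = NormalDist().inv_cdf(1.0 - significance / 2.0)
    return {
        "coef": beta,
        "se": se,
        "z": z,
        "p": p_values,
        "ci_lower": beta - crit * se,
        "ci_upper": beta + crit * se,
    }


# Toy inputs chosen for illustration only.
result = wald_summary(beta=[0.5, -1.2], information=np.diag([100.0, 25.0]))
```

Whether the package computes its summary exactly this way is an assumption; the sketch only shows the standard Wald construction that the listed outputs suggest.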
+### Robust Logistic Regression
+
+The `RobustLogisticRegression` class implements robust logistic regression using the Fisher scoring algorithm with epsilon-contamination for outlier resistance. This method down-weights observations that are unlikely under the main model, providing robustness against data contamination and outliers.
+
+**Parameters:**
+- `epsilon_contamination`: Contamination level (0 ≤ ε ≤ 1). Higher values provide more robustness but may reduce efficiency (default: 0.05).
+- `contamination_prob`: Probability for contamination distribution (default: 0.5).
+- `tol`: Convergence tolerance for parameter updates.
+- `max_iter`: Maximum number of iterations for the algorithm.
+- `information`: Type of information matrix to use ('expected' or 'empirical').
+- `use_bias`: Include a bias term in the model.
+- `significance`: Significance level for computing confidence intervals.
+
+**Methods:**
+- `fit(X, y)`: Fit the robust model to the data with automatic outlier down-weighting.
+- `predict(X)`: Predict target labels for input data.
+- `predict_proba(X)`: Predict class probabilities for input data.
+- `predict_ci(X)`: Predict class probabilities with confidence intervals.
+- `get_params()`: Get model parameters.
+- `set_params(**params)`: Set model parameters.
+- `summary()`: Get a summary of model parameters, standard errors, p-values, confidence intervals, and robust weights.
+- `display_summary()`: Display a comprehensive summary including robustness metrics (epsilon contamination, average/minimum robust weights).
+
+**Key Features:**
+- **Outlier Resistance**: Automatic down-weighting of observations unlikely under the main model.
+- **Robust Weights**: Access to individual observation weights showing outlier identification.
+- **Fisher Scoring Framework**: Consistent with other models using both expected and empirical information matrices.
+- **Statistical Inference**: Complete inference statistics with robust standard errors and confidence intervals.
+- **Rich Output**: Beautiful formatted summaries with robust-specific metrics and diagnostics.
+
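One common reading of epsilon-contamination treats each observation as drawn from a mixture: the main logistic model with probability 1 - ε and a contamination distribution with probability ε. The robust weight is then the posterior probability that the observation came from the main model. The sketch below illustrates that idea only. It assumes that `contamination_prob` is the Bernoulli parameter of the contamination distribution, the function name `robust_weights` is hypothetical, and the package may compute its weights differently.

```python
import numpy as np


def robust_weights(y, p, epsilon_contamination=0.05, contamination_prob=0.5):
    """Posterior probability that each observation follows the main model.

    Hypothetical helper: `y` holds 0/1 labels and `p` the main model's
    predicted probabilities P(y = 1 | x).
    """
    y = np.asarray(y, dtype=float)
    # Clip to keep the likelihoods strictly positive.
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0 - 1e-12)
    eps = epsilon_contamination
    # Bernoulli likelihood of each label under the main model
    main = p**y * (1.0 - p) ** (1.0 - y)
    # Bernoulli likelihood under the assumed contamination distribution
    contam = contamination_prob**y * (1.0 - contamination_prob) ** (1.0 - y)
    return (1.0 - eps) * main / ((1.0 - eps) * main + eps * contam)


# Two well-explained observations and one that the main model finds very unlikely.
y = np.array([1, 0, 1])
p = np.array([0.99, 0.10, 0.001])
w = robust_weights(y, p)
```

Under this reading, the third observation receives a weight close to zero, while the first two keep weights close to one. In a weighted Fisher scoring step, such weights would scale each observation's contribution to the score and the information matrix.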
 ### Multinomial Logistic Regression
 The `MultinomialLogisticRegression` class implements the Fisher Scoring algorithm for multinomial logistic regression, suitable for multi-class classification tasks.
@@ -127,34 +170,58 @@ The `FocalLossRegression` class implements the Fisher Scoring algorithm with foc
 ### Poisson Regression
-The `PoissonRegression` class implements the Fisher Scoring algorithm for Poisson regression, suitable for modeling count data.
+The `PoissonRegression` class implements the Fisher Scoring algorithm for Poisson regression, suitable for modeling count data and incidence rates. It features robust matrix operations with an automatic fallback to the pseudo-inverse for numerical stability.
 **Parameters:**
 - `max_iter`: Maximum number of iterations for optimization.
 - `epsilon`: Convergence tolerance.
 - `use_bias`: Whether to include an intercept term.
+- `offset`: Offset term for rate modeling (e.g., log exposure times).
+- `significance`: Significance level for confidence intervals.
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 **Methods:**
 - `fit(X, y)`: Fit the model to the data.
-- `predict(X)`: Predict mean values for the Poisson model.
+- `predict(X, offset=None)`: Predict mean values with optional custom offset.
 - `calculate_st_errors(X)`: Calculate standard errors for the coefficients.
+- `summary()`: Get comprehensive model statistics including coefficients, standard errors, p-values, and confidence intervals.
+- `display_summary()`: Display beautiful formatted summary with Rich styling.
+
+**Key Features:**
+- **Offset Support**: Full support for rate modeling with log exposure times.
+- **Information Matrix Choice**: Both expected and empirical Fisher information matrices supported.
+- **Robust Implementation**: Safe matrix inversion with automatic pseudo-inverse fallback.
+- **Statistical Summaries**: Complete inference statistics with Wald tests and confidence intervals.
+- **Validated Accuracy**: Mathematical correctness verified against statsmodels with machine precision accuracy.
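The offset and pseudo-inverse features listed above can be illustrated with a small stand-alone Fisher scoring loop for a log-link Poisson model. This is a sketch under stated assumptions rather than the package's implementation: `safe_inverse` and `fit_poisson` are hypothetical names, the expected information matrix is used, and the data are simulated.

```python
import numpy as np


def safe_inverse(matrix):
    """Invert a matrix, falling back to the pseudo-inverse if it is singular."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        return np.linalg.pinv(matrix)


def fit_poisson(X, y, offset=None, max_iter=100, epsilon=1e-10):
    """Fisher scoring for Poisson regression with a log link and optional offset."""
    n, k = X.shape
    offset = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(k)
    for _ in range(max_iter):
        mu = np.exp(X @ beta + offset)         # fitted means
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        information = X.T @ (X * mu[:, None])  # expected information X' diag(mu) X
        step = safe_inverse(information) @ score
        beta = beta + step
        if np.linalg.norm(step) < epsilon:     # stop once the update is negligible
            break
    return beta


# Simulated rate data: counts observed over varying exposure times.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
exposure = rng.uniform(0.5, 2.0, size=n)
true_beta = np.array([0.3, 0.5])
y = rng.poisson(exposure * np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y, offset=np.log(exposure))
```

Passing `np.log(exposure)` as the offset makes the coefficients describe log incidence rates rather than log counts. Note that `np.linalg.inv` only raises for exactly singular matrices, so a fallback like this one does not guard against merely ill-conditioned information matrices.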
 ### Negative Binomial Regression
-The `NegativeBinomialRegression` class implements the Fisher Scoring algorithm for Negative Binomial regression, suitable for overdispersed count data.
+The `NegativeBinomialRegression` class implements the Fisher Scoring algorithm for Negative Binomial regression, suitable for overdispersed count data. It offers enhanced robustness and comprehensive statistical inference, and this version fixes critical implementation bugs.
 **Parameters:**
 - `max_iter`: Maximum number of iterations for optimization.
 - `epsilon`: Convergence tolerance.
 - `use_bias`: Whether to include an intercept term.
-- `alpha`: Fixed dispersion parameter (overdispersion adjustment for Negative Binomial).