Commit 3400498

Prepare for v2.0.4 release

1 parent 13e230b commit 3400498

23 files changed: 2296 additions & 1285 deletions

.DS_Store

0 Bytes
Binary file not shown.

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -6,4 +6,10 @@ __pycache__/
 **/__pycache__/
 **/__pycache__/*.pyc
 *.pyc
-**/*.pyc
+**/*.pyc
+.trunk
+.venv/
+
+# Hide egg-info
+*.egg-info/
+dist/

LICENSE.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) [2024] [Denis Burakov]
+Copyright (c) [2025] [Denis Burakov]
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 75 additions & 32 deletions
@@ -1,19 +1,19 @@
 # Fisher Scoring with Python
 
 **Author:** [xRiskLab](https://github.com/xRiskLab)<br>
-**Version:** v2.0.3<br>
-**License:** [MIT License](https://opensource.org/licenses/MIT) (2024)
+**Version:** v2.0.4<br>
+**License:** [MIT License](https://opensource.org/licenses/MIT) (2025)
 
 ![Title](https://github.com/xRiskLab/fisher-scoring/raw/main/docs/images/title.png)
 
 This repository contains optimized Python implementations of the Fisher Scoring algorithm for various logistic regression models. With version 2.0, the core algorithms are now significantly faster due to optimized matrix operations and reduced memory usage, providing faster convergence for larger datasets.
 
 ```python
 %pip install fisher-scoring
-from fisher_scoring import FisherScoringLogisticRegression
+from fisher_scoring import LogisticRegression
 
 # Initialize and fit model
-model = FisherScoringLogisticRegression()
+model = LogisticRegression()
 model.fit(X_train, y_train)
 
 # Make predictions
@@ -25,11 +25,14 @@ probabilities = model.predict_proba(X_test)
 
 ### Introduction
 
-This repository contains a Python package with scikit-learn compatible implementations of the Fisher Scoring algorithm for various logistic regression use cases:
+This repository contains a Python package with scikit-learn compatible implementations of the Fisher Scoring algorithm for various modeling problems.
+
+The package provides implementations of logistic regression (MLE for binary, multiclass, and binary imbalanced problems) for proportions (risk or prevalence), and Poisson and Negative Binomial regression (log-linear) for incidence rates.
 
 1. Binary classification problems: **Logistic Regression**.
 2. Multi-class classification problems: **Multinomial Logistic Regression**.
 3. Imbalanced classification problems: **Focal Loss Logistic Regression**.
+4. Count modeling problems: **Poisson Regression** and **Negative Binomial Regression**.
 
 ### Fisher Scoring Algorithm

@@ -38,31 +41,33 @@ The Fisher Scoring algorithm is an iterative optimization technique that estimat
 There are two types of information matrices used in the Fisher Scoring algorithm:
 
 * **Expected Information Matrix**: Relies on predicted probabilities, providing an efficient approximation for the information matrix.
-* **Observed Information Matrix**: Uses ground truth labels to calculate the information matrix, often resulting in more reliable inference metrics.
+* **Empirical Information Matrix**: Uses ground truth labels to calculate the information matrix, often resulting in more reliable inference metrics.
 
 These information matrices are used to derive standard errors of the estimates and to calculate detailed model statistics, including Wald statistics, p-values, and confidence intervals at a chosen level.
 
+Source: [Limitations of the Empirical Fisher Approximation for Natural Gradient Descent](https://arxiv.org/pdf/1905.12558).
+
 ### Implementation Notes
 
-- **Fisher Scoring Multinomial Regression**
-  The `FisherScoringMultinomialRegression` model differs from standard statistical multinomial logistic regression by using all classes rather than $K - 1$. This approach allows multi-class classification problems to be converted to binary problems by calculating $1 - P_{Class=1}$.
+- **Multinomial Logistic Regression**
+  The `MultinomialLogisticRegression` model differs from standard statistical multinomial logistic regression by using all classes rather than $K - 1$. This approach allows multi-class classification problems to be converted to binary problems by calculating $1 - P_{Class=1}$.
 
-- **Fisher Scoring Focal Regression**
-  The `FisherScoringFocalRegression` class employs a non-standard focal log-likelihood function in its optimization process leveraging $\gamma$ to focus on difficult-to-classify examples.
+- **Focal Loss Regression**
+  The `FocalLossRegression` class employs a non-standard focal log-likelihood function in its optimization process, leveraging $\gamma$ to focus on difficult-to-classify examples.
   The focal loss function, originally developed for object detection, prioritizes difficult-to-classify examples—often the minority class—by reducing the contribution of easy-to-classify samples. It introduces a focusing parameter, *gamma*, which down-weights the influence of easily classified instances, thereby concentrating learning on challenging cases.
 
   Source: [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
 
 ## Models
 
-### Fisher Scoring Logistic Regression
+### Logistic Regression
 
-The `FisherScoringLogisticRegression` class is a custom implementation of logistic regression using the Fisher scoring algorithm. It provides methods for fitting the model, making predictions, and computing model statistics, including standard errors, Wald statistics, p-values, and confidence intervals.
+The `LogisticRegression` class is a custom implementation of logistic regression using the Fisher scoring algorithm. It provides methods for fitting the model, making predictions, and computing model statistics, including standard errors, Wald statistics, p-values, and confidence intervals.
 
 **Parameters:**
 - `epsilon`: Convergence threshold for the algorithm.
 - `max_iter`: Maximum number of iterations for the algorithm.
-- `information`: Type of information matrix to use ('expected' or 'observed').
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 - `use_bias`: Include a bias term in the model.
 - `significance`: Significance level for computing confidence intervals.
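As an aside on the two information matrices in this hunk: for logistic regression the expected information is $X^\top W X$ with $W = \mathrm{diag}(p(1-p))$, while the empirical information is the sum of outer products of the per-observation scores $(y_i - p_i)x_i$. A minimal, self-contained NumPy sketch follows; it is illustrative only, not the package's internals, and all names in it are hypothetical.

```python
import numpy as np

# Simulated data: intercept plus two features (hypothetical example).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

# Fisher scoring using the expected information matrix.
beta = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    info_expected = X.T @ (W[:, None] * X)      # X' diag(p(1-p)) X
    step = np.linalg.solve(info_expected, X.T @ (y - p))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:
        break

# Both matrices evaluated at the fitted coefficients.
p = 1.0 / (1.0 + np.exp(-X @ beta))
info_expected = X.T @ ((p * (1.0 - p))[:, None] * X)
g = (y - p)[:, None] * X                        # per-observation scores
info_empirical = g.T @ g                        # sum of outer products

# Standard errors from the inverse information matrices.
se_expected = np.sqrt(np.diag(np.linalg.inv(info_expected)))
se_empirical = np.sqrt(np.diag(np.linalg.inv(info_empirical)))
```

At the MLE the two matrices agree in expectation, so the two sets of standard errors are typically close, but not identical, on finite samples.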

@@ -76,14 +81,14 @@ The `FisherScoringLogisticRegression` class is a custom implementation of logist
 - `summary()`: Get a summary of model parameters, standard errors, p-values, and confidence intervals.
 - `display_summary()`: Display a summary of model parameters, standard errors, p-values, and confidence intervals.
 
-### Fisher Scoring Multinomial Regression
+### Multinomial Logistic Regression
 
-The `FisherScoringMultinomialRegression` class implements the Fisher Scoring algorithm for multinomial logistic regression, suitable for multi-class classification tasks.
+The `MultinomialLogisticRegression` class implements the Fisher Scoring algorithm for multinomial logistic regression, suitable for multi-class classification tasks.
 
 **Parameters:**
 - `epsilon`: Convergence threshold for the algorithm.
 - `max_iter`: Maximum number of iterations for the algorithm.
-- `information`: Type of information matrix to use ('expected' or 'observed').
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 - `use_bias`: Include a bias term in the model.
 - `significance`: Significance level for computing confidence intervals.
 - `verbose`: Enable verbose output.
@@ -98,15 +103,15 @@ The `FisherScoringMultinomialRegression` class implements the Fisher Scoring alg
 
 The algorithm is in a beta version and may require further testing and optimization to speed up matrix operations.
 
-### Fisher Scoring Focal Loss Regression
+### Focal Loss Regression
 
-The `FisherScoringFocalRegression` class implements the Fisher Scoring algorithm with focal loss, designed for imbalanced classification problems where the positive class is rare.
+The `FocalLossRegression` class implements the Fisher Scoring algorithm with focal loss, designed for imbalanced classification problems where the positive class is rare.
 
 **Parameters:**
 - `gamma`: Focusing parameter for focal loss.
 - `epsilon`: Convergence threshold for the algorithm.
 - `max_iter`: Maximum number of iterations for the algorithm.
-- `information`: Type of information matrix to use ('expected' or 'observed').
+- `information`: Type of information matrix to use ('expected' or 'empirical').
 - `use_bias`: Include a bias term in the model.
 - `verbose`: Enable verbose output.
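To illustrate what the focusing parameter in this hunk does, here is a small hypothetical NumPy sketch (not the `FocalLossRegression` implementation): with $\gamma > 0$, well-classified examples contribute a much smaller share of the loss than hard ones.

```python
import numpy as np

def focal_log_loss(y, p, gamma=2.0):
    """Per-sample focal log-loss: (1 - p_t)^gamma * (-log p_t)."""
    p_t = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    return (1.0 - p_t) ** gamma * -np.log(p_t)

y = np.array([1, 1])
p = np.array([0.95, 0.55])                      # one easy, one hard positive
plain = -np.log(np.where(y == 1, p, 1.0 - p))   # ordinary log-loss
focal = focal_log_loss(y, p)

# The easy example (p = 0.95) is down-weighted far more than the hard one;
# the weight equals (1 - p_t)^gamma.
down_weight = focal / plain
```

With $\gamma = 2$, the easy example keeps only $(1-0.95)^2 = 0.25\%$ of its log-loss, while the hard one keeps about $20\%$, which is what concentrates the fit on difficult cases.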

@@ -120,33 +125,71 @@ The `FisherScoringFocalRegression` class implements the Fisher Scoring algorithm
 - `summary()`: Get a summary of model parameters, standard errors, p-values, and confidence intervals.
 - `display_summary()`: Display a summary of model parameters, standard errors, p-values, and confidence intervals.
 
-## Installation
+### Poisson Regression
 
-To use the models, clone the repository and install the required dependencies.
+The `PoissonRegression` class implements the Fisher Scoring algorithm for Poisson regression, suitable for modeling count data.
 
-```bash
-git clone https://github.com/xRiskLab/fisher-scoring.git
-cd fisher-scoring
-pip install -r requirements.txt
-```
+**Parameters:**
+- `max_iter`: Maximum number of iterations for optimization.
+- `epsilon`: Convergence tolerance.
+- `use_bias`: Whether to include an intercept term.
+
+**Methods:**
+- `fit(X, y)`: Fit the model to the data.
+- `predict(X)`: Predict mean values for the Poisson model.
+- `calculate_st_errors(X)`: Calculate standard errors for the coefficients.
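For intuition about the hunk above: Fisher scoring for Poisson regression with a log link uses the score $X^\top(y - \mu)$ and the expected information $X^\top \mathrm{diag}(\mu)\, X$, where $\mu = \exp(X\beta)$. A minimal hypothetical NumPy sketch, not the `PoissonRegression` class itself:

```python
import numpy as np

# Simulated counts from a log-linear model (hypothetical example).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)                    # mean under the log link
    score = X.T @ (y - mu)
    info = X.T @ (mu[:, None] * X)           # expected information X' diag(mu) X
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# Standard errors from the inverse information at convergence.
st_errors = np.sqrt(np.diag(np.linalg.inv(info)))
```

For the Poisson family with canonical log link, this iteration coincides with Newton's method, which is why it converges in a handful of steps.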
 
-Alternatively, install the package directly from PyPI.
+### Negative Binomial Regression
 
-```bash
-pip install fisher-scoring
-```
+The `NegativeBinomialRegression` class implements the Fisher Scoring algorithm for Negative Binomial regression, suitable for overdispersed count data.
+
+**Parameters:**
+- `max_iter`: Maximum number of iterations for optimization.
+- `epsilon`: Convergence tolerance.
+- `use_bias`: Whether to include an intercept term.
+- `alpha`: Fixed dispersion parameter (overdispersion adjustment for Negative Binomial).
+- `phi`: Constant scale parameter.
+- `offset`: Offset term for the linear predictor.
+
+**Methods:**
+- `fit(X, y)`: Fit the model to the data.
+- `predict(X)`: Predict mean values for the Negative Binomial model.
+- `calculate_st_errors(X)`: Calculate standard errors for the coefficients.
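Negative Binomial regression can be sketched the same way. Assuming the NB2 parameterization with $\mathrm{Var}(y) = \mu + \alpha\mu^2$ and a fixed dispersion `alpha` (this sketch is hypothetical and not the `NegativeBinomialRegression` class), the score is $X^\top \frac{y-\mu}{1+\alpha\mu}$ and the expected information is $X^\top \mathrm{diag}\!\left(\frac{\mu}{1+\alpha\mu}\right) X$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 500, 0.5                          # alpha: fixed dispersion (assumption)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
mu_true = np.exp(X @ beta_true)
# NB2 draw: Poisson with a Gamma-distributed rate, so Var = mu + alpha*mu^2.
y = rng.poisson(rng.gamma(shape=1.0 / alpha, scale=alpha * mu_true))

beta = np.zeros(2)
for _ in range(100):
    mu = np.exp(X @ beta)
    w = mu / (1.0 + alpha * mu)              # working weights for NB2, log link
    score = X.T @ ((y - mu) / (1.0 + alpha * mu))
    info = X.T @ (w[:, None] * X)
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

Relative to the Poisson sketch, the only change is the $1 + \alpha\mu$ factor in the weights, which shrinks the information to reflect the extra variance.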
+
+## Utilities
+
+### Visualization
+
+The package includes a utility function for visualizing observed vs predicted probabilities for count data, which can be useful for users working with Poisson and Negative Binomial models.
+
+**Function:**
+- `plot_observed_vs_predicted(y, mu, max_count=15, alpha=None, title="Observed vs Predicted Probabilities", model_name="Model", ax=None, plot_params=None)`: Plot observed vs predicted probabilities for count data.
+
+**Parameters:**
+- `y`: Observed count data.
+- `mu`: Predicted mean values from the model.
+- `max_count`: Maximum count to consider for probabilities.
+- `alpha`: Overdispersion parameter for Negative Binomial. If None, assumes Poisson (alpha=0).
+- `title`: Title for the plot.
+- `model_name`: Name of the model for labeling.
+- `ax`: Matplotlib axis to plot on.
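The quantities behind such a plot can be computed without matplotlib. The sketch below is hypothetical (the helper `count_probabilities` is not part of the package, and the Poisson pmf is assumed since `alpha=None`): it compares observed count frequencies with the average predicted Poisson pmf across observations.

```python
import numpy as np
from math import exp, factorial

def count_probabilities(y, mu, max_count=15):
    """Observed count frequencies vs mean predicted Poisson probabilities."""
    ks = np.arange(max_count + 1)
    observed = np.array([(y == k).mean() for k in ks])
    # Average the Poisson pmf exp(-m) m^k / k! over the predicted means.
    pmf = np.array([[exp(-m) * m**k / factorial(k) for k in ks] for m in mu])
    predicted = pmf.mean(axis=0)
    return observed, predicted

# Well-specified toy example: data truly Poisson with mean 2.
rng = np.random.default_rng(3)
mu = np.full(1000, 2.0)
y = rng.poisson(mu)
obs, pred = count_probabilities(y, mu)
```

When the model fits well, the two arrays track each other closely per count, which is exactly what the plotted comparison is meant to show.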
 
 ## Change Log
 
+- **v2.0.4**
+  - Added a beta version of Poisson and Negative Binomial regression using Fisher Scoring.
+  - Changed naming conventions for simplicity and consistency.
+  - Switched packaging from poetry to uv.
+
 - **v2.0.3**
   - Added new functionality for inference of mean responses with confidence intervals for all algorithms.
   - Focal logistic regression now supports all model statistics, including standard errors, Wald statistics, p-values, and confidence intervals.
 
 - **v2.0.2**
-  - **Bug Fixes**: Fixed the `FisherScoringMultinomialRegression` class to have flexible NumPy data types.
+  - **Bug Fixes**: Fixed the `MultinomialLogisticRegression` class to have flexible NumPy data types.
 
 - **v2.0.1**
-  - **Bug Fixes**: Removed the debug print statement from the `FisherScoringLogisticRegression` class.
+  - **Bug Fixes**: Removed the debug print statement from the `LogisticRegression` class.
 
 - **v2.0**
   - **Performance Improvements**: Optimized matrix calculations for substantial speed and memory efficiency improvements across all models. Leveraging streamlined operations, this version achieves up to 290x faster convergence. Performance gains per model:
-17.7 KB
Binary file not shown.

dist/fisher_scoring-2.0.3.tar.gz

-13.4 KB
Binary file not shown.

docs/.DS_Store

0 Bytes
Binary file not shown.
