
Conversation

@adw96 (Contributor) commented Nov 20, 2025

The goal here is not general-purpose faster fitting, but major speed-ups for the common case of categorical covariates with the pseudo-Huber median g(.).

It turns out that #90 is impossible (there is no closed form for the pseudo-Huber median g(.)), but we can still speed things up.
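For context, here is a minimal illustration of that point, assuming the standard pseudo-Huber parameterization with smoothing parameter d (not the package's code, and the parameterization may differ): the pseudo-Huber median of a vector is the minimizer of the summed pseudo-Huber losses, which has to be found numerically.

```r
## Illustration only (standard pseudo-Huber with smoothing d; not the code in this PR).
## The pseudo-Huber median of z minimizes the summed pseudo-Huber losses; no closed form.
pseudohuber_median <- function(z, d = 0.1) {
  obj <- function(cc) sum(d^2 * (sqrt(1 + ((z - cc) / d)^2) - 1))
  optimize(obj, interval = range(z))$minimum
}
pseudohuber_median(c(-2, 0, 1, 5))  # close to the ordinary median for small d
```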

I'm seeing roughly 1000x speed-ups (basically instantaneous fitting), but I haven't tested it broadly over a range of p and J. (n doesn't matter here.)

In the one case I investigated where estimates differ from fit_null, this approach attained a higher likelihood, and it enforces the g constraints exactly (i.e., constraint_tol is trivially 0).

What's it actually doing? Fisher scoring (in the free parameters) plus a line search to ensure the likelihood increases at each step. A rough sketch is below.
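Here is a generic sketch of that idea (illustrative only, not the actual code in this PR; loglik, score, and fisher_info stand in for the model-specific pieces):

```r
## Generic Fisher scoring with step-halving line search (illustration only).
fisher_scoring <- function(theta, loglik, score, fisher_info,
                           max_iter = 100, tol = 1e-8) {
  ll_old <- loglik(theta)
  for (iter in seq_len(max_iter)) {
    direction <- solve(fisher_info(theta), score(theta))  # Fisher scoring step
    step <- 1
    repeat {
      theta_new <- theta + step * direction
      ll_new <- loglik(theta_new)
      # halve the step until the likelihood increases (or the step is negligible)
      if (ll_new >= ll_old || step < 1e-10) break
      step <- step / 2
    }
    theta <- theta_new
    if (abs(ll_new - ll_old) < tol) break
    ll_old <- ll_new
  }
  theta
}
```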

Lots of ChatGPT here, which is fine because I know what I want.

Not ready for review, but opening this to share updates with @svteichman.

Next steps (AW), taken from my notes in fit_null_discrete (a small sketch of the constraint setup follows these notes):

  ## Currently this works for 
  #### two groups with X's (1, 0) and (1, 1)
  #### testing the first column, last column also constrained
  
  #### it is equivalent to fit_null with k_constr=2, j_constr=1, j_ref=J,
  
  ## It needs to be generalized to consider 
  #### multiple categories (can we assume identity-like then back transform? is this constraint and likelihood-preserving?) 
  #### different columns for testing
  #### the same inputs as fit_null and comparable convergence statistics
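To make the special case in those notes concrete, here is a hypothetical sketch: the design matrix follows the notes directly, and the argument names k_constr, j_constr, and j_ref come from the existing fit_null interface, but the exact call shown is illustrative, not the code in this PR.

```r
## Hypothetical illustration of the two-group special case described in the notes.
n_per_group <- 10
X <- rbind(
  matrix(c(1, 0), nrow = n_per_group, ncol = 2, byrow = TRUE),  # group 1: rows (1, 0)
  matrix(c(1, 1), nrow = n_per_group, ncol = 2, byrow = TRUE)   # group 2: rows (1, 1)
)
## Per the notes, this corresponds to the existing fitter with
## fit_null(..., k_constr = 2, j_constr = 1, j_ref = J)
```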

@adw96 (Contributor, Author) commented Dec 8, 2025

  • preliminary generalisation to the ncol(X) == nrow(distinct_xx) case, for general ncol(X)
  • preliminary confirmation that it's doing the right thing
  • more extensive testing that it's doing the right thing
  • more extensive testing that its likelihood is no worse (or not much worse)
  • align inputs/outputs with the current implementation
  • generalise to a generic j_ref
  • testing for correctness
  • documentation of inputs
  • integrate into codebase

@adw96 (Contributor, Author) commented Dec 8, 2025

Hi @svteichman -- I think this is ready for your feedback and edits. A "wish list" (not all essential):

  • integration into the codebase
  • documentation, including holes for me to fill
  • matching the output format of the original null fitting
  • other things I'm too tired to think of now...

Please let me know your questions -- and thank you!!!!

@svteichman (Collaborator) commented

Very exciting! I'll take a look and get started on integrating this into the codebase.

@svteichman (Collaborator) commented

This PR is superseded by PR #179, and can be closed once that is reviewed and merged (although we may want to remember someday that this PR is the place where Amy derived the second derivative of the pseudohuber function!!)
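For reference, under the standard pseudo-Huber parameterization with smoothing d (which may differ from the package's parameterization), the function and its first two derivatives are:

```r
## Reference only (standard parameterization with smoothing d; the package's may differ):
##   rho(x)   = d^2 * (sqrt(1 + (x/d)^2) - 1)
##   rho'(x)  = x / sqrt(1 + (x/d)^2)
##   rho''(x) = (1 + (x/d)^2)^(-3/2)
pseudohuber_deriv2 <- function(x, d = 0.1) (1 + (x / d)^2)^(-3 / 2)
```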
