Discrete null fitting algorithm (updates to PR #175) #179

svteichman · 2025-12-19T18:07:48Z

This PR updates the discrete null fitting algorithm implemented in PR #175, integrates it into the code base, and tests it against fit_null_symmetric(). By default, the discrete null fitting algorithm is used automatically for discrete designs when $J < 150$; otherwise the symmetric algorithm is used (the argument null_fit_alg can be used to override these defaults). The $J = 150$ threshold is a heuristic, determined by comparing runtimes and log likelihoods between the two approaches across a range of $n$, $J$, and $p$ in two real datasets. While the discrete approach typically achieves a higher log likelihood (often not by very much, but occasionally by a lot), it becomes slower than the symmetric approach somewhere between $J = 100$ and $J = 200$, and quite a bit slower for $J > 500$.
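As a rough illustration of the default rule described above, here is a minimal sketch in R. It is not the package's code: the helper name `choose_null_fit_alg`, the `design_is_discrete` flag, the `J_threshold` parameter, and the string values `"discrete"` and `"symmetric"` are all assumptions made for this example. Only the $J < 150$ threshold and the existence of a `null_fit_alg` override come from the description.

```r
# Hypothetical helper, NOT part of the package: illustrates the default rule
# "discrete design and J < 150 -> discrete fit, otherwise symmetric fit".
# The accepted values "discrete" / "symmetric" are assumed for illustration.
choose_null_fit_alg <- function(J, design_is_discrete, null_fit_alg = NULL,
                                J_threshold = 150) {
  # An explicit user choice always overrides the heuristic default.
  if (!is.null(null_fit_alg)) {
    return(match.arg(null_fit_alg, c("discrete", "symmetric")))
  }
  # Heuristic default: the discrete algorithm only for discrete designs
  # with fewer than J_threshold categories.
  if (design_is_discrete && J < J_threshold) "discrete" else "symmetric"
}

# Example calls using J values from the timing comparison in this PR.
alg_small    <- choose_null_fit_alg(J = 109, design_is_discrete = TRUE)
alg_large    <- choose_null_fit_alg(J = 479, design_is_discrete = TRUE)
alg_override <- choose_null_fit_alg(J = 479, design_is_discrete = TRUE,
                                    null_fit_alg = "discrete")
```

Keeping the threshold as a parameter makes it easy to revisit the cutoff if further timing comparisons suggest a different value.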

This is a subset of runtime results from tests that are skipped or commented out in "test-null_fit_discrete":

# the next set of tests compares the timing of fit_null_discrete to fit_null_symmetric
# for a variety of n, J, and p, using the soil dataset included in `corncob` and the
# wirbel dataset included in `radEmu`. Each example runs either 10, 20, or 30 robust
# score tests and compares across the two methods.
# Different sized datasets are generated by filtering samples, considering taxa at either the
# species or genus level, and in some cases subsetting to one phylum or another.

# tldr:

# wirbel
# n = 126, J = 128, p = 2, sandwich 42 seconds, discrete 16 seconds
# n = 566, J = 133, p = 5, sandwich 306 seconds, discrete 334 seconds
# n = 126, J = 430, p = 2, sandwich 143 seconds, discrete 421 seconds
# n = 126, J = 758, p = 2, sandwich 8 minutes, discrete 60 minutes

# soil
# n = 119, J = 109, p = 3, sandwich 35 seconds, discrete 12 seconds
# n = 119, J = 147, p = 3, sandwich 50 seconds, discrete 49 seconds
# n = 64, J = 234, p = 2, sandwich 121 seconds, discrete 165 seconds
# n = 64, J = 242, p = 2, sandwich 101 seconds, discrete 95 seconds
# n = 64, J = 479, p = 2, sandwich 140 seconds, discrete 534 seconds

One additional note: I also experimented with increasing the tolerance on the discrete root mean score norm, but this did not decrease runtime very much, especially for large $J$.

adw96 and others added 11 commits November 19, 2025 16:48

gradient descent + line search, sigh

05c9371

a few attempts

efcf5dd

add rough preliminary test

1cd6157

wrapping my head around it; plan to pull from main

b229694

Merge remote-tracking branch 'origin/main' into closed_form_categoric…

e3dcbac

…al_h0

generalise to from two to multiple groups

5fb1561

generalise j_ref and j_constr

82b3832

discrete fitting ready for ST to review

7c08edc

removing discrete functions that are not used in updated implementati…

765ef8c

…on, removing B_tol argument from fit_null_symmetric functions

update fit_null_discrete and test it against fit_null_symmetric

4d826b4

small update to test for discrete null fit

2080fc8

svteichman mentioned this pull request Dec 19, 2025

Speed up null fitting for categorical X's via Fisher scoring + line search #175

Open
