Skip to content

Multinomial PMF likely converts back from ln to exp too early #370

@HadrienG2

Description

@HadrienG2

The current implementation of Multinomial::pmf() computes the multinomial coefficient via factorial::multinomial(), which itself computes the logarithm of the multinomial coefficient then gets back to the coefficient via an exponential. The result is then multiplied by the product of probabilities elevated at the appropriate power.

The problem with this implementation is that because the multinomial coefficient is n! divided by factorials with much smaller arguments, the coefficient grows quickly enough with n to overflow even an f64. When that happens factorial::multinomial will overflow to inf, the product of small probabilities will underflow to 0.0, and therefore the product of those two will be NaN.

This outcome can be avoided by computing the logarithm of the multinomial coefficient, subtracting from it the sum of the logarithms of the probabilities multiplied by their occurence count (which I suspect is also likely to be significantly more precise than elevating probabilities to a huge power, at the risk of underflowing to zero), and only then using the exponential to get to the final result.

But I'm not confident enough in my applied math skills to be 100% sure that this way of computing is numerically stable enough. I only know enough to know that computing the exponential of a sum with terms of opposite signs in it should be approached with a lot of care. So if someone around here feels more confident in their floating-point game, a theoretical cross-check of this approach is more than welcome before I proceed with submitting a PR that changes to that implementation :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions