
Create gempyor.distributions module#575

Merged
emprzy merged 30 commits into dev from gempyor-distributions
Jul 17, 2025

Conversation

@emprzy
Collaborator

@emprzy emprzy commented Jun 20, 2025

Describe your changes.

This pull request creates a distributions module that contains classes with sampling methods for a variety of distributions, and corresponding tests/validation for these classes in a new directory, gempyor_pkg/tests/distributions/.

Does this pull request make any user interface changes? If so please describe.

No user interface changes.

What does your pull request address? Tag relevant issues.

This pull request was spun out of #570, and addresses what is requested in #573 (@MacdonaldJoshuaCaleb , see Gamma, Weibull, and Beta distributions added to distributions.py).
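As a rough illustration of the kind of module described above, the following is a minimal sketch, not the code merged in this PR. It assumes pydantic v2 and NumPy, and every field name (`distribution`, `value`, `mu`, `sigma`), the private `_rng` attribute, and the `Distribution` alias are hypothetical choices made for the example. It shows an abstract base with a `sample` method, two concrete subclasses, and a field-discriminated type that builds the right class from config-like input.

```python
from abc import ABC, abstractmethod
from typing import Annotated, Literal, Union

import numpy as np
from pydantic import BaseModel, Field, PrivateAttr, TypeAdapter


class DistributionABC(BaseModel, ABC):
    """Hypothetical abstract base: subclasses must implement `sample`."""

    # Private attribute (assumed design): the generator is not a public field.
    _rng: np.random.Generator = PrivateAttr(
        default_factory=lambda: np.random.default_rng()
    )

    @abstractmethod
    def sample(self, size: int = 1) -> np.ndarray:
        """Draw `size` samples from the distribution."""


class FixedDistribution(DistributionABC):
    distribution: Literal["fixed"] = "fixed"
    value: float

    def sample(self, size: int = 1) -> np.ndarray:
        # A fixed distribution always returns the same value.
        return np.full(size, self.value)


class NormalDistribution(DistributionABC):
    distribution: Literal["norm"] = "norm"
    mu: float
    sigma: float = Field(..., gt=0.0)

    def sample(self, size: int = 1) -> np.ndarray:
        return self._rng.normal(self.mu, self.sigma, size=size)


# Discriminated union: the `distribution` field selects the class.
Distribution = Annotated[
    Union[FixedDistribution, NormalDistribution],
    Field(discriminator="distribution"),
]

adapter = TypeAdapter(Distribution)
dist = adapter.validate_python({"distribution": "norm", "mu": 0.0, "sigma": 1.0})
samples = dist.sample(size=5)
fixed = adapter.validate_python({"distribution": "fixed", "value": 2.5})
```

Because the union is discriminated on a literal field, a config dictionary only needs to name the distribution and its parameters; pydantic picks the matching class and applies that class's field constraints.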

@emprzy emprzy self-assigned this Jun 23, 2025
@emprzy emprzy marked this pull request as ready for review June 23, 2025 15:53
Contributor

@TimothyWillard TimothyWillard left a comment

Let's rename the test files from test_ObjectName.py to test_object_name_class.py to match other test files. For future reference, I think we only need to unit test custom field/model validators; we can trust that the folks at pydantic have thoroughly unit tested the validations provided by their package.

@emprzy
Collaborator Author

emprzy commented Jun 24, 2025

@TimothyWillard, just committed with your suggestions. Let me know if you want each of the .sample() methods to have full docstrings, or if you want an Examples (>>>) section in the ABC.sample() method.

Contributor

@TimothyWillard TimothyWillard left a comment

Looks good to me. An example for DistributionABC.sample doesn't make much sense because it's an ABC so we can pass on that. Not exactly sure how the documentation renders here, but I think DistributionABC.sample's docstring applies to all subclasses here so we can also pass on that.

@TimothyWillard TimothyWillard linked an issue Jun 30, 2025 that may be closed by this pull request
@TimothyWillard TimothyWillard added the enhancement, gempyor, medium priority, and next release labels Jun 30, 2025
@emprzy emprzy force-pushed the gempyor-distributions branch from 21b28c4 to 74713e2 Compare June 30, 2025 19:34
TimothyWillard and others added 15 commits June 30, 2025 17:44
Added a module for representing distributions that can be used
throughout gempyor. Started with `DistributionABC` as an abstract base
for all distributions. Implemented `FixedDistribution` and
`NormalDistribution` using that ABC. Finally, exposed a field
discriminated type to easily create distributions from config.
Added a representation for uniform distributions to
`gempyor.distributions`.
Added a representation of a Poisson distribution to
`gempyor.distributions`.
Added a representation for binomial distribution to
`gempyor.distributions`.
Added a lognormal distribution to `gempyor.distributions`.
Added a truncated normal distribution to `gempyor.distributions`.
Added a representation for gamma distributions to `gempyor.distributions`
Added a representation for Weibull distributions to `gempyor.distributions`. Also, fixed a return type hint error.
Unit tests and validation for the `NormalDistribution` class in `gempyor_pkg/tests/distributions/`
Unit tests and validation for the `FixedDistribution` class in `gempyor_pkg/tests/distributions/`. Also removal of an unused parameter in the `FixedDistribution` `.sample()` method.
Unit tests and validation for the `UniformDistribution` class in `gempyor_pkg/tests/distributions/`. Also linting from a previous commit I forgot to add.
Unit tests and validation for the `LognormalDistribution` class in `gempyor_pkg/tests/distributions/` + a fix for an issue that I created in `2c5fce5`
Add tests for the `TruncatedNormalDistribution` class from `gempyor.distributions`. Also, formatting fixes and bound fixes.
Unit tests and validation for the `PoissonDistribution` class in `gempyor.distributions`. Also, formatting and bounds fixes.
emprzy added 2 commits July 11, 2025 11:47
Take 10 samples rather than 1 to reduce the risk of an identical series of samples being drawn (causing a test failure when arrays are compared).
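The reasoning behind taking 10 samples can be checked with a little arithmetic. The sketch below assumes the test compares two independently drawn arrays from a discrete distribution, using Poisson as the example; the helper names and the rate of 3.0 are illustrative only. The chance that two single draws coincide is the sum of the squared probabilities, and for arrays of independent draws that chance is raised to the power of the array length.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) draw equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def collision_probability(lam: float, size: int, k_max: int = 60) -> float:
    """Chance that two independent length-`size` Poisson(lam) arrays are identical.

    The sum over k is truncated at `k_max`, which is ample for small `lam`.
    """
    single_draw = sum(poisson_pmf(k, lam) ** 2 for k in range(k_max))
    return single_draw**size


# Compare the collision risk for one sample versus ten samples.
p_one = collision_probability(3.0, size=1)
p_ten = collision_probability(3.0, size=10)
```

Since the single-draw collision probability is below one, raising it to the tenth power makes a spurious match far less likely, which is the effect the commit relies on.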
Contributor

@TimothyWillard TimothyWillard left a comment

Minor comment about not wanting the DistributionABC.rng property to be public, but otherwise looks good to me.

use pydantic capabilities for default_factory

Co-authored-by: Timothy Willard <9395586+TimothyWillard@users.noreply.github.com>
Contributor

@pearsonca pearsonca left a comment

Looking basically right. A few definite TODOs, but two items for discussion?

@emprzy @TimothyWillard what are your thoughts re 1) testing the initializer and 2) testing the output ranges?

depending how those opinions go, there's a bit of stripping out to do alongside the other fixes.

@TimothyWillard
Contributor

TimothyWillard commented Jul 14, 2025

Looking basically right. A few definite TODOs, but two items for discussion?

@emprzy @TimothyWillard what are your thoughts re 1) testing the initializer and 2) testing the output ranges?

depending how those opinions go, there's a bit of stripping out to do alongside the other fixes.

Re:

  1. I don't think we should be in the business of testing vendor libraries, but the tests are already written, so the change would be to remove them, which I also don't see a lot of value in. I don't have a strong opinion on what to do here, but agree we do not want to test pydantic/numpy/etc. going forward.
  2. Yeah, that's fine to skip testing ranges with me, relates to (1). Might want to keep some for situations where we do something unconventional like the truncated normal (i.e. we modify the inputs to match our preferred input spec on ranges).
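To illustrate why the truncated normal is called out as unconventional, here is a hedged sketch of one way inputs might be modified. It assumes the implementation wraps `scipy.stats.truncnorm`, whose `a` and `b` arguments are bounds expressed in standard deviations from `loc` rather than absolute values; the thread does not confirm this, and the helper name and parameters are hypothetical. A user-facing spec with absolute bounds would then need a conversion like this, which is custom logic worth a range test.

```python
def standardized_bounds(
    mean: float, sd: float, lower: float, upper: float
) -> tuple[float, float]:
    """Convert absolute truncation bounds to standardized (a, b) bounds."""
    if sd <= 0.0:
        raise ValueError("sd must be positive")
    if lower >= upper:
        raise ValueError("lower must be strictly less than upper")
    return (lower - mean) / sd, (upper - mean) / sd


a, b = standardized_bounds(mean=10.0, sd=2.0, lower=6.0, upper=14.0)
# With SciPy installed, scipy.stats.truncnorm(a, b, loc=10.0, scale=2.0)
# would then describe a normal truncated to the interval [6, 14].
```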

@emprzy
Collaborator Author

emprzy commented Jul 14, 2025

Looking basically right. A few definite TODOs, but two items for discussion?
@emprzy @TimothyWillard what are your thoughts re 1) testing the initializer and 2) testing the output ranges?
depending how those opinions go, there's a bit of stripping out to do alongside the other fixes.

Re:

  1. I don't think we should be in the business of testing vendor libraries, but the tests are already written, so the change would be to remove them, which I also don't see a lot of value in. I don't have a strong opinion on what to do here, but agree we do not want to test pydantic/numpy/etc. going forward.
  2. Yeah, that's fine to skip testing ranges with me, relates to (1). Might want to keep some for situations where we do something unconventional like the truncated normal (i.e. we modify the inputs to match our preferred input spec on ranges).

Agree about item 1).

Item 2): it would be pretty easy to add tests for ranges, if we want to do that. I see no harm in implementing them if we think it would add value.

emprzy added 3 commits July 14, 2025 17:46
Move shape tests to something that is tested at the `DistributionsABC` level, small verbiage changes in parametrization, re-naming of `DistributionsABC` testing file.
@emprzy emprzy requested a review from TimothyWillard July 16, 2025 19:24
Contributor

@TimothyWillard TimothyWillard left a comment

Looks good to me

Comment on lines 351 to 352
n: int = Field(..., gt=0)
p: float = Field(..., gt=0.0, lt=1.0)
Contributor

mmm, shouldn't these be ge, le still? and the == part is now being checked by the edge cases bit?

Collaborator Author

Do you not want n == 0 and (p == 0 or p == 1) to be considered an edge case?

Contributor

yes - but isn't gt / lt going to strictly disallow them, irrespective of allow_edge_cases?

Contributor

should be exercising the allowed vs disallowed edge cases, right?

Collaborator Author

oh hm great point. I'll fix that.

Change to correct operator (`ge` and `le`), update tests accordingly
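The point raised in this review thread can be sketched as follows. This is an assumption-laden illustration, not the merged code: it uses pydantic v2, the class name `BinomialSketch` is hypothetical, and the way `allow_edge_cases` is enforced (a model-level validator) is a guess at one reasonable design. The key observation is that strict `gt`/`lt` field constraints reject boundary values before any edge-case flag is consulted, so the fields need inclusive `ge`/`le` bounds and the flag must decide whether the boundary is accepted.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class BinomialSketch(BaseModel):
    # Inclusive bounds: boundary values pass field validation.
    n: int = Field(..., ge=0)
    p: float = Field(..., ge=0.0, le=1.0)
    allow_edge_cases: bool = False

    @model_validator(mode="after")
    def _check_edge_cases(self) -> "BinomialSketch":
        # Boundary values are only accepted when explicitly allowed.
        at_boundary = self.n == 0 or self.p in (0.0, 1.0)
        if at_boundary and not self.allow_edge_cases:
            raise ValueError(
                "n == 0, p == 0, and p == 1 require allow_edge_cases=True"
            )
        return self


def is_valid(**kwargs) -> bool:
    """Return True when the keyword arguments validate, False otherwise."""
    try:
        BinomialSketch(**kwargs)
    except ValidationError:
        return False
    return True


allowed = is_valid(n=0, p=1.0, allow_edge_cases=True)
disallowed = is_valid(n=5, p=0.0)
out_of_range = is_valid(n=5, p=1.5, allow_edge_cases=True)
```

With this layout, tests can exercise both the allowed and disallowed edge cases, while values outside the closed interval are rejected regardless of the flag.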

Labels

enhancement: Request for improvement or addition of new feature(s).
gempyor: Concerns the Python core.
medium priority: Medium priority.
next release: Marks a PR as a target to include in the next release.

Development

Successfully merging this pull request may close these issues.

Convert as_random_distribution To Use scipy.stats Distributions

3 participants