
Saving and loading for Fit objects #126

Open

Jfeatherstone wants to merge 12 commits into master from saving_loading

Conversation

@Jfeatherstone
Collaborator

This PR implements robust saving and loading features as brought up in #124, and implements (optional) parallelization for the xmin calculation.

  • Implements the __hash__ and __eq__ functions for Fit objects, allowing equality to be tested between different instances. Equality is defined in a reasonable way based on the values of the data and fitting parameters, so that two distinct instances compare equal exactly when they contain the same information.
  • Implements the save and load functions for Fit objects, which allow for caching results and loading them back in easily. These functions currently support two formats: Python pickle files and hdf5. Both behave identically in all of my testing. Pickling is of course standard practice for Python, and hdf5 is implemented in case users want a more universal file type (one that can be loaded in another language, for example). The format keyword to these functions (also inferred automatically from the file extension) controls which is used; hdf5 is the default since it gives smaller files.
  • Tests have been added in testing/test_saving_loading.py to confirm that this hashing works properly, and that the saving and loading works as intended.
  • In order to make sure a Fit object is able to be cached, there are some (minor) restrictions on the form of constraint functions. I've added a discussion of this in the documentation, including best practices and how to avoid issues.
  • The pickle format requires using dill instead of the standard library pickle. Of course it is better to have fewer dependencies, but pickling arbitrary (local) functions isn't possible without this. I've been debating introducing this library for a while now since it is needed for parallelizing the xmin calculation (for exactly the same reason it is needed for saving and loading). With the h5py library needed for that format, this update adds two new dependencies.
  • xmin calculation parallelization was already templated in v2.0, and with the addition of dill, it now works. Note that you'll only really see a speed-up from using more cores if you're fitting a costly function (e.g. a truncated power law). Power laws will take pretty much the same amount of time (if not more) with more processes, since the MLE expression is so quick that most of the time is spent communicating and storing values anyway.

@keflavich (Collaborator) left a comment

There's a lot to review here, and my comments are mostly nitpicky, but:

  • It would be helpful to separate different concepts as much as possible into their own PRs. So, documentation updating should be in a different PR than saving. I think saving and parallelizing are intricately linked, though, so they have to go together?
  • I can't open the .ipynb file. If it's meant to include rendered graphics, I recommend we make a separate repository for notebooks-with-rendered-graphics (in the powerlaw-dev org) to avoid bloating the code base.

...

The saving and loading functions currently support two different file formats:
pickle and hdf5. A pickle file is Python's way of serializing an object,
Collaborator:
link to numpy/h5py documentation here?

-----------
- Original paper for the library: http://arxiv.org/abs/1305.0215
- Source code: https://github.com/jeffalstott/powerlaw
- Source code: https://github.com/powerlaw-devs/powerlaw
Collaborator:

these changes are great & important - but maybe they should go in a separate PR?


# Try and grab the current version of the package from the _version.py file.
# This is for comparing with saved/loaded Fit objects. If we can't find that
# file, we have to work without it.
Collaborator:

shouldn't this just raise an exception about being a bad installation?

Collaborator Author:

The reason I didn't raise an error here is that the _version.py file is only generated when you properly install the library (run pip install . in the cloned repository, or equivalent), so it won't exist right after you clone the repository. I wanted it to be possible to test new features, branches, etc. without fully installing the package; the only solution I could think of was to manually read the _version.py file and continue gracefully when it doesn't exist.

Please let me know if there is a better way to achieve this, or if you think requiring installation is fine.

Collaborator:

I'd recommend requiring installation.

If you want to have a test version running, pip install -e . means you don't have to reinstall every time you want to run a test

Collaborator Author:

Hmm, how about a warning if the package isn't installed, but not an error? I think we don't lose any functionality by not installing other than not being able to compare versions of cached files, so I'd rather not put a full error there.

Maybe this is bad practice, but I find myself using local packages that aren't installed through pip sometimes, especially if I want to compare different versions of the same package (where the stable version is installed through pip, and the experimental one is local and just directly added to the Python path).

Collaborator:

I think that is bad practice and can have some really awkward consequences. I'm mostly parroting what others have instructed me to do in large collaborations, but pip install -e . works perfectly whenever I have a package with appropriate installation instructions like this one.

That said, I'm not going to be overly rigid about this. I don't think it's going to break anyone's code to have this extra approach.

Collaborator Author:

Good to know :) I will look into alternative ways to do this for my own personal development, but yeah hopefully the warning should be enough to deter people from using this approach unless they really have no other option.

@Jfeatherstone
Collaborator Author

It would be helpful to separate different concepts as much as possible into their own PRs. So, documentation updating should be in a different PR than saving. I think saving and parallelizing are intricately linked, though, so they have to go together?

I put the documentation and code changes together since I thought it might make things easier to review, though I can separate them in the future. And indeed, parallelizing should be included in this, even if it doesn't seem related at first.

I can't open the .ipynb file. If it's meant to include rendered graphics, I recommend we make a separate repository for notebooks-with-rendered-graphics (in the powerlaw-dev org) to avoid bloating the code base

Yeah, now that I think about it, having the documentation figures generated in a Jupyter notebook is not a good idea at all. Probably better to just have a regular Python script. I plan to submit a PR about the documentation soon (including a GitHub Action to generate and host it on GitHub Pages), so I will address this issue then.

@Jfeatherstone
Collaborator Author

I think the tolerances on the generation-fitting unit tests are a little too low, as one of the tests failed but then passed on a re-run. This doesn't relate to this PR so I'll address it in another one, but all tests are passing now.

@keflavich (Collaborator) left a comment

one more minor comment from me. I'd still recommend splitting the .ipynb & relinking stuff into a different PR prior to merging this.

Jfeatherstone and others added 2 commits March 4, 2026 10:34
Cleaned up header formatting

Co-authored-by: Adam Ginsburg <keflavich@gmail.com>
@Jfeatherstone
Collaborator Author

Ok I've reset all the changes about the repository links and the Jupyter notebook now. The notebook already exists in the master branch (was included in v2.0) so the file still exists here, but this PR doesn't change anything about it. As above, I'll fix it in a PR about the documentation.

@Jfeatherstone
Collaborator Author

Thanks for reviewing this, and sorry that it was a little messy :)
