
Conversation

@CalCraven CalCraven commented Dec 13, 2025

Description

Adds an option to specify a double-precision (float64) dtype for the hoomd schema.

Motivation and Context

Writing complete restart files in the HOOMD ecosystem requires full precision: floats such as charges have to round precisely to 0, and values can deviate at very small magnitudes when converting from float64 to float32 and back again, an effect that grows with system size.
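As a rough standalone illustration of that drift (plain numpy, not gsd code, with an arbitrary seed and system size):

```python
import numpy as np

# Standalone numpy sketch (not gsd code): per-particle charges that are
# exactly neutral in float64 drift away from zero after a float32
# round-trip, and the error grows with system size.
rng = np.random.default_rng(7)
charges = rng.standard_normal(1_000_000)
charges -= charges.mean()  # enforce neutrality in double precision

total = charges.sum()
# Round-trip through float32, as a 1.x-schema writer would do.
total_roundtrip = charges.astype(np.float32).astype(np.float64).sum()

print(total)            # sum in double precision
print(total_roundtrip)  # error from accumulated float32 rounding
```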

Resolves glotzerlab/hoomd-blue#2194

How Has This Been Tested?

Test reading and writing both single and double precision hoomd files, checking that the dtype is kept consistent.

Checklist:

  • I have reviewed the Contributor Guidelines.
  • I agree with the terms of the GSD Contributor Agreement.
  • My name is on the list of contributors (doc/credits.rst) in the pull request source branch.
  • I have added a change log entry to CHANGELOG.rst.

@CalCraven (Author)

Initial notes here:
I implemented less strict regulations in the ParticleData.validate method. Since this modifies the Frame in place, we wouldn't want to round until we call the write_chunk method. This is generally seen in the validate_precision metho, which should probably changed to a class method. If we like this approach, I'll copy it to the other validate methods, such as ConfigurationData, ConstriaintData, etc. Not sure we need every float to take the "double" precision though, unless there are strong thoughts on this.

Quick question: should we increment the hoomd schema version following this change? It should be completely backward compatible, but some unexpected floating point math could occur if you created a file with this PR, read it with a previous version of GSD, and then wrote it back to disk; any float64 values would be converted to float32 at that point.

Member
@joaander joaander left a comment

Thanks! I have a few comments.
Thanks! I have a few comments.

Not sure we need every float to take the "double" precision though, unless there are strong thoughts on this.

If a user sets a global precision="double", they would be surprised if some of the content was written in single precision.

An alternate API would be to remove the global setting and keep the same precision that the user provides for every field, giving them fine-grained control.

Should we increment the hoomd schema version following this change?

Yes, bump the version to 2.0. Update hoomd-schema.rst to note that data types may be float32 or float64. This is a breaking change because a reader following the 1.x specification will specifically expect float32 data and fail to load float64.

Member
I would prefer to leave fl.pyx unmodified. write_chunk already has a mechanism to choose the precision --- the data type of the numpy array. read_chunk callers should expect a numpy array that matches the data type in the file. Those callers can cast it should they choose to do so.

You can achieve your intended "convert on write" with logic in HOOMDTrajectory::append.

Member

I leave it up to you whether you want to lazily cast or cast on validate.

Author

I think lazy casting is preferable. An important use case: write out the frames of a trajectory in 32-bit precision, then write a final restart Frame in 64-bit precision. If we cast on validate while writing out the trajectory as we go, that precision is already lost by the time we write the restart Frame. See test_write_multiple_precision as an example of writing the same Frame with different precisions.
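A tiny numpy illustration of why cast-on-validate loses the restart precision (hypothetical flow, not gsd code): the same Frame data is written once at single precision for the trajectory and once at double precision for the restart.

```python
import numpy as np

# Illustrative only: one charge value written twice from the same Frame.
charge = np.array([1.0 / 3.0], dtype=np.float64)

# Cast-on-validate: the Frame itself is rounded at the first append...
eager = charge.astype(np.float32)
# ...so the later double-precision restart write can only recover float32.
restart_eager = eager.astype(np.float64)

# Lazy cast: the Frame keeps float64; each write casts its own copy.
restart_lazy = charge.astype(np.float64)
```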

At commit df776ed, these tests now fail as expected, since the data objects are modified to carry the correct float_type.

FAILED gsd/test/test_hoomd.py::test_fallback[(r,w)] - AssertionError:
FAILED gsd/test/test_hoomd.py::test_fallback[(a,x)] - AssertionError:
FAILED gsd/test/test_hoomd.py::test_fallback[(r,a)] - AssertionError:
FAILED gsd/test/test_hoomd.py::test_fallback_to_frame0[(r,w)] - AssertionError:
FAILED gsd/test/test_hoomd.py::test_fallback_to_frame0[(a,x)] - AssertionError:
FAILED gsd/test/test_hoomd.py::test_fallback_to_frame0[(r,a)] - AssertionError:

@CalCraven (Author)

This is a breaking change because a reader following the 1.x specification will specifically expect float32 data and fail to load float64.

Should I reopen the PR to be pulled against trunk-major then?

@CalCraven (Author)

An alternate API would be to remove the global setting and keep the same precision that the user provides for every field, giving them fine-grained control.

My hesitation with this is that if a user passes numpy arrays or lists of floats, the default precision would be float64, which would then be written into their gsd file as such unless they manually cast to float32. That's a change in the default API, and users may not realize that their file sizes would double, which is why I lean towards specifying double precision somewhere in the API.
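For reference, numpy's default float dtype makes this concrete: a plain list of floats becomes float64, and a float64 chunk is exactly twice the bytes of the float32 one.

```python
import numpy as np

# A plain list of Python floats defaults to float64 under numpy, so a
# keep-the-user's-dtype API would double the bytes written per chunk
# for users who never set a dtype explicitly.
positions = np.asarray([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(positions.dtype, positions.nbytes)
print(positions.astype(np.float32).nbytes)
```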

CalCraven and others added 2 commits December 15, 2025 21:49
Co-authored-by: Joshua A. Anderson <joaander@umich.edu>
Member

joaander commented Dec 16, 2025

This is a breaking change because a reader following the 1.x specification will specifically expect float32 data and fail to load float64.

Should I reopen the PR to be pulled against trunk-major then?

No, the PR is fine. I plan to collapse to a single trunk branch after the next release. Development has slowed to the point that there is no longer a need for separate trunk branches.

My hesitation with this is that if a user passes numpy arrays or lists of floats, the default precision would be float64, which would then be written into their gsd file as such unless they manually cast to float32. That's a change in the default API, and users may not realize that their file sizes would double, which is why I lean towards specifying double precision somewhere in the API.

I understand, but this will be a breaking release. Ignore backwards compatibility for the moment. Ask yourself what a new user (one with no prior knowledge and has not read the documentation) would expect. Would they assume that append(frame) preserves the precision of the values they placed in frame, or would they assume that append(frame) would round values down to f32 precision?

I don't have a strong preference either way on global vs. granular. I'm fine with a global flag. But if we choose to offer granular control, then I would prefer that the user-provided precision is maintained.
