Save nexus v1 #108
Merged
7 commits
93ab7ba (bmaranville) add save_nexus and load_nexus
ea2602d (bmaranville) add load_nexus and save_nexus to exports
dc72f4f (bmaranville) add tests of reading and writing OrsoDatasets to NeXus
55d80bb (bmaranville) add h5py to dev requirements
ec6d716 (bmaranville) add ORSO_VERSION to attributes of NeXus datasets
77f57b5 (bmaranville) add physical_quantity to column attrs (just for convenience, it's alr…
e202c89 (bmaranville) the nexus writing and reading work fine with h5py >= 3.10
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,14 +2,14 @@ | |
| Implementation of the top level class for the ORSO header. | ||
| """ | ||
|
|
||
| from dataclasses import dataclass | ||
| from typing import Any, List, Optional, TextIO, Union | ||
| from dataclasses import dataclass, fields | ||
| from typing import BinaryIO, List, Optional, Sequence, TextIO, Union | ||
|
|
||
| import numpy as np | ||
| import yaml | ||
|
|
||
| from .base import (Column, ErrorColumn, Header, _dict_diff, _nested_update, _possibly_open_file, _read_header_data, | ||
| orsodataclass) | ||
| from .base import (JSON_MIMETYPE, ORSO_DATACLASSES, Column, ErrorColumn, Header, _dict_diff, _nested_update, | ||
| _possibly_open_file, _read_header_data, orsodataclass) | ||
| from .data_source import DataSource | ||
| from .reduction import Reduction | ||
|
|
||
|
|
@@ -163,7 +163,7 @@ class OrsoDataset: | |
| """ | ||
|
|
||
| info: Orso | ||
| data: Any | ||
| data: Union[np.ndarray, Sequence[np.ndarray], Sequence[Sequence]] | ||
|
|
||
| def __post_init__(self): | ||
| if self.data.shape[1] != len(self.info.columns): | ||
|
|
@@ -210,6 +210,9 @@ def __eq__(self, other: "OrsoDataset"): | |
| return self.info == other.info and (self.data == other.data).all() | ||
|
|
||
|
|
||
| ORSO_DATACLASSES["OrsoDataset"] = OrsoDataset | ||
|
|
||
|
|
||
| def save_orso( | ||
| datasets: List[OrsoDataset], fname: Union[TextIO, str], comment: Optional[str] = None, data_separator: str = "" | ||
| ) -> None: | ||
|
|
@@ -273,3 +276,111 @@ def load_orso(fname: Union[TextIO, str]) -> List[OrsoDataset]: | |
| od = OrsoDataset(o, data) | ||
| ods.append(od) | ||
| return ods | ||
|
|
||
|
|
||
| def _from_nexus_group(group): | ||
| if group.attrs.get("sequence", None) is not None: | ||
| sort_list = [[v.attrs["sequence_index"], v] for v in group.values()] | ||
| return [_get_nexus_item(v) for _, v in sorted(sort_list)] | ||
| else: | ||
| dct = dict() | ||
| for name, value in group.items(): | ||
| if value.attrs.get("NX_class", None) == "NXdata": | ||
| # remove NXdata folder, which exists only for NeXus plotting | ||
| continue | ||
| dct[name] = _get_nexus_item(value) | ||
|
|
||
| ORSO_class = group.attrs.get("ORSO_class", None) | ||
| if ORSO_class is not None: | ||
| if ORSO_class == "OrsoDataset": | ||
| # TODO: remove swapaxes if order of data is changed (PR #107) | ||
| # reorder columns so column index is second: | ||
| dct["data"] = np.asarray(dct["data"]).swapaxes(0, 1) | ||
| cls = ORSO_DATACLASSES[ORSO_class] | ||
| return cls(**dct) | ||
| else: | ||
| return dct | ||
|
|
||
|
|
||
| def _get_nexus_item(value): | ||
| import json | ||
|
|
||
| import h5py | ||
|
|
||
| if isinstance(value, h5py.Group): | ||
| return _from_nexus_group(value) | ||
| elif isinstance(value, h5py.Dataset): | ||
| v = value[()] | ||
| if isinstance(v, h5py.Empty): | ||
| return None | ||
| elif value.attrs.get("mimetype", None) == JSON_MIMETYPE: | ||
| return json.loads(v) | ||
| elif hasattr(v, "decode"): | ||
| # it is a bytes object, should be string | ||
| return v.decode() | ||
| else: | ||
| return v | ||
|
|
||
|
|
||
| def load_nexus(fname: Union[str, BinaryIO]) -> List[OrsoDataset]: | ||
|
Collaborator
I think it would be advantageous to have a single load_orso function that chooses which loader to use automatically. That would allow external software to be agnostic about the file type that's used. Suggested way of implementation:
|
||
| import h5py | ||
|
|
||
| f = h5py.File(fname, "r") | ||
| # Use '/' because order is not tracked on the File object, but is on the '/' group! | ||
| root = f['/'] | ||
| return [_from_nexus_group(g) for g in root.values() if g.attrs.get("ORSO_class", None) == "OrsoDataset"] | ||
|
|
||
|
|
||
| def save_nexus(datasets: List[OrsoDataset], fname: Union[str, BinaryIO], comment: Optional[str] = None) -> BinaryIO: | ||
| import h5py | ||
| h5py.get_config().track_order = True | ||
|
|
||
| for idx, dataset in enumerate(datasets): | ||
| info = dataset.info | ||
| data_set = info.data_set | ||
| if data_set is None or (isinstance(data_set, str) and len(data_set) == 0): | ||
| # it's not set, or is zero length string | ||
| info.data_set = idx | ||
|
|
||
| dsets = [dataset.info.data_set for dataset in datasets] | ||
| if len(set(dsets)) != len(dsets): | ||
| raise ValueError("All `OrsoDataset.info.data_set` values must be unique") | ||
|
|
||
| with h5py.File(fname, mode="w") as f: | ||
| f.attrs["NX_class"] = "NXroot" | ||
| if comment is not None: | ||
| f.attrs["comment"] = comment | ||
|
|
||
| for dsi in datasets: | ||
| info = dsi.info | ||
| entry = f.create_group(str(info.data_set)) | ||
| entry.attrs["ORSO_class"] = "OrsoDataset" | ||
| entry.attrs["ORSO_VERSION"] = ORSO_VERSION | ||
| entry.attrs["NX_class"] = "NXentry" | ||
| entry.attrs["default"] = "plottable_data" | ||
| info.to_nexus(root=entry, name="info") | ||
| data_group = entry.create_group("data") | ||
| data_group.attrs["sequence"] = 1 | ||
| plottable_data_group = entry.create_group("plottable_data", track_order=True) | ||
| plottable_data_group.attrs["NX_class"] = "NXdata" | ||
| plottable_data_group.attrs["sequence"] = 1 | ||
| plottable_data_group.attrs["axes"] = [info.columns[0].name] | ||
| plottable_data_group.attrs["signal"] = info.columns[1].name | ||
| plottable_data_group.attrs[f"{info.columns[0].name}_indices"] = [0] | ||
| for column_index, column in enumerate(info.columns): | ||
| # assume that dataset.data has dimension == ncolumns along first dimension | ||
| # (note that this is not how data would be loaded from e.g. load_orso, which is row-first) | ||
| col_data = data_group.create_dataset(column.name, data=dsi.data[:, column_index]) | ||
| col_data.attrs["sequence_index"] = column_index | ||
| col_data.attrs["target"] = col_data.name | ||
| physical_quantity = getattr(column, 'physical_quantity', None) | ||
| if physical_quantity is not None: | ||
| col_data.attrs["physical_quantity"] = physical_quantity | ||
| if isinstance(column, ErrorColumn): | ||
| nexus_colname = column.error_of + "_errors" | ||
| else: | ||
| nexus_colname = column.name | ||
| if column.unit is not None: | ||
| col_data.attrs["units"] = column.unit | ||
|
|
||
| plottable_data_group[nexus_colname] = col_data | ||
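The inline review comment on load_nexus above asks for a single entry point that picks the loader automatically. The reviewer's own suggested implementation is not preserved here, so the following is only a minimal sketch of one way this could work, assuming the input is a file path and that the NeXus file starts with the HDF5 signature at offset 0. The names `looks_like_hdf5` and `load_any` are hypothetical and are not part of orsopy; the two loaders are passed in as parameters so the sketch does not depend on either library.

```python
from pathlib import Path
from typing import Callable, List, Union

# 8-byte signature that begins an HDF5 file (NeXus files written via h5py are HDF5).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: Union[str, Path]) -> bool:
    """Return True if the file starts with the HDF5 signature.

    Assumption: the signature sits at offset 0 (the file has no user block).
    """
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def load_any(
    path: Union[str, Path],
    text_loader: Callable[[str], List],
    nexus_loader: Callable[[str], List],
) -> List:
    """Hypothetical dispatcher: choose the loader from the file contents."""
    if looks_like_hdf5(path):
        return nexus_loader(str(path))
    return text_loader(str(path))
```

With orsopy installed, a caller could presumably pass `load_orso` and `load_nexus` as the two loaders, and `h5py.is_hdf5` could stand in for the manual signature check. Inspecting the contents rather than the file extension keeps the dispatch independent of how the file was named, but this sketch does not cover the already-open file objects that both loaders in this PR also accept.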
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,3 +14,4 @@ typing_extensions | |
| coverage | ||
| coveralls | ||
| pint | ||
| h5py>=3.1.0 | ||
I think it would be cleaner to use built-in Python functionality instead of keeping track of our subclasses ourselves. All header classes are derived from Header, so you could replace:
with
and
with
(I think the first call misses sub-subclasses, so it's probably easier to implement it as a recursive method in the Header class.)
In this case we are also safe with people subclassing the ORSO Header without using a decorator (e.g. adding functionality, not attributes).

The subclass recursive search would miss OrsoDataset, which needs to be recognized as an ORSO class during deserialization. I think trying to make an inheritance tree that includes both Header and OrsoDataset is going to be more complicated.
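
The trade-off discussed in these two comments can be illustrated with a small, self-contained sketch. All classes below are stand-ins defined only for this example and are not the real orsopy classes; `all_subclasses` is a hypothetical helper. The sketch shows that `__subclasses__()` returns direct subclasses only, that a recursive walk also finds sub-subclasses, and that neither finds a class outside the Header hierarchy, which is why an explicit registry entry is still needed for OrsoDataset.

```python
from typing import Dict, Set, Type


class Header:
    """Stand-in for the ORSO Header base class (not the real orsopy class)."""


class Person(Header):
    """Stand-in direct subclass of Header."""


class Creator(Person):
    """Stand-in sub-subclass of Header."""


class OrsoDataset:
    """Stand-in for OrsoDataset, which does not derive from Header."""


def all_subclasses(cls: Type) -> Set[Type]:
    """Collect direct and indirect subclasses by walking __subclasses__()."""
    found: Set[Type] = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


# A single call sees only one level of the hierarchy.
direct = set(Header.__subclasses__())

# The recursive walk also finds Creator, but can never find OrsoDataset.
recursive = all_subclasses(Header)

# A name-keyed registry can combine both: discovered Header subclasses
# plus an explicit entry for the class outside the hierarchy.
registry: Dict[str, Type] = {cls.__name__: cls for cls in recursive}
registry["OrsoDataset"] = OrsoDataset
```

Under these assumptions, discovery through inheritance and an explicit registry are not mutually exclusive: the recursive search could populate most of the lookup table, with OrsoDataset registered by hand as the diff does via `ORSO_DATACLASSES["OrsoDataset"] = OrsoDataset`.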