Skip to content

Conversation

@andyfaff
Copy link
Contributor

Added a separate validation for column properties. Eventually this validation should simplify the schema creation.

Instead of raising a jsonschema.exceptions.ValidationError I decided to raise a ValueError - the validation of the column properties is not being done by jsonschema.

Should there be extra checks?

@aglavic
Copy link
Collaborator

aglavic commented Mar 1, 2024

Do we check anywhere, that the shape of column data is equal? If not, is this mendatory?

@andyfaff
Copy link
Contributor Author

andyfaff commented Mar 1, 2024

The number of columns needs to be congruent with the data. I would say that the number of data points should be the same for all columns.

For an ORB file a column has the potential to be multidimensional. For an ORT file each column is expected to have ndim=1. So for a concatenated datapoint,column you would expect a shape of (ndata, ncol). A question is how to hold it internally, as a tuple/list of columns, or as a 2D array. I believe we were aiming for the former, as it would make holding ORB data (i.e. column ndim>=1) more easily?

But yes, easy to add the check on load.

@bmaranville
Copy link
Contributor

In the __post_init__ for OrsoDataset, we're checking that the number of columns in the data matches the number of columns defined in the header: https://github.com/reflectivity/orsopy/blob/main/orsopy/fileio/orso.py#L156
But we're not checking the first dimension of the columns, which should match.

What about this type of check?

    def __post_init__(self):
        if len(self.data) != len(self.info.columns):
            raise ValueError("Data has to have the same number of columns as header")
        column_lengths = set(len(c) for c in self.data)
        if len(column_lengths) > 1:
            raise ValueError("Columns must all have the same length in first dimension")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants