
Conversation

@bogovicj
Contributor

@bogovicj bogovicj commented Sep 15, 2022

This PR has four main contributions:

  • Groups the existing axis specification into coordinateSystems - named collections of axes.

  • coordinateTransformations now have "input" and "output" coordinate systems.

  • Adds many new, useful types of coordinate transformations.

    • informative examples of each type are now given in the spec.
  • Describes the array/pixel coordinate system (origin at pixel center)

    • as agreed in issue 89 (see below)
  • Adds a longName field for axes

    • Also nice for NetCDF interop

See also:

bogovicj and others added 26 commits February 1, 2022 13:50
* define pixel center coordinate system
* add input/output_axes
* add transform inverse flag
* add affine transform type
* flesh out array space
* define array indexing
* add more transformation types
* start transformation details section and examples
* update example
* use "input" and "output" rather than '*Space' and '*Axes'
* reorder details
* clean up table
* add rotation
* details for sequence
* describe inverses
* wrap examples
* rephrase matrix storage
* change to "coordinates", removing "Field"
* change to "displacements", removing "Field"
* add details for transformation types
* (identity, inverseOf, bijection)
* describe inputAxes and outputAxes
* add new examples
* add mapIndex, mapAXis
* add examples
* affine stored as flat array only
* sequence does not have by-dimension behavior
* flesh out some examples
@github-actions
Contributor

github-actions bot commented Sep 15, 2022

Automated Review URLs

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-community-call-transforms-and-tables/71792/1

@clbarnes
Contributor

Could it be strongly recommended, if not required, that affine transforms use the JSON form and not an external array? The size of the JSON transformation matrix is not a problem, and it would be convenient to be able to build coordinate transformations solely from the metadata wherever possible. Obviously there are some types of transformation where this won't be possible and we will need to refer to an external array (e.g. displacement fields), but they are likely to be rare compared to affine transforms.

Is there a benefit to using an external array for such a small amount of data?

@thewtex
Contributor

thewtex commented Sep 16, 2025

Using binary, multi-dimensional array encoding for the binary, multi-dimensional affine matrix makes it easier to preserve precision and reduces multi-dimensional array ambiguities like C vs Fortran order. I do not think a JSON representation should be required.

@d-v-b
Contributor

d-v-b commented Sep 17, 2025

Using binary, multi-dimensional array encoding for the binary, multi-dimensional affine matrix makes it easier to preserve precision and reduces multi-dimensional array ambiguities like C vs Fortran order. I do not think a JSON representation should be required.

A multidimensional array is a type of content; JSON is a storage medium. So if we define a multi-dimensional array encoding for JSON, then the need for external arrays is reduced. See zarr-developers/zarr-extensions#22.
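
For illustration only (this is a sketch, not wording from the PR), an inline nested-array encoding of a small affine matrix could look like the following; the "type" and "affine" field names are assumptions here, and Python's json module round-trips these float64 values exactly:

import json

import numpy as np

# Hypothetical inline representation of a 2x3 affine matrix as nested JSON
# arrays; field names are illustrative, not taken from the spec.
affine = np.array([[1.0, 0.0, 10.5],
                   [0.0, 2.0, -3.25]])

metadata = {"type": "affine", "affine": affine.tolist()}

# Round trip through JSON text and back to a numpy array.
restored = np.array(json.loads(json.dumps(metadata))["affine"])
assert np.array_equal(affine, restored)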

@thewtex
Contributor

thewtex commented Sep 17, 2025

JSON is a text format, and it is not designed for binary, multi-dimensional arrays. We already have support for binary, multi-dimensional arrays in Zarr -- it is the reason not to just use JSON in the first place. The multi-dimensional binary arrays are better represented in Zarr arrays.

@d-v-b
Contributor

d-v-b commented Sep 17, 2025

Lossless encoding of numerical scalars in JSON is actually a fundamental part of Zarr. Without this, the fill_value field (stored in ... JSON!) would not work.

Here's a quick proof of principle showing how to store a 64 bit floating point array in JSON. This will work for any other data type supported by Zarr, because we are just re-using the fill value encoding.

# /// script
# requires-python = "==3.12"
# dependencies = ["zarr==3.1.2"]
# ///
from zarr.core.dtype import parse_dtype, ZDType
from zarr.core.dtype.common import DTypeSpec_V3
import numpy as np
from typing import Any, TypedDict

class NDArrayJSON(TypedDict):
    data: list[str]
    shape: list[int]
    dtype: DTypeSpec_V3

array = np.arange(10, dtype='float64').reshape(2,5) / 3

def _encode_array(data: Any, zarr_dtype: ZDType[Any, Any]) -> list[Any]:
    if isinstance(data, np.ndarray):
        return [_encode_array(sub_data, zarr_dtype) for sub_data in data]
    else:
        return zarr_dtype.to_json_scalar(data, zarr_format=3)

def _decode_array(data: Any, zarr_dtype: ZDType[Any, Any]) -> np.ndarray[tuple[int, ...], np.generic]:
    if isinstance(data, list):
        return [_decode_array(sub_data, zarr_dtype) for sub_data in data]
    else:
        return zarr_dtype.from_json_scalar(data, zarr_format=3)

def encode(data: np.ndarray[tuple[int, ...], np.generic]) -> NDArrayJSON:
    zarr_dtype = parse_dtype(data.dtype, zarr_format=3)
    return {
        "data": _encode_array(data, zarr_dtype),
        "shape": data.shape,
        "dtype": zarr_dtype.to_json(zarr_format=3),
    }
def decode(data: NDArrayJSON) -> np.ndarray[tuple[int, ...], np.generic]:
    zarr_dtype = parse_dtype(data["dtype"], zarr_format=3)
    return np.array(_decode_array(data["data"], zarr_dtype), dtype=zarr_dtype.to_native_dtype())

print('np array:')
print(array)
"""
np array:
[[0.         0.33333333 0.66666667 1.         1.33333333]
 [1.66666667 2.         2.33333333 2.66666667 3.        ]]
"""
encoded = encode(array)
print('json encoded array:')
print(encoded)
"""
json encoded array:
{'data': [[0.0, 0.3333333333333333, 0.6666666666666666, 1.0, 1.3333333333333333], [1.6666666666666667, 2.0, 2.3333333333333335, 2.6666666666666665, 3.0]], 'shape': (2, 5), 'dtype': 'float64'}
"""
decoded = decode(encoded)
print('is the decoded json encoded array identical to the original?????')
print(np.array_equal(array, decoded))
"""
is the decoded json encoded array identical to the original?????
True
"""

We can make this more elaborate if we want, but I think the format demonstrated here would be perfectly suitable for an affine transform matrix.

@thewtex
Contributor

thewtex commented Sep 22, 2025

JSON inherently only supports number, which is only an IEEE double. And the encoding is text, which without extra precautions, can cause loss of precision with the decimal to binary conversion. And it inherently only supports 1D arrays.

Without this, the fill_value field (stored in ... JSON!) would not work.

It does not work with large integers.

Yes, with extra complexity, drawbacks, and limitations, some binary multi-dimensional numerical arrays can be encoded in JSON. When Zarr was created, one of its goals was to be something for which a parser could easily be written. I do not think extra complexity should be added for all implementations to provide functionality that already exists in a better form. We should favor a simple approach.

@d-v-b
Contributor

d-v-b commented Sep 22, 2025

We should favor a simple approach.

I'm with you here. But for some applications, like storing 9 floating point numbers for an affine transform, writing out an entire zarr.json metadata document (which requires decisions about chunking, compression, etc) and writing out binary chunk data is not simpler than writing those 9 numbers to JSON.

Provided we can agree on a representation, inlining a small array is conceptually simpler, saves IO for reading and writing, and means the code that parses the transforms doesn't need to do any Zarr IO. At a minimum it should be discussed seriously and not dismissed out of hand, especially given that I provided a complete working example.

@clbarnes
Contributor

JSON inherently only supports number

This is not true - javascript numbers are IEEE-754 float64s, but JSON numbers are arbitrary precision decimals (see: no NaN or infinity, no bounds on number of digits). You're definitely right that it requires extra precautions to deserialise them as such in many languages, of course!

We already use JSON for scales and translations, which I'd argue is enough to say that JSON has sufficient precision for transformations, as well as for label metadata, which I'd argue is enough to say that JSON has sufficient precision for large integers. I think there is value in having the metadata alone be enough to calculate the pixel -> world space mappings without zarr IO - it allows OME-Zarr implementors to more easily abstract over different Zarr implementations. It also means that transformations can be determined in one read round trip, rather than one plus at least 2 per global transform (metadata and array) plus at least 2 per scale-level transform (if you want to figure out the whole pyramid at once).

@jni
Contributor

jni commented Sep 23, 2025

I don't feel suuuper strongly about this but I'd be +1 on allowing storing the affines in JSON, even if the recommendation is a zarr to ensure defined precision. (I'm not sure about that recommendation either based on the discussion above, but happy to defer.)

@d-v-b
Contributor

d-v-b commented Sep 23, 2025

even if the recommendation is a zarr to ensure defined precision. (I'm not sure about that recommendation either based on the discussion above, but happy to defer.)

I actually feel pretty strongly that we should not perpetuate the inaccurate claim that JSON is somehow a problematic medium for numerical scalars, because this claim is effectively saying that Zarr doesn't work, and that's false.

To repeat, for clarity: Zarr defines a lossless scalar -> JSON transformation for every array scalar. Without this feature, the fill_value field would not work properly.

If people believe that this is impossible or problematic, then they should open an issue in the zarr specs repo, or review the one that already exists about encoding large integers in javascript, where the conclusion was "this is a javascript problem, which can be fixed in javascript".

I think in the context of this PR, we can assume that Zarr's fill value encoding works, and that means we can assume that there is no precision issue related to storing numerical scalars in JSON, provided we can re-use the fill value encoding.

There are still going to be transformations that are probably best represented as Zarr arrays, like deformation fields, but that decision should be based on solid technical grounds, such as the size of the warp field, the advantage to chunking it, etc.

@thewtex
Contributor

thewtex commented Sep 23, 2025

I want to respond carefully here, because I think some of my earlier comments may have been interpreted as dismissal, which is not my intent. I did take the proposal seriously, read the example code closely, and considered it as a possible option. But the concerns I raised earlier have not yet been engaged with in full, and they are not minor issues:

  1. Array ordering (Fortran vs C)
    For multi-dimensional arrays, ambiguity of memory order is a very real source of mistakes when using a JSON representation. Zarr arrays encode these details explicitly and consistently, whereas an inline JSON multi-dimensional matrix risks silently introducing errors depending on the consumer. This is not simply a matter of preference—it is a correctness issue.
  2. Precision and representation issues
    There are fundamental limitations around JSON numeric encoding and large integers. Yes, Zarr encodes fill values as JSON scalars, but the stakes are different for affine transformation matrices containing rotations or scaling factors. Those require exact precision and consistent binary ↔ decimal ↔ binary preservation. Many language implementations will lose precision in this round trip. Moreover, large integer support is not guaranteed in JSON, and waving this away as a “JavaScript problem” does not make it disappear: JSON as a standard does not prescribe BigInt semantics, and this limitation exists across many language implementations, not only in JavaScript.
  3. Complexity across implementations
    Adding a second representation for small multidimensional arrays means all Zarr implementations across all languages would need to support this redundant functionality. That burdens implementors unnecessarily and increases the surface area for divergence or bugs. One of Zarr’s strengths has been consistency; diluting that by requiring both external-array and inline-JSON cases for affine transforms introduces exactly the kind of fragmentation we should avoid.
  4. Difference with existing scalar/vector transforms
    I’ve seen the comparison made to scale and translation—but those are much simpler cases: one-dimensional, short vectors, precision that is not as critical, and with no ordering ambiguities. Affines are qualitatively different: they are 2D, order-sensitive, and exact floating-point correctness is critical for downstream interoperability (think: geometric rotation consistency). Treating them as analogous to translation vectors misses these crucial differences.
  5. Criteria for decisions
    I agree fully that decisions should be based on solid technical grounds. And that is precisely why I am pushing back here. Inline-JSON may feel “simpler” for nine numbers in an example, but once we factor in precision, ordering, and multi-implementation complexity, the technical grounds tip the other way. Zarr arrays already solve these issues robustly, whereas an inline JSON encoding would reintroduce them, and force every implementation to handle an additional, fragile corner case.

So while I appreciate the working example and the motivation to simplify, I don’t think it adequately addresses the serious technical risks. Inline JSON for affine transforms may appear convenient for a toy case, but at scale and across a diverse ecosystem of implementations, it adds risk and complexity without delivering clear benefits.

@d-v-b
Contributor

d-v-b commented Sep 23, 2025

  1. Array ordering (Fortran vs C)
    For multi-dimensional arrays, ambiguity of memory order is a very real source of mistakes when using a JSON representation. Zarr arrays encode these details explicitly and consistently, whereas an inline JSON multi-dimensional matrix risks silently introducing errors depending on the consumer. This is not simply a matter of preference—it is a correctness issue.

We can handle this either by using a representation that's unambiguous, or by using a simpler representation and adding C / F order metadata.

A C or F order declaration is necessary when interpreting a 1D sequence of scalars as an N-dimensional array. But in the working example I provided I did not represent the array values as a 1D sequence -- I used nested arrays, which have totally unambiguous dimensionality. [[2, 3]] is unambiguously an array with 1 element (which is an array with 2 elements), i.e. an array with shape (1, 2).

But if for some reason we didn't want to use the nested array representation and instead preferred a single 1D array, we could simply add an "order" field to the JSON metadata for the array, just like Zarr V2 did.
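
A minimal sketch of that flat-array variant (values plus an explicit order flag, roughly like Zarr v2's "order" field); the field names below are hypothetical:

import numpy as np

# Hypothetical flat-array representation with explicit shape and memory order.
encoded = {
    "values": [1.0, 0.0, 10.5, 0.0, 2.0, -3.25],
    "shape": [2, 3],
    "order": "C",  # "C" = row-major, "F" = column-major
}

# Because the order is declared, reconstruction is unambiguous.
matrix = np.array(encoded["values"]).reshape(encoded["shape"], order=encoded["order"])
# matrix == [[1.0, 0.0, 10.5],
#            [0.0, 2.0, -3.25]]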

  2. Precision and representation issues
    There are fundamental limitations around JSON numeric encoding and large integers. Yes, Zarr encodes fill values as JSON scalars, but the stakes are different for affine transformation matrices containing rotations or scaling factors. Those require exact precision and consistent binary ↔ decimal ↔ binary preservation. Many language implementations will lose precision in this round trip. Moreover, large integer support is not guaranteed in JSON, and waving this away as a “JavaScript problem” does not make it disappear: JSON as a standard does not prescribe BigInt semantics, and this limitation exists across many language implementations, not only in JavaScript.

I think the key point here is not JSON per se but rather the behavior of JSON readers, and I do think this is a very important point. Storing numerical scalars in JSON means those scalars will be processed first by a JSON reader, and if some JSON readers will mangle large integers or high precision floats, then this is a strong argument against storing those values as JSON numbers. (These readers would also struggle with Zarr fill values, but we can ignore that for now.)

This is not in itself a strong argument against using JSON. It's an argument against using JSON numbers. As we are not optimizing for performance or storage space (this is JSON after all), we have a lot of options for representing scalars unambiguously. For example, we could use base64 encoding of the binary representation of the scalar, or we could generalize the string-based JSON encoding for floats defined in the zarr V3 spec. The point is, if JSON numbers are problematic, then we don't have to use them. JSON has other types!
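
As a rough sketch of the base64 idea (one possible encoding, assuming little-endian float64 scalars; nothing here is spec language): each scalar's 8-byte IEEE representation travels as a string, so no JSON reader ever parses a JSON number:

import base64
import struct

import numpy as np

def encode_scalar(x: float) -> str:
    # base64 of the little-endian IEEE-754 binary64 bytes: no decimal conversion.
    return base64.b64encode(struct.pack("<d", x)).decode("ascii")

def decode_scalar(s: str) -> float:
    return struct.unpack("<d", base64.b64decode(s))[0]

values = np.array([[1 / 3, 0.1], [np.pi, 2.0]])
encoded = [[encode_scalar(v) for v in row] for row in values]
decoded = np.array([[decode_scalar(s) for s in row] for row in encoded])
assert np.array_equal(values, decoded)  # bit-exact round trip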

  3. Complexity across implementations
    Adding a second representation for small multidimensional arrays means all Zarr implementations across all languages would need to support this redundant functionality. That burdens implementors unnecessarily and increases the surface area for divergence or bugs. One of Zarr’s strengths has been consistency; diluting that by requiring both external-array and inline-JSON cases for affine transforms introduces exactly the kind of fragmentation we should avoid.

The decisions we make in this PR only affect OME-Zarr. Choosing to use JSON in a particular way, e.g. to represent N-dimensional arrays, does not have an impact on all Zarr implementations across all languages.

OME-Zarr implementations would interpret the JSON objects in this RFC as declarations of transformations, and regular Zarr implementations could just see plain JSON. That's how the attributes field is supposed to work.

  4. Difference with existing scalar/vector transforms
    I’ve seen the comparison made to scale and translation—but those are much simpler cases: one-dimensional, short vectors, precision that is not as critical, and with no ordering ambiguities. Affines are qualitatively different: they are 2D, order-sensitive, and exact floating-point correctness is critical for downstream interoperability (think: geometric rotation consistency). Treating them as analogous to translation vectors misses these crucial differences.

See the earlier point. We need to choose a robust JSON representation. But JSON can totally work here.

  5. Criteria for decisions
    I agree fully that decisions should be based on solid technical grounds. And that is precisely why I am pushing back here. Inline-JSON may feel “simpler” for nine numbers in an example, but once we factor in precision, ordering, and multi-implementation complexity, the technical grounds tip the other way. Zarr arrays already solve these issues robustly, whereas an inline JSON encoding would reintroduce them, and force every implementation to handle an additional, fragile corner case.

So while I appreciate the working example and the motivation to simplify, I don’t think it adequately addresses the serious technical risks. Inline JSON for affine transforms may appear convenient for a toy case, but at scale and across a diverse ecosystem of implementations, it adds risk and complexity without delivering clear benefits.

I will at this point assume that we all agree that, with a little motivation, we can devise a lossless, safe encoding for numerical scalars in JSON. As stated earlier, array order ambiguity can be addressed either by using nested arrays or adding an order field. Multi-implementation complexity is a non-issue because this is narrowly scoped to OME-Zarr, and everything in this PR will require work from OME-Zarr implementations anyway. That covers all your objections.

The advantages of the inline representation of the transform parameters are pretty well established by now -- there's a reason nearly everyone uses a JSON array for scale and translation! It's easier!

So I view this discussion as resolving to a simple question: can we figure out how to safely encode N-dimensional arrays in JSON, or will we conclude that it's just too complicated? I hope we take the first option.

I think it's worth reflecting on the fact that the geospatial community has been successfully encoding spatial transformations and geometric objects in JSON for years: see PROJ JSON, GeoJSON. If they can do this, I bet we can too.

@thewtex
Contributor

thewtex commented Sep 26, 2025

Thanks for the thoughtful reply—there are several important points here that deserve further clarification:

  • The difference between C and Fortran order is fundamentally a question of how multidimensional arrays are linearized in memory, not simply the shape of the array or the syntax of nested lists. Even if a 2D array is represented as nested arrays in JSON, the lack of an explicit “row-major” (C) or “column-major” (Fortran) indicator leads to real risk of inconsistent interpretation between implementations. This is not a theoretical issue: many scientific libraries (e.g., NumPy, ITK, MATLAB, Fortran, C/C++) use different conventions and this has historically led to subtle and critical errors for image transforms, matrix algebra, and IO. Adding an “order” field to JSON is possible, but it results in extra metadata that all libraries and every implementer have to parse and respect, increasing both the implementation burden and the chances for silent bugs.
  • The proposal to encode numeric scalars in JSON strings (rather than native JSON numbers) does not eliminate the complexity problem. Instead, it shifts complexity from number parsing to custom string handling—placing the burden on every implementer in every OME-Zarr consumer, across all supported languages. Many languages do not support all of the semantics robustly, and requiring out-of-band logic for parsing arrays of numbers with arbitrary encodings (base64, hex, special-formatted strings) runs counter to the goal of having Zarr and OME-Zarr metadata that is consistently, easily, and natively interpretable. Introducing non-standard encodings for numbers in JSON simply for the convenience of small parameter arrays does not simplify things—it increases the cost for every downstream developer.
  • OME-Zarr’s value comes from a robust ecosystem of multi-language, multi-platform libraries and services—used in Python, C++, Java, R, and more. If each is forced to implement ad hoc logic for one-off JSON representations, we multiply the surface area for bugs and reduce reliability and interoperability.
  • Introducing alternate JSON encodings for transform matrices duplicates existing, well-supported, and reliable Zarr array functionality. The current Zarr array approach—using chunked, binary representations in a language-neutral, thoroughly specified way—already solves these problems with well-established semantics and broad support. There is no compelling technical reason to introduce new, bespoke JSON encodings when robust solutions already exist.
  • Speaking from direct experience: after two decades in computational image registration, developing, maintaining, and interoperating with a half-dozen different spatial transformation file formats—many of them text-based—the temptation to encode structured binary data in plain-text files or JSON is familiar. In practice, this leads to years of subtle interoperability issues, inconsistencies in order and rounding, and a maintenance burden for libraries and for users. Even seemingly trivial cases (like affine 3x3 or 4x4 matrices) have a long history of ambiguities and headaches when stored in “simple” plain-text formats.

In summary: there is wisdom in the principle of minimizing unnecessary complexity and duplicated functionality. OME-Zarr and Zarr already offer a technically robust, future-proof way to store multidimensional numerical parameters and avoid precisely these pitfalls. The best path forward is to leverage that infrastructure, not to reimplement an ad hoc alternative in metadata, no matter how tempting JSON may appear at first glance.

@d-v-b
Contributor

d-v-b commented Sep 26, 2025

OME-Zarr’s value comes from a robust ecosystem of multi-language, multi-platform libraries and services—used in Python, C++, Java, R, and more. If each is forced to implement ad hoc logic for one-off JSON representations, we multiply the surface area for bugs and reduce reliability and interoperability.

This objection doesn't work in the context of this PR, which is all about adding new JSON representations that all OME-Zarr implementations will need to support. So the term "ad hoc" doesn't really fit -- everything in this PR is equally "ad-hoc", and everything in this PR will require work from multi-language, multi-platform services, etc. This kind of objection is unpersuasive. I would focus more on the technical merits / demerits of the specific proposal to use JSON for array scalars, and less on the burden for implementations.

The proposal to encode numeric scalars in JSON strings (rather than native JSON numbers) does not eliminate the complexity problem.

Which complexity problem are you thinking of? The goal of encoding scalar parameters in JSON is so that they can be read from JSON. You brought up concerns about native JSON parsers mangling numbers, so I introduced the base64 proposal, which addresses that. But now apparently it makes things too complex.

In summary: there is wisdom in the principle of minimizing unnecessary complexity and duplicated functionality.

I don't think you will find anyone in this thread, or on this planet, who disagrees with this claim, and I don't think it helps your case to frame those who disagree with you as supporting "unnecessary complexity and duplicated functionality". With that, I'm not really interested in disagreeing with you any more.

To everyone else: the very question we are discussing is whether using Zarr IO operations (fetching a zarr.json object from storage, parsing it, then fetching the chunk(s) of that array) to fetch 9 numbers counts as "unnecessary complexity". I proposed a working solution that allows us to remove the need for that IO with no loss in numerical precision. Anyone motivated to keep the metadata parsing as simple as possible should consider this idea, or come up with a better one!

@normanrz
Contributor

To me it seems a reasonable abstraction to use Zarr for storing the matrices. The IO overhead (at least 2 sequential IO ops) is unfortunate, though. I think that is something that might be better addressed at the Zarr level than at the OME-Zarr level.
I like the idea of inlining the chunk storage (e.g. via base64) in a Zarr array zarr.json. That would bring the IO ops down to 1.
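
Purely as an illustration of that idea (there is no such Zarr extension today; the field names and layout are assumptions), inlining the chunk bytes next to shape/dtype/order metadata could look like this, so a single metadata read is enough to recover the matrix:

import base64
import json

import numpy as np

affine = np.array([[1.0, 0.0, 10.5],
                   [0.0, 2.0, -3.25]])

# Hypothetical inline-chunk representation: one base64 blob plus the metadata
# needed to decode it.
inline = {
    "shape": list(affine.shape),
    "dtype": "<f8",  # little-endian float64
    "order": "C",
    "data": base64.b64encode(affine.tobytes(order="C")).decode("ascii"),
}

loaded = json.loads(json.dumps(inline))  # travels inside zarr.json / attributes
restored = np.frombuffer(
    base64.b64decode(loaded["data"]), dtype=loaded["dtype"]
).reshape(loaded["shape"], order=loaded["order"])
assert np.array_equal(affine, restored)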

@d-v-b
Contributor

d-v-b commented Sep 26, 2025

I think that is something that might be better addressed at the Zarr level than at the OME-Zarr level.
I like the idea of inlining the chunk storage (e.g. via base64) in a Zarr array zarr.json. That would bring the IO ops down to 1.

there is also an intermediate solution to consider -- inline the array metadata, and additionally add a path sufficient to resolve the location of the chunks. This would allow some level of validation to be done on the array properties (like checking that it has the right shape and data type), but chunk IO would be necessary for getting the transform values. This saves 1 IOP, but the price is duplicated information (1 copy of array metadata inside the transforms meta, and 1 copy in its usual location in external storage). Not sure how that pencils out.

@jni
Contributor

jni commented Oct 12, 2025

@d-v-b

I don't think you will find anyone in this thread, or on this planet, who disagrees with this claim, and I don't think it helps your case to frame those who disagree with you as supporting "unnecessary complexity and duplicated functionality". With that, I'm not really interested in disagreeing with you any more.

To everyone else:

I think this was a very rude way to respond to Matt's comments, which explicitly addressed why he thought your (our!) proposal increased complexity, and why he thought it was unnecessary. You may disagree, you may have your reasons why you think the complexity is worthwhile or overstated (as I do!), but you can easily make your point without being so dismissive — even hostile. That does not help move the discussion forward either, and in fact can drive people away from it altogether.

It's also ok to compromise here. It is easier to be more restrictive at first and then open up alternative ways to store the matrix than it is to do the converse. I am well aware that involves toil and churn (eg RFC3), but ome-ngff is a community-developed standard. To participate, we should abide by community norms, which includes being respectful of other participants in the discussion.

@d-v-b
Contributor

d-v-b commented Oct 12, 2025

@d-v-b

I don't think you will find anyone in this thread, or on this planet, who disagrees with this claim, and I don't think it helps your case to frame those who disagree with you as supporting "unnecessary complexity and duplicated functionality". With that, I'm not really interested in disagreeing with you any more.
To everyone else:

I think this was a very rude way to respond to Matt's comments, which explicitly addressed why he thought your (our!) proposal increased complexity, and why he thought it was unnecessary. You may disagree, you may have your reasons why you think the complexity is worthwhile or overstated (as I do!), but you can easily make your point without being so dismissive — even hostile. That does not help move the discussion forward either, and in fact can drive people away from it altogether.

It's also ok to compromise here. It is easier to be more restrictive at first and then open up alternative ways to store the matrix than it is to do the converse. I am well aware that involves toil and churn (eg RFC3), but ome-ngff is a community-developed standard. To participate, we should abide by community norms, which includes being respectful of other participants in the discussion.

Hi Juan, thanks for this feedback, and I apologize for the tone of my remarks. But I stand by the content of what I said, which is that the language used by @thewtex makes me not want to engage with him further on this topic.

@thewtex wrote:

In summary: there is wisdom in the principle of minimizing unnecessary complexity and duplicated functionality. OME-Zarr and Zarr already offer a technically robust, future-proof way to store multidimensional numerical parameters and avoid precisely these pitfalls. The best path forward is to leverage that infrastructure, not to reimplement an ad hoc alternative in metadata, no matter how tempting JSON may appear at first glance.

This statement and much of the rest of his comment is not rude, but it is quite frustrating -- it implies that I have forgotten the "wisdom in the principle of minimizing unnecessary complexity and duplicated functionality", and that my proposal is "ad-hoc" (he uses more colored language earlier in the issue, saying my proposal is "ad hoc logic for one-off JSON representations"). These are things you say when you are talking past someone, not talking to them. I should have more directly signalled my concerns with this counter-productive and frustrating rhetoric, but I fumbled and ended up just saying something rude. Sorry about that.

Finally, it feels extremely dismissive of the constructive work I put into this thread (including writing a working implementation!) to (indirectly) characterize my position as supporting "unnecessary complexity". As a reminder, the basis of this discussion is about whether a metadata parser for OME-Zarr can be expected to only parse JSON, or if it also needs to include functionality for fetching and decoding Zarr chunks. The goal of re-using Zarr's fill value encoding for scalars, or defining a new scalar encoding using base64, is to reduce the complexity of OME-Zarr metadata readers, so that they can just parse JSON. To hear this position characterized as promoting unnecessary complexity totally misses the point of the idea.

@thewtex
Contributor

thewtex commented Oct 28, 2025

Davis, I want to be absolutely clear that my goal is not to frustrate or alienate you. I respect your contributions, careful reasoning, and the effort that’s gone into the ideas and working code you’ve shared here. If I point out issues, it’s because I believe improving how we reason through these technical topics is how we arrive at robust standards. Sometimes that means surfacing aspects where there are real misunderstandings or unintended implications—about technical feasibility, standardization, or future maintainability. Raising those concerns is about strengthening the spec, not about questioning your intentions or expertise.

Regarding a few points you raised:

it implies that I have forgotten the "wisdom in the principle of minimizing unnecessary complexity and duplicated functionality."

Not at all. I know you value this principle as much as anyone in the conversation. When I cite such motives, it’s not to imply you’ve forgotten it, but rather to clarify how I’m justifying my own reasoning to the group, and to make sure we’re all explicit about the trade-offs.

my proposal is "ad-hoc" (he uses more colored language earlier in the issue…)

“Ad hoc” is meant here purely in the dictionary sense—formed for a specific, immediate purpose—not as an attack. JSON’s core objective is to represent JavaScript objects. It does not natively encode multi-dimensional binary matrices, so devising a format to do that within JSON, with new manual conventions for shape/stride/order and encoding, is using JSON for an “ad hoc” purpose. I do not mean this as an accusation of ill intent, but as a precise technical distinction for why new parsing logic is needed.

“...ad hoc logic for one-off JSON representations”.

Again, this was meant to describe the proposal’s effect, not your motivation. The implementation is “one-off” in the sense that it encodes and requires consumers to decode a representation not natively supported by JSON, and not shared by other array/matrix libraries (especially outside Python/NumPy). This means the approach would require significant new implementation in every OME-Zarr reading/writing library (JavaScript, C++, Rust, etc.), while Zarr arrays already provide that cross-language functionality. My concern is the increased task surface for everyone—especially for future, non-Python users.

These are things you say when you are talking past someone, not talking to them.

That’s not my intention. I am trying to talk to you (and the group), explaining and justifying each concern as clearly as possible—sometimes repeatedly, as is necessary for complex technical points.

Finally, it feels extremely dismissive of the constructive work I put into this thread (including writing a working implementation!) to (indirectly) characterize my position as supporting "unnecessary complexity".

Davis, I genuinely recognize—and want the thread to recognize—the work and thoughtfulness you’ve invested here, not just in argument but in concrete code. To reiterate what I said before:

I appreciate the working example and the motivation to simplify

Calling out added complexity is a critique of the proposal’s consequences, not a judgement of your effort or its motivation.

I do see the value and appeal of wanting everything (especially small matrices) in a single JSON. And I agree there is a UX benefit in reducing fetches or dependencies. But, from long experience working across multiple file formats and libraries (ITK, VTK, NumPy, Dask, etc.), every time complexity is added to encoding rules, or decisions are left to implementers (“base64 here, shape there, order flag there”), these multiply into real maintenance burdens and subtle parsing bugs—especially outside of Python or by future contributors less familiar with the rationale.

To summarize where I stand:

Practically, Zarr arrays already support all the representations for affine matrices—cross-language, documented, and robust. If we require all OME-Zarr implementers to write and maintain JSON-specific logic for parsing, byte-order/layout handling, and shape validation for every matrix, that’s not just “one extra parser,” it’s many, for every tool and language, now and in the future.

The cost/benefit, in my view, is highly unfavorable compared to just leveraging Zarr’s existing cross-language array mechanisms.

I want us to keep moving forward, and completely support more restrictive initial approaches so we don’t back ourselves into a corner. I’m honestly excited by the recent RFC progress, and I’m looking forward to more productive discussions—both online and in-person at the next hackathon.

Thanks to everyone for pushing this community process, even through difficult technical and social conversations. That’s how good standards are forged.

@clbarnes
Contributor

Both (JSON) decimals and IEEE floats are marginally fuzzy approximations of a number, but it's almost certain that the registration process would be working with the IEEE approximation, and so JSON values do present a higher risk of loss of fidelity. In many languages, you have to try quite hard to deserialise JSON decimals as decimals, and it's all going to end up as IEEE anyway when it comes to doing maths with them. Whereas scales and translations are generally pretty rational values with a small number of significant figures, the values of affine transformations are more likely to be whacky, where the difference could be more significant.

Multiscale image implementations are likely to reify all of the (reasonably-sized) transformations on first read, so while it's a lot of IO operations, they're probably going to be concurrent and only done once. I would certainly steer away from duplicating the metadata at a higher level; too much opportunity for drift. Displacement fields mean that we're never going to be able to validate and preload all transformations as a solely metadata operation, which was my hope in suggesting the JSON affine array.

All this to say, I'm very ambivalent about the two representations, and on the basis that all of this is going to be hidden away from the end user anyway and it isn't nearly the most complicated thing about coordinate systems, it can really go either way. I don't think either decision is particularly wrong, but if both are to be supported, a recommended default in the text would be helpful.

@d-v-b
Contributor

d-v-b commented Oct 29, 2025

Hi Matt, thanks for the clarifications. I appreciate that you didn't intend to cause any frustration. But I think we are still coming from very different perspectives here, which is probably why I have trouble putting your objections into the broader context of this PR, and may be misinterpreting you.

You write:

“Ad hoc” is meant here purely in the dictionary sense—formed for a specific, immediate purpose—not as an attack. JSON’s core objective is to represent JavaScript objects. It does not natively encode multi-dimensional binary matrices, so devising a format to do that within JSON, with new manual conventions for shape/stride/order and encoding, is using JSON for an “ad hoc” purpose.

This might be a key difference in perspective. I would not describe JSON's core objective as describing JavaScript objects. I think a better description, via the JSON spec is:

JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

As JSON is a data-interchange language, collectively devising a new JSON encoding for something like an affine transform is simply using JSON for its intended purpose. It's strictly true that JSON does not natively encode multi-dimensional binary matrices, but JSON also does not natively encode coordinate transformations, or complex numbers, or Zarr array metadata, etc. That's OK — we devised encodings for these things using the JSON type system. Such encodings are often devised for a specific, immediate purpose (ad-hoc in the dictionary sense), but as this origin story is so common I don't find the "ad-hoc" label particularly helpful in the context of this PR, and I certainly don't see why the JSON encoding scheme I'm proposing is ad-hoc while the rest of this PR is not.

You say:

every time complexity is added to encoding rules, or decisions are left to implementers (“base64 here, shape there, order flag there”) these multiply into real maintenance burdens and subtle parsing bugs—especially outside of Python or by future contributors less familiar with the rationale.

What I am proposing is that we define a specified encoding scheme, not vague suggestions for implementers to interpret as they like. I don't think anyone has proposed something like "base64 here, shape there, order flag there", so I don't think this is a helpful characterization. I don't think anything proposed here is python-specific. And the rest of this objection targets an underspecified encoding, which nobody here wants.

For this concern to land, you should take the specific proposals offered upthread and show how they are underspecified, and then we can design a fix, or conclude that it's unfixable. Saying "trust my experience from other projects" reads as dismissive unless you use your experience to point to something concrete from those projects that informs the discussion here. Which of the projects you listed tried, and struggled with, encoding 2D matrices in JSON? Was the conclusion to never use JSON for numerical data? Concrete examples would be helpful.

Turning to the content of this PR and the discussion in this thread (and if this PR is not the latest version of RFC5, then please direct me to where I can find the latest version): there are 4 ways of encoding matrix transformations:

  1. as nested JSON arrays,
  2. as a single JSON array of numbers.
  3. as a path to "binary data", presumably a Zarr array.
  4. as an inline JSON n-dimensional array, e.g. with base64-encoded values (my proposal from upthread)

You have raised concerns about encoding numerical scalars in JSON, due to three concerns:

  • numerical precision issues with JSON numbers
  • C vs F order ambiguity
  • maintenance burden / ambiguity across languages

Although I think there are JSON encodings that address these concerns, such as the one I have proposed and implemented, I am curious to hear your view on the two JSON encodings (array of arrays and flat array) for numerical arrays that are currently written in this PR. Based on your claim that JSON is fundamentally ill-suited for numerical arrays, it seems you would support removing or deprecating encodings 1 and 2, and instead recommending that numerical array data be stored outside of JSON?

But if those encodings are actually OK for some cases, then we probably need a clear recommendation for when transform parameters are stored in JSON vs when they are stored in external arrays, and what the rules are for those external arrays. How are the axes of the external arrays interpreted? Do the external arrays need to use the dimension_names attribute? What codecs are permissible? Recall that unlike Zarr V2, Zarr V3 does not model the memory order of decoded arrays; this is left as an implementation detail.

A final note about the "simplicity" of Zarr arrays: compared to JSON, Zarr has a shorter history, and within that history there have been some pretty significant interoperability issues across implementations. Not all Zarr implementations support the same data types or codecs, and some published Zarr implementations have generated incompatible array data, e.g. due to ambiguous fill value encoding. Another important example: zarr python 2.x defaulted to saving arrays with the blosc codec, but distributing blosc to java clients like fiji was a source of many problems, a history of which you can see here. It would be very unfortunate if metadata parsing failed because of zarr interop issues. JSON, by comparison, is older, more widely supported, and much simpler than Zarr (by definition).

@will-moore
Member

Both affine and rotation are now strictly 2D arrays, either in JSON or as a path to a zarr array.

See #350

Changes there are being migrated to...
https://ngff-spec.readthedocs.io/en/latest/#coordinatetransformations-metadata is not quite up to date; see ome/ngff-spec#17

"affine transformation matrix stored as a 2D array, either directly in JSON using the affine field or as binary data at a location in this container (path). If both are present, the binary values at path should be used."

@d-v-b
Contributor

d-v-b commented Oct 29, 2025

binary data at a location in this container (path). If both are present, the binary values at path should be used."

Maybe "binary data" should be changed to "Zarr array", if that's the intention?

@jo-mueller
Contributor

jo-mueller commented Oct 29, 2025

Hi Davis @d-v-b , I have continued the work on the RFC on a different branch. I referenced this PR a few times, but I think the active development head is now there (#350), aside from the transition to ngff-spec.

While most changes seem to be related to the comments and suggestions from the review process, going through the discussion here and especially some of your comments further up was very helpful in sharpening the text of RFC-5 into what it is now. Would love to hear your feedback and thoughts on it.

@joshmoore
Member

Thanks, @jo-mueller. Let me suggest then that with RFC-5 moving forward elsewhere we close this PR, to make it clear that at least this draft of RFC-5 is not the specific text that we are (or should be) discussing. Hopefully it goes without saying, but this is NOT a change of direction but just a refresh based on all the comments received to date. Many thanks to @bogovicj (and everyone else) for all the work that went into this PR. 🙌🏽

On the concrete array encoding conversation that kicked off with #138 (comment), I’d like to add two meta-thoughts from my side:

  • Firstly, I’m not sure that long-form GitHub comments are necessarily the most expedient means here. It might be that a real-time conversation could help to reach a workable consensus. If we don’t think that’s possible, I’d certainly appreciate a summary of the perspectives as Comments on the RFC so that they can flow into the final public record and decision-making.

  • Secondly, I missed the very beginning of the encoding conversation due to travel. After reading the first several comments, what I would have really liked to do (and this has come up several times in the NGFF process) is to move the related comments all to a new issue, like we do on image.sc or Zulip. Unfortunately, that’s not possible. So for everyone here, perhaps keep that in mind. Let’s try to foster a habit of breaking our conversations into manageable pieces. From my side, apologies for not catching this sooner. I will work on defining and documenting a hopefully not too disruptive pattern for us all to say, “oh, this is clearly something that needs working through, but let’s take it over there / off-line / etc.”

Thanks everyone for the effort put into the comments and the willingness to discuss difficult topics. It is exactly the point of building a specification together. ❤️
