Skip to content

Conversation

@daljit46
Copy link
Member

@daljit46 daljit46 commented Jun 11, 2025

This PR introduces Image::with_canonical_axes_order_layout(), a utility method to obtain an Image with its data preloaded into RAM in a standardized, contiguous layout. This layout ensures that axis 0 is the most contiguous (fastest varying), followed by axis 1, then axis 2, and so on, up to the highest dimension. All axes involved in this ordering will have positive effective strides. This is a common packed layout suitable for interoperability with GPU APIs (where axis 0 often maps to width, axis 1 to height, axis 2 to depth) and possibly other external libraries.

Checking the implementation of Image::with_direct_io, I believe that the function should also ensure that the resulting stride has all positive directions.

I'm slightly unsure about the naming though. I called the function canonical but I admit that may be confusing (the other one I was debating was standard).

Creates an image with a memory layout that ensures that axis 0 is the
most contiguous (fastest varying), followed by axis 1, then axis 2, and
so on, up to the highest dimension. All axes involved in this ordering
will have positive effective strides.
@daljit46 daljit46 requested a review from a team June 11, 2025 19:56
@daljit46 daljit46 self-assigned this Jun 11, 2025
@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@Lestropie
Copy link
Member

There's already the function .with_direct_io(), which both ensures explicit load into RAM and allows any custom axis strides. What you're doing here is one use case of that, where you want the strides of [+1,+2,+3,...,+ndim()]. So I think the better solution is to have a function in cpp/core/stride.h that yields these canonical strides, which you can then pass to .with_direct_io().

One of the reasons for this more general solution is that it's quite common when working with 4D data to want data along the fourth direction to be contiguous. We don't care all that much about the spatial axes, its good to be able to manipulate them arbitrarily and that's exploited at image load but it's primarily making all data for a voxel contiguous that yields performance benefits. You might find that for what you're currently doing with 3D data you're seeking trivial spatial strides, whereas if you were to be trying to do something with 4D data on the GPU you'd instead be seeking the contiguous-along-nonspatial-axes solution.

@daljit46
Copy link
Member Author

daljit46 commented Jun 12, 2025

So I think the better solution is to have a function in cpp/core/stride.h that yields these canonical strides, which you can then pass to .with_direct_io().

My main motivation to add a new function to Image was just brevity, but I think your idea is cleaner and makes more sense.

You might find that for what you're currently doing with 3D data you're seeking trivial spatial strides, whereas if you were to be trying to do something with 4D data on the GPU you'd instead be seeking the contiguous-along-nonspatial-axes solution.

Would it also make sense to have something like a Stride::contiguous_along_higher_axes_symbolic? A function that takes in a "first_prioritized_axis" that will be made most contiguous, and subsequent axes will follow in order (axes before this will be assigned lower priority). For example, if you have a 5D image with the first three axes being spatial, the fourth time and fifth b-value (or something else), then the function would return [3, 4, 5, 1, 2] if the user passes in first_prioritized_axis=3.

@Lestropie
Copy link
Member

Would it also make sense to have something like a Stride::contiguous_along_higher_axes_symbolic?

Yep, something like that makes sense. Mostly we use contiguous_along_axis(3) because we haven't (yet) dealt much with 5D data and don't care much about what happens in the spatial axes so might as well leave them untouched. So there could probably be two: one that does contiguous along supra-spatial axes and preserves spatial axes, and another that does contiguous along supra-spatial axes and then canonical along spatial.

@jdtournier
Copy link
Member

Unfortunately it's not that simple. The main priority in with_direct_io() is to maximise performance in terms of CPU cache use and throughput, but also avoid unnecessary preloading as much as possible: if the data are memory-mapped and already in the correct data type with a suitable ordering, we use that. There isn't much of an advantage in terms of CPU cache whether the data are ordered with a positive stride (say, +1, left -> right) or negative stride (say, -1, right -> left): in both cases, data are contiguous and cache lines will be maximally utilised.

All this to say that requesting a specific set of strides is not guaranteed to give you those strides: it'll treat -1,+2,-3,+4 as equivalent to +1,-2,+3,-4.

So we either need an extra method (or option to with_direct_io(), my preferred solution) to switch to a strict interpretation of the strides provided, or we just change with_direct_io() to always interpret strides strictly, and accept that in some cases, images will be preloaded unnecessarily...

@jdtournier
Copy link
Member

By the way, in response to this:

Would it also make sense to have something like a Stride::contiguous_along_higher_axes_symbolic?

Yep, something like that makes sense. Mostly we use contiguous_along_axis(3) because we haven't (yet) dealt much with 5D data and don't care much about what happens in the spatial axes so might as well leave them untouched. So there could probably be two: one that does contiguous along supra-spatial axes and preserves spatial axes, and another that does contiguous along supra-spatial axes and then canonical along spatial.

This already exists: https://github.com/MRtrix3/mrtrix3/blob/master/core/stride.h#L386-L408

@daljit46
Copy link
Member Author

daljit46 commented Jun 12, 2025

Ok, so we could probably do something like this:

enum class StrideInterpretation {
  BestMatch,
  Exact
};

Then change with_direct_io to:

  // Currently we have Image with_direct_io(Stride::List with_strides = Stride::List())
  Image with_direct_io(Stride::List with_strides = Stride::List(), StrideInterpretation strideInterpretation = StrideInterpretation::BestMatch);

In stride.h we could also add a helper function to ensure positive strides:

void ensure_positive_symbolic_signs(List& symbolic_strides) {
  for (ssize_t& stride_val : symbolic_strides) {
    if (stride_val != 0) {
      stride_val = std::abs(stride_val);
    }
  }
}

Then, passing in a canonical symbolic list should do the job. If we want an even more general solution, I guess modifying Stride::get_nearest_match to accept something like enum class SignPreference { PreserveOriginal, PreferPositive, PreferNegative, MatchDesired }; could be an option.

@daljit46 daljit46 changed the title Add method for canonical direct I/O layout Add option for canonical direct I/O layout Jun 12, 2025
@daljit46
Copy link
Member Author

@jdtournier @Lestropie I've modified the PR based on your feedback and suggestions.

daljit46 added 2 commits June 12, 2025 13:10
A new StrideInterpretation option is introduced to determining the
target memory layout in with_direct_io. The default behaviour is
preserved with StrideInterpretation::NearestMatch and the user can pass
StrideInterpretation::Exact to use exact target symbolic layout.
@daljit46 daljit46 force-pushed the canonical_direct_io branch from 57a0437 to 63e0342 Compare June 12, 2025 12:11
@MRtrix3 MRtrix3 deleted a comment from github-actions bot Jun 12, 2025
@MRtrix3 MRtrix3 deleted a comment from github-actions bot Jun 12, 2025
@Lestropie
Copy link
Member

This already exists:

No, I don't think it does. The first two functions highlighted ensure contiguity along one axis, whereas my description was specifically for contiguity along all non-spatial axes.

Ok, so we could probably do something like this:

I don't think there should be any need for a second function argument to .with_direct_io() here. It should be possible to derive the strides based on the developer's desire and the strides of the input image. Thaat would then mean that anything else in the API that takes strides as input would be compatible, whereas with this proposal the capability is strictly tied to just .with_direct_io().

There's an existing (perhaps implicit?) capability within the strides handling that setting the stride to 0 in a list will, at the point at which the strides are actualised, be interpreted as "put these axes after all developer-specified axes, with the same relative ordering and sign as the input image". That should be adequate AFAICT; I think your proposal here is providing a parallel mechanism to existing capabilities.

Say you have an input image that is [-3,+1,+2,+4,+5]. What I originally believed you wanted is something like the following two functions:

  • contiguous_along_supraspatial_axes(), which produces: [0, 0, 0, +1, +2]; at the point of actualisation this will become [-5,+3,+4,+1,+2]
  • canonical_contiguous_along_supraspatial_axes(), which produces: [+3,+4,+5,+1,+2]

Is there a necessary scenario that isn't covered by either of these?

@jdtournier
Copy link
Member

The use case @daljit46 is specifically interested in here is getting images in & out of the GPU in an efficient manner, which means a single memory transfer. For that to work, we need to make sure the data are stored in the correct data type with exactly the strides specified (what @daljit46 referred to as 'canonical') - otherwise we'd need to handle arbitrary strides in the GPU shader, which seems likely to complicate things too much.

The current infrastructure almost covers this, up to the guaranteeing the sign of the strides. What we would need is a call like:

auto in = Image<float>::open ("filename").with_direct_io (Strides::InOrder, Strides::Strict);

or something like that, where Strides::InOrder is understood to mean 1 -> n_dim, and Strides::Strict is understood to mean "do not treat negative strides as equivalent to their positive counterpart".

We can easily do the Strides::InOrder bit with the current functionality (though it's probably a little clunky), but we don't currently cater for the strict interpretation of the strides. As I understand it, that's the issue that requires changes deeper in the API.

@daljit46
Copy link
Member Author

daljit46 commented Jun 13, 2025

Yes, as Donald said, canonical here is used as : contiguous + strict axis order + positive signs only. Note for the specific use case of uploading images as textures to the GPU, the idea only needs to be applied to the 2D and 3D case (as rendering APIs natively support 2D and 3D textures only and higher order images would be ordinary buffers). However, even for the general nD case, having the canonical direct IO layout would be useful when the built-in MRtrix3 facilities aren't available (e.g. in GPU shaders) as I believe it is the most algorithmically straightforward layout to deal with.

@Lestropie
Copy link
Member

The current infrastructure almost covers this, up to the guaranteeing the sign of the strides.

OK, I wasn't aware of the failure to enforce stride sign despite a developer very explicitly providing manual strides. I think that's the origin of the divergence.


If that's the case though, I'm still not sure that the proposed interface is the right approach. Having a singular data structure called "Strides", but then having a secondary flag indicating how the data in that structure should be interpreted only within some specific function, feels wrong.
If there are multiple types of image strides:

  1. Actualised, ie. the actual file pointer offsets to apply to step along a given axis;
  2. Symbolic, where both order and sign of non-zero entries are important;
  3. Ordinal, where the relative order of different axes matters but there is no notion of sign

, and your issue is that you need to achieve 2. but the API is only facilitating 3., then arguably these warrant being stored as different structures. The structure storing your strides should intrinsically reflect what they encode / how they are to be imposed, and not be dependent on conditional interpretation. 3 is just a permutation and should enforce non-negativity. Further, there may be elsewhere in the code base where there is a discrepancy between a developer's expectations of whether the signs of strides matters and how those data are utilised. Certainly that wasn't obvious to me, so this could well be a source of problems elsewhere...

@daljit46
Copy link
Member Author

The structure storing your strides should intrinsically reflect what they encode / how they are to be imposed, and not be dependent on conditional interpretation.

I somewhat agree with this, but I would argue that Image::with_direct_io already violates this principle. The signature of the function is:

Image with_direct_io(Stride::List with_strides = Stride::List());

At the site of declaration, the doc comment says:

The optional with_strides argument is used to additionally enforce preloading if the strides aren't compatible with those specified.

One would expect that if I pass in a given Stride::List, the function will return an image with a memory layout matching those strides, but that's not what happens. The implementation changes the symbolic strides by calling Stride::get_nearest_match before actualising them.
Hence, I think the problem here is specifically with how with_direct_io works. In my current proposal, I chose to add a second argument so that the caller can choose the behaviour of with_direct_io (no other stride-related code is affected), but the solution may not be optimal. In addition, I should probably change the StrideInterpretation enum to something like StrideEnforcement, as that conveys what with_direct_io is doing better.

@Lestropie
Copy link
Member

Lestropie commented Jun 18, 2025

The implementation changes the symbolic strides by calling Stride::get_nearest_match() before actualising them.

If that's the case, then perhaps rather than adding a flag that modulates whether Image::with_direct_io() internally calls Stride::get_nearest_match(), anything that is feeding Image::with_direct_io() with data that necessitates a call to Stride::get_nearest_match() should be instead giving it data that doesn't need to be run through that function (eg. by itself calling Stride::get_nearest_match() beforehand).

Or instead, check the contents of Stride::List for zeros:

  • If there are any zeros present, call Stride::get_nearest_match(), on the basis that the developer has provided as input to the function incomplete symbolic strides, where the missing contents should be filled in based on the existing image layout (since that's the most likely to be able to persist with the memory mapping and avoid an explicit RAM conversion).
  • If the input is complete symbolic strides, don't modify it in any way.

This is probably the more direct redress of the issue you're encountering, and doesn't require a new enum. You just provide as input a Stride::List with [1,2,3(,4,...)], and it gets actualised as specified.

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@Lestropie Lestropie force-pushed the dev branch 2 times, most recently from 70031c3 to 6bf4cec Compare August 26, 2025 08:11
@Lestropie Lestropie force-pushed the canonical_direct_io branch from 88cb9e5 to 63e0342 Compare August 29, 2025 04:43
@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants