numba_cuda/numba/cuda/cudadrv/devicearray.py (6 changes: 2 additions & 4 deletions)
@@ -178,11 +178,9 @@ def _numba_type_(self):
         # of which will be 0, will not match those hardcoded in for 'C' or 'F'
         # layouts.
 
-        broadcast = 0 in self.strides and (self.size != 0)
-
-        if self.flags["C_CONTIGUOUS"] and not broadcast:
+        if self.flags["C_CONTIGUOUS"]:
             layout = "C"
-        elif self.flags["F_CONTIGUOUS"] and not broadcast:
+        elif self.flags["F_CONTIGUOUS"]:
             layout = "F"
         else:
             layout = "A"
numba_cuda/numba/cuda/cudadrv/dummyarray.py (25 changes: 13 additions & 12 deletions)
@@ -275,19 +275,11 @@ def _compute_layout(self):
         # 13661ac70).
         # https://github.com/numpy/numpy/blob/maintenance/1.19.x/numpy/core/src/multiarray/flagsobject.c#L123-L191
 
+        flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
+
         # Records have no dims, and we can treat them as contiguous
         if not self.dims:
-            return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
-
-        # All 0-size arrays are considered contiguous, even if they are multidimensional
Contributor Author

It feels like this special case was only needed to avoid the (erroneous) special case immediately following it on line 286. Once the "If this is a broadcast array then it is not contiguous" case is gone, I think this case is no longer needed.

-        if self.size == 0:
-            return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
-
-        # If this is a broadcast array then it is not contiguous
-        if any([dim.stride == 0 for dim in self.dims]):
-            return {"C_CONTIGUOUS": False, "F_CONTIGUOUS": False}
-
-        flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}
+            return flags
 
         # Check C contiguity
         sd = self.itemsize
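For context, the checks that remain after this hunk follow the NumPy flags logic linked in the comment above: size-1 dimensions (including zero-stride broadcast dimensions) are skipped, a zero-size dimension makes the array contiguous by definition, and any other stride mismatch clears the corresponding flag, so neither the 0-size nor the broadcast special case is required. A rough Python sketch of that algorithm (illustrative only; the function name and the shape/strides/itemsize arguments are assumptions, and the real method works on self.dims):

def compute_layout(shape, strides, itemsize):
    # Both flags start True and are cleared by the stride checks below.
    flags = {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

    # Records have no dims; treat them as contiguous.
    if not shape:
        return flags

    # C contiguity: walk dimensions from last to first.
    sd = itemsize
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 0:
            # 0-size arrays are contiguous by definition.
            return flags
        if size != 1:
            if stride != sd:
                flags["C_CONTIGUOUS"] = False
            sd *= size

    # F contiguity: walk dimensions from first to last.
    sd = itemsize
    for size, stride in zip(shape, strides):
        if size != 1:
            if stride != sd:
                flags["F_CONTIGUOUS"] = False
            sd *= size

    return flags

For example, compute_layout((1, 2, 3), (0, 8, 16), 8) reports F_CONTIGUOUS as True (the zero-stride length-1 axis is ignored), while compute_layout((4, 2, 3), (0, 8, 16), 8) reports both flags as False.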
@@ -342,7 +334,16 @@ def __getitem__(self, item):

         arr = Array(dims, self.itemsize)
         if newshape:
-            return arr.reshape(*newshape)[0]
+            # "A" order is required here so that the reshape can read and write
+            # the elements in fortran-like index order if the array is
+            # fortran-contiguous, and in C-like order otherwise. Otherwise, we
+            # end up reshaping fortran-contiguous arrays to C-contiguous,
+            # transposing indices in the process.
+            #
+            # See the NumPy reshape documentation of the order argument for a
+            # description of the behaviour this is following:
+            # https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
+            return arr.reshape(*newshape, order="A")[0]
Contributor

style: verify that the default order in reshape() (line 390) being "C" doesn't conflict with this "A" order usage - confirm all test cases pass, especially for fortran-contiguous arrays

Contributor Author

All test cases do pass for me locally (CI is still pending). With the other changes in this PR but without this one, test_devicearray_broadcast_host_copy() would fail.

Contributor Author

With the correct computation of the contiguity of broadcasted arrays, I think the reshape(*newshape) call becomes "wrong", because the default order is "C". Previously we would not have considered a broadcasted fortran-order array contiguous, and therefore would not have transposed its indices (or data); now that we do recognise broadcasted fortran-order data as F-contiguous, we need to give the reshape the freedom to read / write it in F order.

Without this change, the following:

# Derived from test_devicearray_broadcast_host_copy

import numpy as np
from numba import cuda

# Set up a broadcasted array as per the test case
broadsize = 4
coreshape = (2, 3)
coresize = np.prod(coreshape)
core_f = np.arange(coresize).reshape(coreshape, order="F")

dim = 0
newindex = (slice(None),) * dim + (np.newaxis,)
broadshape = coreshape[:dim] + (broadsize,) + coreshape[dim:]
broad_f = np.broadcast_to(core_f[newindex], broadshape)

dbroad_f = cuda.to_device(broad_f)

# Set up the index with which to slice the array
core_index = tuple([0, slice(None, None, None), slice(None, None, None)])

# For info, display the original arrays' shape and strides

print("NumPy shape and strides - original array:")
print(broad_f.shape)
print(broad_f.strides)

print("Device array shape and strides - original array:")
print(dbroad_f.shape)
print(dbroad_f.strides)

# Slice the NumPy and device arrays

sliced_broad_f = broad_f[core_index]
sliced_dbroad_f = dbroad_f[core_index]

# For info, display the sliced arrays' shape and strides

print("NumPy shape and strides - sliced array:")
print(sliced_broad_f.shape)
print(sliced_broad_f.strides)

print("Device array shape and strides - sliced array:")
print(sliced_dbroad_f.shape)
print(sliced_dbroad_f.strides)

# Demonstrate that the device array did not correctly preserve the strides
assert sliced_broad_f.strides == sliced_dbroad_f.strides

would fail with:

NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(24, 8)
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/612/repro.py", line 48, in <module>
    assert sliced_broad_f.strides == sliced_dbroad_f.strides
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

when it should have printed:

NumPy shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
Device array shape and strides - original array:
(4, 2, 3)
(0, 8, 16)
NumPy shape and strides - sliced array:
(2, 3)
(8, 16)
Device array shape and strides - sliced array:
(2, 3)
(8, 16)

A simpler reproducer that just invokes the core "dummy array" logic around this looks like:

from numba.cuda.cudadrv.dummyarray import Array

offset = 0
shape = (4, 2, 3)
strides = (0, 8, 16)
itemsize = 8
arr = Array.from_desc(offset, shape, strides, itemsize)

index = tuple([0, slice(None, None, None), slice(None, None, None)])
sliced = arr[index]

print(sliced.shape)
print(sliced.strides)

With this fix, it prints

(2, 3)
(8, 16)

(the correct F-order strides); without it, it prints

(2, 3)
(24, 8)

(C-order strides, which are incorrect here because F-order strides are needed).
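The same order sensitivity can be seen directly with NumPy's reshape, which is the behaviour the new comment in the diff points to (a quick illustration with plain NumPy, not part of the PR; the shapes are chosen to roughly mirror the dummy-array case above):

import numpy as np

# An F-contiguous array with a leading length-1 axis (itemsize 8, strides (8, 8, 16)).
a = np.zeros((1, 2, 3), order="F")

# The default C order reads and writes elements in C-like index order, so the
# result comes back with C-order strides.
print(a.reshape(2, 3).strides)             # (24, 8)

# order="A" reads and writes in F-like order because the input is F-contiguous,
# so the F-order strides are preserved.
print(a.reshape(2, 3, order="A").strides)  # (8, 16)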

         else:
             return Element(arr.extent)
