This repository was archived by the owner on Feb 14, 2020. It is now read-only.

Support shape and slicing syntax for numpy compatibility #10

@mrocklin

Cool project. I gave it a shot with an eye towards using it with dask arrays. I have some feedback on the numpy slicing protocol.

A common API for array storage technologies is to mimic Numpy slicing syntax:

>>> array[:5, ::2, 100]
... my numpy array ... 
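For reference, here is a small sketch of the NumPy behavior in question. It uses an in-memory `numpy` array as a stand-in for a diced array, and the shape is arbitrary:

```python
import numpy as np

# In-memory stand-in for a 3-D volume (z, y, x); the shape is arbitrary.
volume = np.arange(4 * 6 * 8, dtype=np.uint8).reshape(4, 6, 8)

# Mixed integers and slices: an integer drops that axis, a slice keeps it.
sub = volume[:2, ::2, 5]
print(sub.shape)  # (2, 3)

# Trailing dimensions may be omitted; they default to the full range.
plane = volume[0]
print(plane.shape)  # (6, 8)

# Slice bounds may be omitted; start defaults to 0 and stop to the axis length.
row = volume[0, 0, :5]
print(row)  # [0 1 2 3 4]
```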

I'm glad to see that diced supports much of this API, which makes it much easier to interoperate with other libraries. After looking through the README and trying things out, I got as far as the following:

from diced import DicedStore
store = DicedStore("gs://flyem-public-connectome")
repo = store.open_repo("medulla-training")
array = repo.get_array('training2-grayscale')

>>> array[0, 0, 0:5:1]
array([ 89,  95, 103, 103,  89], dtype=uint8)

>>> array.dtype
<ArrayDtype.uint8: <type 'numpy.uint8'>>

This is great to see! Some critical feedback:

  1. It would be good to add array.shape as well
  2. It would be useful if the dtype object were an actual numpy dtype rather than a custom diced-specific type
  3. Slicing only works if every dimension is indexed and every slice gives its start and stop explicitly:
In [21]: array[0]
---------------------------------------------------------------------------
DicedException                            Traceback (most recent call last)
<ipython-input-21-bdd46aa5d024> in <module>()
----> 1 array[0]

/home/mrocklin/Software/anaconda/envs/diced/lib/python2.7/site-packages/diced/DicedArray.pyc in __getitem__(self, index)
    179 
    180         if self.numdims != dimsreq:
--> 181             raise DicedException("Array has a different number of dimensions than requested")
    182 
    183         z = y = x = slice(0,1)

DicedException: Array has a different number of dimensions than requested

In [22]: array[0, 0, :5]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-ab665471e927> in <module>()
----> 1 array[0, 0, :5]

/home/mrocklin/Software/anaconda/envs/diced/lib/python2.7/site-packages/diced/DicedArray.pyc in __getitem__(self, index)
    195         zsize = z.stop - z.start
    196         ysize = y.stop - y.start
--> 197         xsize = x.stop - x.start
    198         if zsize*ysize*xsize > self.MAX_REQ_SIZE:
    199             data = np.zeros((zsize, ysize, xsize), self.dtype.value)

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
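The second traceback comes from reading `.start` and `.stop` straight off the slice object: for `:5`, Python leaves `start` as `None`. The built-in `slice.indices(length)` resolves omitted and negative bounds against an axis length. A quick illustration follows (the axis length of 100 is arbitrary):

```python
s = slice(None, 5)                    # what Python passes to __getitem__ for `:5`
print(s.start, s.stop)                # None 5, so `s.stop - s.start` raises TypeError
print(s.indices(100))                 # (0, 5, 1): bounds resolved for an axis of length 100
print(slice(-10, None).indices(100))  # (90, 100, 1): negative start counted from the end
```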

For reference, this interface of dtype, shape, and slicing is supported by h5py, netcdf4, zarr, and most other array storage technologies in Python. This has allowed other projects (like Dask) to support these formats without having to special-case them (docs here).
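The following is a minimal sketch of such an interface, offered only as one possible approach. `NumpyCompatibleArray` and the backend callable `read_box(starts, stops)` are hypothetical names, not part of diced. `ArrayDtype` is a stand-in for diced's enum, assuming (as the repr above and `self.dtype.value` in the traceback suggest) that each member's `.value` is a NumPy scalar type. The wrapper fetches the bounding box once and then applies steps and integer indices locally:

```python
import enum
import numbers

import numpy as np


class ArrayDtype(enum.Enum):
    # Hypothetical stand-in for diced's enum: each member's value is
    # assumed to be a NumPy scalar type such as numpy.uint8.
    uint8 = np.uint8


class NumpyCompatibleArray(object):
    """Hypothetical wrapper exposing .shape, .dtype and NumPy-style slicing.

    `read_box(starts, stops)` is an assumed backend callable returning the
    dense block for the half-open box [starts, stops) as a NumPy array.
    """

    def __init__(self, read_box, shape, dtype):
        self._read_box = read_box
        self.shape = tuple(shape)
        self.ndim = len(self.shape)
        # Accept a diced-style enum member or anything np.dtype accepts.
        self.dtype = np.dtype(getattr(dtype, "value", dtype))

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > self.ndim:
            raise IndexError("too many indices for array")
        # Omitted trailing axes mean "take everything", as in NumPy.
        key = key + (slice(None),) * (self.ndim - len(key))

        starts, stops, local = [], [], []
        for k, length in zip(key, self.shape):
            if isinstance(k, slice):
                # slice.indices fills in None bounds and resolves negatives.
                start, stop, step = k.indices(length)
                if step < 0:
                    raise NotImplementedError("negative steps not supported in this sketch")
                starts.append(start)
                stops.append(max(start, stop))  # an empty slice gives a zero-size box
                local.append(slice(None, None, step))
            elif isinstance(k, numbers.Integral):
                i = k + length if k < 0 else k
                if not 0 <= i < length:
                    raise IndexError("index %d is out of bounds for size %d" % (k, length))
                starts.append(i)
                stops.append(i + 1)
                local.append(0)  # an integer index drops this axis
            else:
                raise TypeError("unsupported index: %r" % (k,))

        # Fetch the bounding box once, then apply steps and axis drops locally.
        block = self._read_box(starts, stops)
        return block[tuple(local)]
```

Under these assumptions, an object like this should be consumable by `dask.array.from_array(wrapped, chunks=...)`, because it exposes only `shape`, `dtype`, and NumPy-style `__getitem__`.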
