Dask and Zarr not loading obsp and obsm from remote s3 #951

@will-moore

Hi,

I'm using @ilan-gold's nice sample code at
https://anndata.readthedocs.io/en/latest/tutorials/notebooks/%7Bread,write%7D_dispatched.html
to read remote anndata, which is working great when I'm serving data locally via http.

But when the data is served from MinIO, the obsp, obsm and uns parts of the AnnData object are missing. See the code sample below.

A UI view of the data is at https://deploy-preview-20--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/
(from ome/ome-ngff-validator#20) where I'm using the extra "keys" in e.g. https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/obsm/.zattrs to load those groups:

{
    "encoding-type": "dict",
    "encoding-version": "0.1.0",
    "keys": [
        "X_scanorama",
        "X_umap",
        "spatial"
    ]
}

Is there any way I can use those 'keys' to load obsp and obsm data?
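
To make the question concrete, here is a minimal sketch of the idea, assuming the remote endpoint cannot list the children of a group and the "keys" list in `.zattrs` is the only record of them. `FAKE_STORE` and `child_paths_from_keys` are hypothetical names used for illustration and are not part of anndata or zarr. The dictionary stands in for the remote store so the example runs without network access.

```python
import json
from typing import Dict, List

# Hypothetical stand-in for a remote Zarr v2 store: it maps object paths to
# bytes and, like a plain HTTP endpoint, offers no way to list a prefix.
FAKE_STORE: Dict[str, bytes] = {
    "regions_table/obsm/.zgroup": b'{"zarr_format": 2}',
    "regions_table/obsm/.zattrs": json.dumps(
        {
            "encoding-type": "dict",
            "encoding-version": "0.1.0",
            "keys": ["X_scanorama", "X_umap", "spatial"],
        }
    ).encode(),
}


def child_paths_from_keys(store: Dict[str, bytes], group_path: str) -> List[str]:
    """Return child paths of a "dict"-encoded group from its "keys" attribute."""
    attrs = json.loads(store[f"{group_path}/.zattrs"])
    if attrs.get("encoding-type") != "dict":
        raise ValueError(f"{group_path} is not a 'dict'-encoded group")
    # Assumption: "keys" may be absent, in which case no children are known.
    return [f"{group_path}/{key}" for key in attrs.get("keys", [])]


obsm_children = child_paths_from_keys(FAKE_STORE, "regions_table/obsm")
print(obsm_children)
# ['regions_table/obsm/X_scanorama', 'regions_table/obsm/X_umap', 'regions_table/obsm/spatial']
```

Each returned path could then be opened by name (for example `table_group["obsm/X_umap"]` in the code sample below) instead of being discovered by iterating over the group.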

Thanks!

from typing import Callable, Union
import dask.array as da
import zarr
from anndata import AnnData
from anndata._io.specs import IOSpec
from anndata.compat import H5Array, H5Group, ZarrArray, ZarrGroup

# ** requires anndata==0.9.0.rc1
from anndata.experimental import read_dispatched, read_elem
from zarr.storage import FSStore
from ome_zarr.io import parse_url

StorageType = Union[H5Array, H5Group, ZarrArray, ZarrGroup]


def read_remote_anndata(store: FSStore, name: str) -> AnnData:
    table_group = zarr.group(store=store, path=name)

    def callback(
        func: Callable, elem_name: str, elem: StorageType, iospec: IOSpec
    ) -> AnnData:
        if iospec.encoding_type in (
            "dataframe",
            "csr_matrix",
            "csc_matrix",
            "awkward-array",
        ):
            # Preventing recursing inside of these types
            return read_elem(elem)
        elif iospec.encoding_type == "array":
            return da.from_zarr(elem)
        else:
            return func(elem)

    adata = read_dispatched(table_group, callback=callback)
    return adata

# Remote MinIO server: uns, obsm and obsp are missing from the result
url = "https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/"
name = "regions_table"
store = parse_url(url, mode="r").store
anndata_obj = read_remote_anndata(store, name)

print('anndata_obj', anndata_obj)

# anndata_obj AnnData object with n_obs × n_vars = 1045 × 36
#     obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
#     var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'


# Local HTTP server: the same table loads with uns, obsm and obsp
url = "http://localhost:8000/test_segment.zarr/tables/"
store = parse_url(url, mode="r").store
anndata_obj = read_remote_anndata(store, name)

print('anndata_obj', anndata_obj)

# anndata_obj AnnData object with n_obs × n_vars = 1045 × 36
#     obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
#     var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
#     uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
#     obsm: 'X_scanorama', 'X_umap', 'spatial'
#     obsp: 'connectivities', 'distances'
