Incorrect (I think?) MD5IntegrityError #105

@jasper-tms

Description

Hi Will,

I used cloudvolume to upload a simple greyscale image volume in precomputed format to google cloud, as I've done a million times. The upload seemed to succeed without issue. But if I try to download the data from google cloud using cloudvolume, I get a scary error:

In [1]: from cloudvolume import CloudVolume
/home/phelps/.virtualenvs/cloudvolume/lib/python3.10/site-packages/python_jsonschema_objects/__init__.py:113: UserWarning: Schema id not specified. Defaulting to 'self'
  warnings.warn("Schema id not specified. Defaulting to 'self'")

In [2]: vol = CloudVolume('gs://lee-lab_brain-and-nerve-cord-fly-connectome/templates/JRC2018_FEMALE.ng')
Using default Google credentials. There is no ~/.cloudvolume/secrets/google-secret.json set.

In [3]: im = vol[:]
Downloading:   0%|                                                                                                                | 0/2496 [00:01<?, ?it/s]
---------------------------------------------------------------------------
MD5IntegrityError                         Traceback (most recent call last)
Cell In[3], line 1
----> 1 im = vol[:]

File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:551, in CloudVolumePrecomputed.__getitem__(self, slices)
    548 channel_slice = slices.pop()
    549 requested_bbox = Bbox.from_slices(slices)
--> 551 img = self.download(requested_bbox, self.mip)
    552 return img[::steps.x, ::steps.y, ::steps.z, channel_slice]



......

File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudfiles/cloudfiles.py:423, in CloudFiles.get.<locals>.download(path)
    421 if start is None and end is None:
    422   if server_hash_type == "md5":
--> 423     check_md5(path, content, server_hash)
    424   elif server_hash_type == "crc32c":
    425     check_crc32c(path, content, server_hash)

File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudfiles/cloudfiles.py:393, in CloudFiles.get.<locals>.check_md5(path, content, server_hash)
    390 computed_md5 = md5(content)
    392 if computed_md5.rstrip("==") != server_hash.rstrip("=="):
--> 393   raise MD5IntegrityError("{} failed its md5 check. server md5: {} computed md5: {}".format(
    394     path, server_hash, computed_md5
    395   ))

MD5IntegrityError: 380_380_380/0-64_0-64_0-64 failed its md5 check. server md5: I2RXOQeR8uEbpP1FfLfFPA== computed md5: tMYajCH5Z0AfeDH2Y+Q+ig==
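For context, the two hashes in that message are base64-encoded MD5 digests (the format GCS stores as an object's md5Hash). A minimal standard-library sketch of computing such a digest and of the padding-insensitive comparison cloudfiles performs:

```python
import base64
import hashlib

def b64_md5(content: bytes) -> str:
    """Base64-encoded MD5 digest -- the format GCS reports as md5Hash."""
    return base64.b64encode(hashlib.md5(content).digest()).decode("ascii")

# The comparison cloudfiles makes, with base64 '=' padding stripped,
# applied to the two hashes from the error message above:
server_hash = "I2RXOQeR8uEbpP1FfLfFPA=="
computed_md5 = "tMYajCH5Z0AfeDH2Y+Q+ig=="
print(computed_md5.rstrip("=") == server_hash.rstrip("="))  # False: they genuinely differ
```

So the mismatch is real at the byte level; the question is which bytes each side hashed.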

I've never seen this before. I tried re-uploading the dataset and got the same error, so I don't think it was a failed upload or corrupted data. The dataset also loads into neuroglancer just fine, and I can download the files with a gcloud storage cp command without any problem. So I suspect the issue may not actually be with the files but with how cloudfiles is validating the checksum. Not sure if it's relevant, but the specific cube that triggers the error is in fact all black (pixel values all 0) and is the top-left-most block in the dataset.
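One way to check whether the stored object itself is intact (ruling cloudfiles in or out) is to fetch the raw, untranscoded bytes with the google-cloud-storage client and compare them to the md5Hash GCS reports. A sketch, assuming the failing chunk lives at the blob path pieced together from the volume URL and the error message above (adjust as needed):

```python
import base64
import hashlib

def hashes_match(local_b64: str, server_b64: str) -> bool:
    """Compare base64 MD5 digests, ignoring trailing '=' padding."""
    return local_b64.rstrip("=") == server_b64.rstrip("=")

def verify_blob(bucket_name: str, blob_name: str) -> bool:
    """Download raw (untranscoded) bytes and check them against GCS's md5Hash."""
    from google.cloud import storage  # pip install google-cloud-storage
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    raw = blob.download_as_bytes(raw_download=True)  # skip gzip transcoding
    blob.reload()  # populate metadata, including md5_hash
    local = base64.b64encode(hashlib.md5(raw).digest()).decode("ascii")
    return hashes_match(local, blob.md5_hash)

if __name__ == "__main__":
    # Paths inferred from the volume URL and error message above; hypothetical layout.
    print(verify_blob(
        "lee-lab_brain-and-nerve-cord-fly-connectome",
        "templates/JRC2018_FEMALE.ng/380_380_380/0-64_0-64_0-64",
    ))
```

If this prints True, the stored object matches its server-side hash and the discrepancy would sit in how the bytes are fetched or decoded on the client side rather than in the data itself.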

Do you have any idea what could be going on here? Can you reproduce the issue if you try to load this exact volume into memory via cloudvolume?

Thanks a lot!
