Hi Will,
I used cloudvolume to upload a simple greyscale image volume in precomputed format to google cloud, as I've done a million times. The upload seemed to succeed without issue. But if I try to download the data from google cloud using cloudvolume, I get a scary error:
In [1]: from cloudvolume import CloudVolume
/home/phelps/.virtualenvs/cloudvolume/lib/python3.10/site-packages/python_jsonschema_objects/__init__.py:113: UserWarning: Schema id not specified. Defaulting to 'self'
warnings.warn("Schema id not specified. Defaulting to 'self'")
In [2]: vol = CloudVolume('gs://lee-lab_brain-and-nerve-cord-fly-connectome/templates/JRC2018_FEMALE.ng')
Using default Google credentials. There is no ~/.cloudvolume/secrets/google-secret.json set.
In [3]: im = vol[:]
Downloading: 0%| | 0/2496 [00:01<?, ?it/s]
---------------------------------------------------------------------------
MD5IntegrityError Traceback (most recent call last)
Cell In[3], line 1
----> 1 im = vol[:]
File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:551, in CloudVolumePrecomputed.__getitem__(self, slices)
548 channel_slice = slices.pop()
549 requested_bbox = Bbox.from_slices(slices)
--> 551 img = self.download(requested_bbox, self.mip)
552 return img[::steps.x, ::steps.y, ::steps.z, channel_slice]
......
File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudfiles/cloudfiles.py:423, in CloudFiles.get.<locals>.download(path)
421 if start is None and end is None:
422 if server_hash_type == "md5":
--> 423 check_md5(path, content, server_hash)
424 elif server_hash_type == "crc32c":
425 check_crc32c(path, content, server_hash)
File ~/.virtualenvs/cloudvolume/lib/python3.10/site-packages/cloudfiles/cloudfiles.py:393, in CloudFiles.get.<locals>.check_md5(path, content, server_hash)
390 computed_md5 = md5(content)
392 if computed_md5.rstrip("==") != server_hash.rstrip("=="):
--> 393 raise MD5IntegrityError("{} failed its md5 check. server md5: {} computed md5: {}".format(
394 path, server_hash, computed_md5
395 ))
MD5IntegrityError: 380_380_380/0-64_0-64_0-64 failed its md5 check. server md5: I2RXOQeR8uEbpP1FfLfFPA== computed md5: tMYajCH5Z0AfeDH2Y+Q+ig==

I've never seen this before. I tried re-uploading the dataset and got the same error, so I don't think it was a failed upload or corrupted data. The dataset also loads into neuroglancer just fine, and I can download the files with a gcloud storage cp command without any problem. So I suspect the issue isn't actually with the files themselves but with how cloudfiles validates the checksum. Not sure if it's relevant, but the specific cube that triggers the error is all black (every pixel value is 0) and is the top-left-most block in the dataset.
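For what it's worth, the check that raises in the traceback boils down to comparing a base64-encoded MD5 of the downloaded bytes against the hash the server reports. Here's a minimal sketch of that comparison logic on the chunk in question, assuming (per the report) it's a 64x64x64 uint8 block of all zeros; the b64_md5 helper name is mine, not cloudfiles':

```python
import base64
import hashlib

import numpy as np

def b64_md5(content: bytes) -> str:
    # Base64-encoded MD5 digest, in the same form GCS reports in the
    # object's md5Hash metadata field.
    return base64.b64encode(hashlib.md5(content).digest()).decode("utf-8")

# Assumption from the report: the failing chunk 0-64_0-64_0-64 is a
# 64^3 uint8 block with every voxel equal to 0.
chunk = np.zeros((64, 64, 64), dtype=np.uint8).tobytes()

local_hash = b64_md5(chunk)
print(local_hash)
# Comparing this against the md5Hash that GCS stores for the object
# (e.g. from the object's metadata) would show whether the bytes on
# disk or the validation step is at fault. Note that if the object is
# stored gzip-encoded, the server hash covers the compressed bytes,
# not the decompressed ones a client receives.
```

If the hash of the raw zero block matches what I computed locally but not what GCS reports, that would point at an encoding/transcoding mismatch rather than corrupt data.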
Do you have any idea what could be going on here? Can you reproduce the issue if you try to load this exact volume into memory via cloudvolume?
Thanks a lot!