At the moment, we host the test datasets for the CI on AWI's Nextcloud. I'd like some redundancy, maybe something on a public Helmholtz cloud? We should check the HIFIS Service Catalog and find something appropriate.
Ideally this would be something we could easily swap out:
import fsspec
from fsspec.callbacks import TqdmCallback
import logging

logger = logging.getLogger(__name__)

TAR_LOCATIONS = {
    "fesom_2p6_pimesh.tar": [
        # [NOTE] PG: I do not like that this has such a non-human name.
        "https://nextcloud.awi.de/s/AL2cFQx5xGE473S/download/fesom_2p6_pimesh.tar",
        # [FIXME] At least, it would be good to have AWI FTP here:
        "ftp://...???",
        # [FIXME]: Add Helmholtz cloud storage URL here, this is imaginary:
        # "https://data.helmholtz-cloud.de/pycmor/test-data/fesom_2p6_pimesh.tar",
        # "s3://helmholtz-bucket/pycmor/test-data/fesom_2p6_pimesh.tar",
        # [FIXME]: A DKRZ Location would be motivating, too.
    ],
    # Add other test datasets here
}


def load_tar(tarball_name, local_path=None, show_progress=True):
    """
    Download test data tarball with automatic fallback to alternative sources.

    Parameters
    ----------
    tarball_name : str
        Name of the tarball to download
    local_path : str, optional
        Local path to save the file
    show_progress : bool
        Whether to show download progress bar

    Returns
    -------
    str
        Path to the downloaded file

    Raises
    ------
    ValueError
        If ``tarball_name`` is not a key of ``TAR_LOCATIONS``
    RuntimeError
        If all download locations fail
    """
    if tarball_name not in TAR_LOCATIONS:
        raise ValueError(f"Unknown tarball: {tarball_name}")
    if local_path is None:
        local_path = f"/tmp/{tarball_name}"

    errors = []
    for location in TAR_LOCATIONS[tarball_name]:
        try:
            logger.info(f"Attempting to download from: {location}")
            fs, path = fsspec.core.url_to_fs(location)
            # Only pass a callback when progress is requested: fs.get() calls
            # methods on the callback object, so callback=None is not safe.
            # A fresh TqdmCallback per attempt avoids reusing a finished bar.
            kwargs = {"callback": TqdmCallback()} if show_progress else {}
            fs.get(path, local_path, **kwargs)
            logger.info(f"Successfully downloaded to: {local_path}")
            return local_path
        except Exception as e:
            logger.warning(f"Failed to download from {location}: {e}")
            errors.append((location, str(e)))

    # All locations failed
    error_msg = "Failed to download from all locations:\n"
    for loc, err in errors:
        error_msg += f" - {loc}: {err}\n"
    raise RuntimeError(error_msg)
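
To check the fallback behaviour in CI without touching the network, the loop above could be separated from the transport. This is only a minimal sketch, not existing pycmor code: `first_successful` and `fake_fetch` are hypothetical names, and the `mirror-a` / `mirror-b` URLs are placeholders, not real hosting locations.

```python
from typing import Callable, Iterable


def first_successful(locations: Iterable[str], fetch: Callable[[str], str]) -> str:
    """Return the result of ``fetch`` for the first location that works.

    ``fetch`` is any callable that takes a URL and returns a local path,
    raising an exception on failure (hypothetical interface).
    """
    errors = []
    for location in locations:
        try:
            return fetch(location)
        except Exception as e:  # any failure just moves on to the next mirror
            errors.append(f" - {location}: {e}")
    raise RuntimeError("Failed to download from all locations:\n" + "\n".join(errors))


# Stand-in for a real download: only the second (placeholder) mirror "works".
def fake_fetch(url: str) -> str:
    if "mirror-b" not in url:
        raise ConnectionError("unreachable")
    return "/tmp/fesom_2p6_pimesh.tar"


mirrors = [
    "https://mirror-a.invalid/fesom_2p6_pimesh.tar",  # placeholder URL
    "https://mirror-b.invalid/fesom_2p6_pimesh.tar",  # placeholder URL
]
result = first_successful(mirrors, fake_fetch)
print(result)
```

With this split, a real `fetch` could wrap the `fsspec` call, and swapping Nextcloud for a Helmholtz or DKRZ location would only mean editing the list of URLs.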