Redundant Data Server for CI #236

@pgierz

At the moment, we host the test datasets for the CI on AWI's Nextcloud. I'd like a redundant mirror, perhaps on a public Helmholtz cloud service. We should check the HIFIS Service Catalog for something appropriate.

Ideally, the download location would be something we could easily swap out:

import fsspec
from fsspec.callbacks import TqdmCallback
import logging

logger = logging.getLogger(__name__)

TAR_LOCATIONS = {
    "fesom_2p6_pimesh.tar": [
        # [NOTE] PG: I do not like that this has such a non-human name.
        "https://nextcloud.awi.de/s/AL2cFQx5xGE473S/download/fesom_2p6_pimesh.tar",
        # [FIXME] At least, it would be good to have AWI FTP here:
        "ftp://...???",
        # [FIXME]: Add Helmholtz cloud storage URL here, this is imaginary:
        # "https://data.helmholtz-cloud.de/pycmor/test-data/fesom_2p6_pimesh.tar",
        # "s3://helmholtz-bucket/pycmor/test-data/fesom_2p6_pimesh.tar",
        # [FIXME]: A DKRZ Location would be motivating, too.
    ],
    # Add other test datasets here
}

def load_tar(tarball_name, local_path=None, show_progress=True):
    """
    Download test data tarball with automatic fallback to alternative sources.
    
    Parameters
    ----------
    tarball_name : str
        Name of the tarball to download
    local_path : str, optional
        Local path to save the file. Defaults to /tmp/<tarball_name>.
    show_progress : bool
        Whether to show download progress bar
        
    Returns
    -------
    str
        Path to the downloaded file
        
    Raises
    ------
    ValueError
        If tarball_name is not a key of TAR_LOCATIONS
    RuntimeError
        If all download locations fail
    """
    if tarball_name not in TAR_LOCATIONS:
        raise ValueError(f"Unknown tarball: {tarball_name}")
    
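    errors = []

    # Resolve the destination once, before trying any location.
    if local_path is None:
        local_path = f"/tmp/{tarball_name}"

    for location in TAR_LOCATIONS[tarball_name]:
        try:
            logger.info(f"Attempting to download from: {location}")
            fs, path = fsspec.core.url_to_fs(location)

            # fs.get() calls methods on its callback, so callback=None would
            # fail. Omit the argument to fall back to fsspec's no-op default,
            # and create a fresh progress bar for each attempt.
            get_kwargs = {"callback": TqdmCallback()} if show_progress else {}
            fs.get(path, local_path, **get_kwargs)
            logger.info(f"Successfully downloaded to: {local_path}")
            return local_path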
            
        except Exception as e:
            logger.warning(f"Failed to download from {location}: {e}")
            errors.append((location, str(e)))
            continue
    
    # All locations failed
    error_msg = "Failed to download from all locations:\n"
    for loc, err in errors:
        error_msg += f"  - {loc}: {err}\n"
    raise RuntimeError(error_msg)
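
To make the locations swappable without editing the module, one option is to let an environment variable prepend extra mirrors to the hard-coded list. The following is only a sketch under assumptions: the variable name `PYCMOR_TEST_DATA_MIRRORS`, the helper `resolve_locations`, and the convention that each mirror is a base URL with the tarball name appended are all hypothetical and not part of the proposal above.

```python
import os

# Hypothetical environment variable name (an assumption, not an existing setting).
MIRROR_ENV_VAR = "PYCMOR_TEST_DATA_MIRRORS"


def resolve_locations(tarball_name, defaults, environ=None):
    """Return candidate URLs for a tarball, env-provided mirrors first.

    The variable is read as a comma-separated list of base URLs; the
    tarball name is appended to each base. ``defaults`` has the same
    shape as TAR_LOCATIONS (name -> list of URLs).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(MIRROR_ENV_VAR, "")
    bases = [b.strip().rstrip("/") for b in raw.split(",") if b.strip()]
    overrides = [f"{base}/{tarball_name}" for base in bases]

    # Preserve order and drop duplicates so no location is tried twice.
    seen = set()
    ordered = []
    for url in overrides + list(defaults.get(tarball_name, [])):
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

With something like this, the loop in `load_tar` could iterate over `resolve_locations(tarball_name, TAR_LOCATIONS)` instead of `TAR_LOCATIONS[tarball_name]`, so a CI job could point at a different mirror just by setting the variable.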
