_match_attrs to handle matching attributes that are named differently in datasets under comparison #378

@Timh37

Description

_match_attrs can currently only compare a list of identically named attributes between two datasets. However, for dedrifting experiments based on preindustrial control simulations, the attribute to be matched from the experiment dataset is 'parent_variant_label', which needs to correspond to the 'variant_label' of the preindustrial control run. It would be nice to have a function for this, so that the user can match dataset dictionaries of experiments to be dedrifted with a dataset dictionary of preindustrial control runs.

Something like:

def _match_twosided_attrs(ds_a, ds_b, attrs_a, attrs_b):
    """Return the number of attribute pairs that match between two datasets.

    attrs_a[i] in ds_a is compared with attrs_b[i] in ds_b.
    """
    if len(attrs_a) != len(attrs_b):
        raise ValueError("lists of attributes in each dataset must be of equal length.")

    try:
        # Compare the attributes pairwise; each True counts as 1 in the sum.
        return sum(ds_a.attrs[a] == ds_b.attrs[b] for a, b in zip(attrs_a, attrs_b))
    except KeyError as err:
        raise ValueError(
            f"Cannot match datasets because at least one of the datasets does not contain all attributes [{attrs_a} and {attrs_b}]."
        ) from err

Or, alternatively, add an argument indicating that a parent is being compared with a child, which automatically maps parent_variant_label to variant_label (and similar attributes), reusing this existing line? ds.attrs[f"parent_{ma}"] not in reference.attrs[ma]
