Add default way to specify upstream data for this node #69

@kzecchini

Right now each custom node has to implement its own logic to find upstream data, sometimes by filtering on keys. However, upstream operations might give those keys different names or a different format, so the lookup can break.

I think we could implement a standard way to find upstream data in the AbstractNode class. Every node should accept a standard configuration that tells it where to look for its upstream data. That configuration is also where we can specify the potential "data args" of a node.

For example, if I am looking for data_1 but it is keyed to my_upstream_key_1, a configuration can supply the mapping for us. Example:

...
class: MyCustomNode
upstream_data:
  filter_for_key: my_filter
  data_1_key: my_upstream_key_1
  data_2_key: my_upstream_key_2
...

The documentation for each class can then list the data it needs from the data_object, for example:

data_1 (pd.DataFrame): dataframe of training data
data_2 (int): number of cv folds
...

In this way a node can always find its data when searching upstream, because any key that was renamed upstream can be covered by an optional remapping.
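
A minimal sketch of how the remapping could work, under several assumptions: the data_object is treated as a flat mapping from upstream key to value, the `required_data` attribute and the `get_upstream_data` method are hypothetical names, and `filter_for_key` is only stored because its exact meaning is not defined here. This is not the existing AbstractNode, just an illustration of the idea.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Optional, Tuple


class AbstractNode(ABC):
    # Hypothetical: the data args this node needs, matching its documentation.
    required_data: Tuple[str, ...] = ()

    def __init__(self, upstream_data: Optional[Mapping[str, str]] = None):
        config = dict(upstream_data or {})
        # Stored but not interpreted in this sketch (semantics are an open question).
        self.filter_for_key = config.pop("filter_for_key", None)
        # Turn entries such as `data_1_key: my_upstream_key_1`
        # into {"data_1": "my_upstream_key_1"}.
        self.key_map = {
            name[: -len("_key")]: upstream_key
            for name, upstream_key in config.items()
            if name.endswith("_key")
        }

    def get_upstream_data(self, data_object: Mapping[str, Any]) -> Dict[str, Any]:
        """Look up every required data arg, applying the optional remapping."""
        resolved = {}
        for name in self.required_data:
            # Without a remapping, fall back to the data arg's own name.
            upstream_key = self.key_map.get(name, name)
            if upstream_key not in data_object:
                raise KeyError(
                    f"{type(self).__name__} needs '{name}' but upstream key "
                    f"'{upstream_key}' was not found"
                )
            resolved[name] = data_object[upstream_key]
        return resolved

    @abstractmethod
    def run(self, data_object: Mapping[str, Any]) -> Any:
        ...


class MyCustomNode(AbstractNode):
    required_data = ("data_1", "data_2")

    def run(self, data_object: Mapping[str, Any]) -> Any:
        data = self.get_upstream_data(data_object)
        # Placeholder for real work: return the resolved data args as-is.
        return data


node = MyCustomNode(
    upstream_data={
        "filter_for_key": "my_filter",
        "data_1_key": "my_upstream_key_1",
        "data_2_key": "my_upstream_key_2",
    }
)
# A plain list stands in for the training dataframe to keep the sketch dependency-free.
upstream = {"my_upstream_key_1": [[0.1, 0.2], [0.3, 0.4]], "my_upstream_key_2": 5}
resolved = node.run(upstream)
```

With this shape, a node whose upstream keys already match its data arg names needs no `upstream_data` configuration at all, and a missing key fails early with a message naming both the data arg and the upstream key it was mapped to.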
