
Summarization functions #194

@emrysshevek

Description

Part of the goal of this package is to encode a dataset as a one-dimensional vector of consistent size. To do that, we apply the profile_distribution function to any metafeature that returns a sequence of values (e.g. the means of the numeric features), flattening that sequence into a fixed number of values.
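As a rough illustration of the flattening idea, here is a minimal sketch assuming a fixed set of four summaries (mean, population standard deviation, min, max). The name profile_fixed and this particular summary set are hypothetical and do not reflect what the package's profile_distribution actually computes. However many numeric features a dataset has, the output always has the same length.

```python
import statistics
from typing import List, Sequence

# Hypothetical fixed summary set for illustration; the real
# profile_distribution may compute different measures.
FIXED_SUMMARIES = ("Mean", "Stdev", "Min", "Max")


def profile_fixed(values: Sequence[float]) -> List[float]:
    """Collapse a non-empty, variable-length sequence into len(FIXED_SUMMARIES) numbers."""
    return [
        statistics.fmean(values),   # arithmetic mean
        statistics.pstdev(values),  # population standard deviation
        float(min(values)),
        float(max(values)),
    ]


# Per-feature means from two datasets with different column counts
# both flatten to the same length.
three_features = profile_fixed([1.0, 2.0, 3.0])
five_features = profile_fixed([0.5, 4.0, 2.5, 7.0, 1.0])
```

This rigidity is what the rest of the issue is about: every metafeature that goes through this path gets exactly these summaries, whether or not they are useful.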

Currently, profile_distribution computes the same rigid set of summary measures every time it is called. It would be nice to refactor it into a more flexible summarization function that computes only a requested subset of summary measures, or possibly accepts custom summary functions.
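One possible shape for that refactor, sketched under assumptions: summaries live in a name-to-callable registry, the caller picks a subset with include, and extra callables can be passed via custom. The function summarize, the registry DEFAULT_SUMMARIES, and both parameters are hypothetical names for illustration, not an existing API in the package.

```python
import statistics
from typing import Callable, Dict, Mapping, Optional, Sequence

# A summary maps a sequence of numbers to a single number.
Summary = Callable[[Sequence[float]], float]

# Hypothetical default registry; names and members are illustrative only.
DEFAULT_SUMMARIES: Dict[str, Summary] = {
    "Mean": statistics.fmean,
    "Stdev": statistics.pstdev,
    "Min": min,
    "Max": max,
    "Median": statistics.median,
}


def summarize(
    values: Sequence[float],
    include: Optional[Sequence[str]] = None,
    custom: Optional[Mapping[str, Summary]] = None,
) -> Dict[str, float]:
    """Compute only the requested summaries, optionally adding custom ones.

    If include is None, every available summary (defaults plus custom) is computed.
    """
    available = {**DEFAULT_SUMMARIES, **(custom or {})}
    names = list(include) if include is not None else list(available)
    unknown = [name for name in names if name not in available]
    if unknown:
        raise KeyError(f"unknown summaries: {unknown}")
    return {name: float(available[name](values)) for name in names}


# Compute just the mean plus a custom range summary.
result = summarize(
    [1.0, 2.0, 3.0, 10.0],
    include=["Mean", "Range"],
    custom={"Range": lambda v: max(v) - min(v)},
)
# result == {"Mean": 4.0, "Range": 9.0}
```

Keying summaries by name keeps the output order and labels explicit, which matters if the values are later concatenated into a fixed-length vector.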

This might also require rethinking our metafeature naming scheme and how the computation is structured, so that an arbitrary number of summaries can be computed on a given metafeature. We could stay close to our current convention of prefixing the summary to the metafeature name (e.g. MeanMeansOfNumericFeatures, SumMeansOfNumericFeatures), or move toward the D3M convention of appending the summary as an extension (e.g. MeansOfNumericFeatures.mean). The second convention could also express chains of operations more clearly (e.g. NumericFeatures.entropy.mean).
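To compare the two naming conventions concretely, here is a minimal sketch; the helpers prefix_name, suffix_name, and parse_suffix_name are hypothetical and not part of the package. It reproduces the example names above and shows one practical property of the dotted form: a chain of operations can be recovered with a plain string split.

```python
from typing import List, Tuple


def prefix_name(base: str, summary: str) -> str:
    """Current convention: summary prepended, e.g. MeanMeansOfNumericFeatures."""
    return summary[:1].upper() + summary[1:] + base


def suffix_name(base: str, *operations: str) -> str:
    """D3M-like convention: operations appended as dotted extensions, in order."""
    return ".".join((base, *operations))


def parse_suffix_name(name: str) -> Tuple[str, List[str]]:
    """Split a dotted name back into its base and its chain of operations."""
    base, *operations = name.split(".")
    return base, operations


print(prefix_name("MeansOfNumericFeatures", "mean"))       # MeanMeansOfNumericFeatures
print(suffix_name("MeansOfNumericFeatures", "mean"))       # MeansOfNumericFeatures.mean
print(suffix_name("NumericFeatures", "entropy", "mean"))   # NumericFeatures.entropy.mean
print(parse_suffix_name("NumericFeatures.entropy.mean"))   # ('NumericFeatures', ['entropy', 'mean'])
```

Under the prefix convention, splitting a chained name back into its steps would likely require knowing the list of valid summary names, whereas the dotted form separates each step with an explicit delimiter.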
