Merlin Data Conversion Support #420

@jperez999

Description

Problem:

As a developer, to satisfy the requirements for both:

Goal:

  • Create a class that developers can leverage to easily transfer data between frameworks.
  • Users should be able to concatenate columns of data and transform to a target framework.
  • Users should not have to know anything about the underlying framework or how the data needs to be moved from one framework to another (e.g. DLPack, the CUDA array interface, the NumPy array interface).
  • Users should be able to transform data from the current framework to the target framework in one function call.
  • Transforms should be zero-copy or minimal-copy when possible.
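
As a rough illustration of the zero-copy goal, the sketch below picks an exchange protocol by inspecting the source object. The function name `pick_protocol` and the preference order are assumptions made for this example, not part of any Merlin API; only the protocol attribute names (`__dlpack__`, `__cuda_array_interface__`, `__array_interface__`) and Python's built-in buffer protocol are real. It uses only the standard library so it runs without any framework installed.

```python
import array


def pick_protocol(obj):
    """Return the name of the first exchange protocol obj exposes (hypothetical helper)."""
    # Assumed preference order: DLPack, CUDA array interface, NumPy array interface.
    candidates = (
        ("__dlpack__", "dlpack"),
        ("__cuda_array_interface__", "cuda_array_interface"),
        ("__array_interface__", "array_interface"),
    )
    for attr, name in candidates:
        if hasattr(obj, attr):
            return name
    try:
        memoryview(obj)  # succeeds only if obj supports the buffer protocol
        return "buffer"
    except TypeError:
        return "copy"  # no shared-memory route found; fall back to copying


# A stdlib array stands in for a framework tensor here.
source = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(source)  # shares memory with source; no data is copied
source[0] = 9.0            # the change is visible through the view
```

Because `view` shares memory with `source`, reading `view[0]` after the assignment returns the updated value, which is the behavior a zero-copy transform should preserve.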

Constraints:

  • Must support dynamic environments: not all libraries are guaranteed to be present at all times.
  • Interfaces for installed libraries should be made available automatically.
  • When used, the interface for an unavailable library should inform the user that the target package is not installed.
  • Must be easily extensible to leverage new data types as they become supported.
  • Must be usable in Merlin Systems (operator input transformations) and Merlin Models (to replace FeatureCollection).
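
One possible way to meet the dynamic-environment constraints is to defer the failure until a missing library is actually used. The sketch below is a minimal example under that assumption; `optional_import` and `MissingBackend` are hypothetical names invented here, and the module name `not_a_real_framework_xyz` simply stands in for a library that is not installed.

```python
import importlib


class MissingBackend:
    """Placeholder for a framework that could not be imported (hypothetical helper)."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Called only for attributes that are not found normally,
        # i.e. when code tries to use the missing framework.
        raise ImportError(
            f"'{self.name}' is not installed; install it to use this interface."
        )


def optional_import(name):
    """Return the imported module, or a placeholder that fails on first use."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return MissingBackend(name)


json_mod = optional_import("json")                      # present: real module
missing = optional_import("not_a_real_framework_xyz")   # absent: placeholder
```

Importing never fails with this approach; the user only sees an error, naming the missing package, when they touch the unavailable interface.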

Starting Point:

  • Create a base class to house data at the column level.
  • Create subclasses of the base class to support each framework.
  • Classes should self-register and be available if the target framework is imported successfully.
  • Create a class to support multiple columns of different subclasses (e.g. concatenating predictions from multiple models: XGBoost, TensorFlow, PyTorch).
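
The starting point above could be sketched roughly as follows. Every name here (`Column`, `ArrayColumn`, `DequeColumn`, `ColumnCollection`, `from_iterable`, `to`) is hypothetical, and the standard-library modules `array` and `collections` stand in for real frameworks so the example runs anywhere. Self-registration is done with `__init_subclass__`, which is one option among several.

```python
import importlib


class Column:
    """Base class holding one column of data (hypothetical name)."""

    registry = {}     # framework name -> Column subclass
    framework = None  # module that must import for the subclass to register

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.framework is None:
            return
        try:
            importlib.import_module(cls.framework)
        except ImportError:
            return  # framework absent: leave the subclass unregistered
        Column.registry[cls.framework] = cls

    def __init__(self, values):
        self.values = values

    def to(self, target):
        """Convert this column to the target framework in one call."""
        if target not in Column.registry:
            raise ImportError(f"target framework '{target}' is not installed")
        return Column.registry[target].from_iterable(self.values)


class ArrayColumn(Column):
    framework = "array"  # stdlib stand-in for a real array library

    @classmethod
    def from_iterable(cls, values):
        import array
        return cls(array.array("d", values))


class DequeColumn(Column):
    framework = "collections"  # second stdlib stand-in

    @classmethod
    def from_iterable(cls, values):
        from collections import deque
        return cls(deque(values))


class AbsentColumn(Column):
    framework = "not_a_real_framework_xyz"  # never registers

    @classmethod
    def from_iterable(cls, values):
        return cls(values)


class ColumnCollection:
    """Holds named columns that may come from different frameworks."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def to(self, target):
        converted = {name: col.to(target) for name, col in self.columns.items()}
        return ColumnCollection(converted)


mixed = ColumnCollection({
    "a": ArrayColumn.from_iterable([1, 2]),
    "b": DequeColumn.from_iterable([3.0]),
})
unified = mixed.to("array")  # every column is now an ArrayColumn
```

This sketch copies data on conversion for simplicity; a real implementation would aim for the zero-copy paths named in the goals.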

It is critical that this ticket not only be created, but also kept up to date. As you work, constraints will be discovered and should be added to the list above. The tasks required to complete this project may change. The goal of the work may even change. Without a commitment to keeping this ticket up to date, the work shouldn't be undertaken.
