Problem:
As a developer, to satisfy the requirements for both:
- [RMP] Create a standard set of cross-framework evaluation metrics models #450
- [RMP] Support Offline Batch processing of Recs Generation Pipelines #419
I would like to be able to convert data between frameworks without having to worry about how to transfer it from one underlying framework to another. Conversion should be almost seamless and chainable: I would like to be able to turn data from one framework into another, and then possibly into a third. I would like to do this as efficiently as possible, in both speed and memory.
Goal:
- Create a class that developers can leverage to easily transfer data between frameworks.
- Users should be able to concatenate columns of data and transform to a target framework.
- Users should not have to know anything about the underlying frameworks or how the data needs to be moved from one framework to another (e.g., DLPack, the CUDA Array Interface, the NumPy array interface).
- Users should be able to transform data from the current framework to a target framework in a single function call.
- Transforms should be zero copy or minimal copy when possible.
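The zero-copy goal can be illustrated on the CPU with Python's built-in buffer protocol, which plays a role similar to the NumPy array interface. This is a minimal sketch assuming stdlib-only types: the hypothetical `transfer` helper returns a `memoryview` that shares the producer's memory when the object exposes a buffer, and falls back to a single copy otherwise. A real implementation would try DLPack or the CUDA Array Interface for GPU data instead.

```python
from array import array


def transfer(column):
    """Hypothetical helper: hand `column` to a consumer, sharing memory if possible.

    Returns (data, zero_copy). Objects that support the buffer protocol are
    wrapped in a memoryview, which shares the producer's memory; anything else
    is copied once into a list.
    """
    try:
        return memoryview(column), True  # zero copy: no data is duplicated
    except TypeError:
        return list(column), False  # minimal-copy fallback


src = array("d", [1.0, 2.0, 3.0])
view, zero_copy = transfer(src)
src[0] = 10.0  # the change is visible through `view`, since memory is shared
```

Mutating the source after the transfer is the observable difference between the two paths: the view sees the new value, while a copied list would not.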
Constraints:
- Must support dynamic environments; not all libraries are guaranteed to be present at all times.
- Interfaces for installed libraries should be available automatically.
- Interfaces for libraries that are not installed should, when used, inform the user that the target package is not installed.
- Must be easily extensible to support new data types as support for them is added.
- Must be usable in Merlin Systems (operator input transformations) and Merlin Models (to replace FeatureCollection).
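One way to meet the availability constraints is to probe for each package up front and substitute a placeholder that only fails when it is used. This is a minimal sketch; `optional_import` and `MissingDependency` are hypothetical names, and only top-level package names are handled (for a dotted name whose parent package is missing, `importlib.util.find_spec` raises `ModuleNotFoundError` instead of returning `None`).

```python
import importlib
import importlib.util


class MissingDependency:
    """Hypothetical placeholder for a package that is not installed.

    Creating it is harmless; attribute access raises an informative error,
    so users only see the problem when they actually use the interface.
    """

    def __init__(self, package):
        self._package = package

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        raise ImportError(
            f"{self._package!r} is not installed; install it to use this interface"
        )


def optional_import(package):
    """Return the module if it is installed, else a placeholder that fails on use."""
    if importlib.util.find_spec(package) is None:
        return MissingDependency(package)
    return importlib.import_module(package)
```

Modules probed this way could be collected into a dictionary at import time, so interfaces for installed libraries become available without any action from the user.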
Starting Point:
- Create a base class to house data at the column level.
- Create subclasses of the base class to support each framework.
- Classes should self-register and become available if the target framework is imported successfully.
- Create a class to support multiple columns of different subclasses (e.g., concatenating predictions from multiple models: XGBoost, TensorFlow, PyTorch).
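The starting point above can be sketched with `__init_subclass__`, which lets each framework subclass register itself when it is defined, and only if its backing package can be found. All names here (`ColumnData`, `ListColumn`, `ArrayColumn`, `ColumnCollection`, `to`, `from_list`) are hypothetical, and the stdlib `list` and `array` types stand in for real frameworks such as TensorFlow or PyTorch. Conversions go through a plain Python list for simplicity, which always copies; a real implementation would prefer DLPack or an array interface to stay zero copy.

```python
import importlib.util
from array import array


class ColumnData:
    """Hypothetical base class holding one column of data for one framework."""

    registry = {}    # framework name -> subclass, filled in by __init_subclass__
    framework = None
    requires = None  # top-level package the subclass depends on, if any

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Self-register only when the backing package can be found.
        if cls.requires is None or importlib.util.find_spec(cls.requires) is not None:
            ColumnData.registry[cls.framework] = cls

    def __init__(self, values):
        self.values = values

    def to_list(self):
        # Common intermediate format; this always copies.
        return list(self.values)

    @classmethod
    def from_list(cls, values):
        raise NotImplementedError

    def to(self, framework):
        """Convert this column to `framework` in one call."""
        target = ColumnData.registry.get(framework)
        if target is None:
            raise ImportError(f"no installed backend for framework {framework!r}")
        if type(self) is target:
            return self  # already in the target framework: no work needed
        return target.from_list(self.to_list())


class ListColumn(ColumnData):
    framework = "list"

    @classmethod
    def from_list(cls, values):
        return cls(list(values))


class ArrayColumn(ColumnData):
    framework = "array"
    requires = "array"  # stdlib module, so this backend always registers

    @classmethod
    def from_list(cls, values):
        return cls(array("d", values))


class UnavailableColumn(ColumnData):
    framework = "missing"
    requires = "hypothetical_missing_framework"  # not installed: never registered


class ColumnCollection:
    """Hypothetical container for columns that may come from different frameworks."""

    def __init__(self, **columns):
        self.columns = columns

    def to(self, framework):
        # Convert every column; the result is another collection, so calls chain.
        return ColumnCollection(
            **{name: col.to(framework) for name, col in self.columns.items()}
        )
```

Because `ColumnCollection.to` returns another collection, conversions chain, e.g. `cols.to("array").to("list")`, and requesting an unregistered framework fails with an `ImportError` naming it.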
It is critical that this ticket not only be created but also kept up to date. As work progresses, new constraints will be discovered and should be added to the list above. The tasks required to complete this project may change, and the goal of the work may even change. Without a commitment to keeping this ticket up to date, the work shouldn't be undertaken.