Open
Description
Problem:
Merlin now has a number of libraries that need to interoperate smoothly, but it lacks the shared abstractions, conventions, and standards that would make that possible.
Goal:
- Build a solid foundation for the Merlin libraries via improvements in Core
New Functionality
- Core:
- Shape in column schemas (for consistent tracking across libraries)
- Cross-framework dtype translation (e.g. via Merlin dtypes)
- Cross-framework data transfer via zero-copy protocols (from Columns and DictArrays to Series and DataFrames)
- Bespoke Merlin schema file format (i.e. a Protobuf definition for Merlin schemas that does not come from TensorFlow Metadata)
- Corresponding updates in all downstream libraries
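To make the first two Core items more concrete, here is a rough sketch of what shape tracking in a column schema and framework-neutral dtype translation could look like. Everything in it is hypothetical and for illustration only: `Dtype`, `ColumnSchema`, `translate`, the `(min, max)` dimension convention, and the treatment of the first dimension as the batch dimension are assumptions, not the existing or proposed Merlin API. Framework dtypes are represented as plain strings so the example runs without third-party imports.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


# Hypothetical framework-neutral dtype record (not the real Merlin dtype API).
@dataclass(frozen=True)
class Dtype:
    name: str  # e.g. "int64"
    kind: str  # e.g. "int" or "float"
    bits: int  # element size in bits


INT64 = Dtype("int64", "int", 64)
FLOAT32 = Dtype("float32", "float", 32)

# One translation table per framework. The string values stand in for the
# real framework dtype objects to keep the sketch dependency-free.
TO_FRAMEWORK: Dict[str, Dict[Dtype, str]] = {
    "numpy": {INT64: "int64", FLOAT32: "float32"},
    "torch": {INT64: "torch.int64", FLOAT32: "torch.float32"},
    "tensorflow": {INT64: "tf.int64", FLOAT32: "tf.float32"},
}

# Reverse tables, built once, so every framework maps back to the neutral dtype.
FROM_FRAMEWORK: Dict[str, Dict[str, Dtype]] = {
    framework: {native: neutral for neutral, native in table.items()}
    for framework, table in TO_FRAMEWORK.items()
}


def translate(native_dtype: str, source: str, target: str) -> str:
    """Translate a dtype between frameworks by going through the neutral dtype."""
    neutral = FROM_FRAMEWORK[source][native_dtype]
    return TO_FRAMEWORK[target][neutral]


# Assumed convention: each dimension is a (min, max) pair, and max=None
# means the dimension is unbounded.
Dim = Tuple[int, Optional[int]]


@dataclass(frozen=True)
class ColumnSchema:
    name: str
    dtype: Dtype
    shape: Tuple[Dim, ...]

    @property
    def is_ragged(self) -> bool:
        # Assumption: the first dimension is the batch dimension, so a column
        # is ragged when any later dimension has min != max.
        return any(low != high for low, high in self.shape[1:])


# A variable-length list column (1 to 20 items per row) and a scalar column.
item_ids = ColumnSchema("item_ids", INT64, ((0, None), (1, 20)))
price = ColumnSchema("price", FLOAT32, ((0, None), (1, 1)))

print(translate("torch.int64", "torch", "tensorflow"))
print(item_ids.is_ragged, price.is_ragged)
```

Routing every translation through a single neutral dtype means each library only needs one table to and from the shared representation, rather than one table per pair of frameworks. Keeping the shape on the column schema lets downstream libraries tell fixed columns from ragged ones without inspecting the data.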
Constraints:
- All functionality entailed by this issue has to work in and be adoptable by all Merlin libraries
Starting Point:
- Proposals:
- Shapes - https://docs.google.com/document/d/13WrHuCxspTDjv2NPQxWtzZqBoI75Np5_tOopJEWQbBg/edit#
- Dtypes
- Data transfer
- Schema file format
- [ARCH] Fixed vs Ragged #791
- Overarching decisions to be made (Merlin commons): https://docs.google.com/document/d/1IGCS5-f6-SJkpfHFChE4L_iaNRXVjrx0zW-NPDOyfkU/edit