Open
Description
Summary
Introduce support for globally unique dataset identification via a content-derived ID.
- Define a deterministic content-based ID for datasets (e.g., hash of dataset manifest or slice list).
- Implement a way to resolve dataset names into these IDs.
- Ensure the training and scheduling flow can work entirely off this content ID to enable reproducibility and deduplication.
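A minimal sketch of how the first two items could fit together, assuming a dataset is described by a list of slice content hashes and that slice order carries no meaning (if order matters, drop the sort). The names `dataset_content_id`, `DatasetRegistry`, the manifest layout, and the `sha256:` prefix are hypothetical and not taken from the existing codebase.

```python
import hashlib
import json


def dataset_content_id(slice_hashes, manifest_version=1):
    """Derive a deterministic ID from the slice hashes that make up a dataset.

    Assumption: slice order is not significant, so the hashes are sorted
    before hashing to make the ID independent of listing order.
    """
    manifest = {"version": manifest_version, "slices": sorted(slice_hashes)}
    # Canonical JSON (sorted keys, no extra whitespace) so that equal
    # manifests always serialize to the same bytes.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest


class DatasetRegistry:
    """Hypothetical in-memory mapping from dataset names to content IDs."""

    def __init__(self):
        self._name_to_id = {}

    def register(self, name, slice_hashes):
        # Re-registering a name with different slices repoints the name;
        # the old content ID remains valid for anything that recorded it.
        content_id = dataset_content_id(slice_hashes)
        self._name_to_id[name] = content_id
        return content_id

    def resolve(self, name):
        # Raises KeyError for unknown names.
        return self._name_to_id[name]
```

With this shape, training and scheduling code would call `resolve(name)` once at submission time and carry only the returned content ID afterwards, so two names that point at the same slices deduplicate to a single ID.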
Background
Dataset names currently serve as identifiers, but a name is neither unique nor tied to the dataset's content. Now that slices are content-addressed, we should extend content addressing to the dataset level for stronger integrity, reproducibility, and cacheability guarantees.