workloads/dataset-controller: Out-of-memory on large datasets #122

@MostAwesomeDude

Description

We currently use HF Datasets to load datasets from the HF Hub. The loading method they recommend requires the entire dataset to fit in memory. When a dataset does not fit, our dataset controller will likely run out of memory and crash.

This hasn't been observed yet, but is considered inevitable.
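
One possible mitigation, sketched here as an assumption rather than a decided fix, is to consume the dataset lazily in fixed-size batches so that only one batch is resident at a time. HF Datasets supports this through `load_dataset(..., streaming=True)`, which returns an iterable dataset instead of a fully loaded one. The helper names `iter_batches` and `open_streaming` below are hypothetical and not part of the controller, and whether the controller can work from an iterable rather than a random-access dataset has not been checked.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches(rows: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield lists of at most batch_size rows, materializing one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def open_streaming(name: str, split: str = "train"):
    """Open a Hub dataset lazily (hypothetical helper, not existing controller code)."""
    # Imported here so iter_batches stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches rows on demand.
    return load_dataset(name, split=split, streaming=True)
```

With these in place, a loop such as `for batch in iter_batches(open_streaming(name), 1000): ...` would keep memory use proportional to the batch size rather than the dataset size. The trade-off is that a streamed dataset is iterated sequentially, so any controller logic that relies on indexing by row or on knowing the length up front would need to change.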

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
