
Wrong checkpoints loaded when multiple projects exist in the artifact_library #6

@a-akram

I have found strange behaviour in TrainTrack when resuming training from a checkpoint. Say we have two projects, GNNStudy and DNNStudy, and we run the pipeline once for each. The artifact_library (lightning_models/lightning_checkpoints) will then contain checkpoints under GNNStudy/version_0 and DNNStudy/version_0.

If I resume training for GNNStudy with resume_id: version_0, TrainTrack sometimes loads DNNStudy/version_0 instead of GNNStudy/version_0. It seems that load_config() runs os.walk with artifact_library as the root and picks the first version_0 it encounters. Perhaps the search should start from artifact_library/project instead, where project (GNNStudy or DNNStudy) comes from model_config, so that only runs belonging to that project are considered.
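A minimal sketch of the suggested fix, to make the idea concrete. This is not TrainTrack's actual code: the function name `find_checkpoint_dir` and its parameters are hypothetical, and it only assumes the layout described above (artifact_library/project/version_N).

```python
import os
from typing import Optional


def find_checkpoint_dir(artifact_library: str, project: str, resume_id: str) -> Optional[str]:
    """Hypothetical helper: locate `resume_id` below artifact_library/project only.

    Starting os.walk at the project directory (instead of artifact_library)
    means a version_0 from another project can never be matched.
    """
    project_root = os.path.join(artifact_library, project)
    for dirpath, dirnames, _filenames in os.walk(project_root):
        # Sort in place so the traversal order is deterministic across platforms.
        dirnames.sort()
        if resume_id in dirnames:
            return os.path.join(dirpath, resume_id)
    # Nothing found (also the case when the project directory does not exist).
    return None
```

With this, resuming GNNStudy with resume_id: version_0 could only resolve to a directory under GNNStudy, and a missing run would give None rather than silently falling back to another project's checkpoint.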
