
Auto-load tiered checkpoints for MatFormer tiers (remote + local) #3

@plugyawn

Description


Problem

With --matformer-load-strategy auto, the client only checks for a local path with a -tier{N} suffix. If the checkpoint is remote (Hugging Face Hub), it downloads the universal repo and never tries a tiered repo. As a result, small-tier nodes can run out of memory (OOM) unless operators manually provide tiered checkpoints.

Refs:

  • shared/client/src/state/init.rs (local-only tier path detection)

Expected

The auto strategy should prefer tiered checkpoints whenever they are available (local or remote), or there should be an explicit override that points to tiered repos.

Possible Approach

  • Add a --matformer-tier-repo flag or a repo-name template (e.g., {repo}-tier{tier}) and try it before the universal repo.
  • For auto, try the remote repo repo_id-tier{tier} if the local tier directory is missing.
  • Optional fallback (if enabled): download the universal checkpoint and slice it locally, with a clear warning.
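The template and ordering ideas above can be sketched in Rust. This is a minimal illustration, not code from shared/client: expand_tier_template, tier_candidates, Candidate, and DEFAULT_TIER_TEMPLATE are hypothetical names, and the {repo}/{tier} placeholder syntax simply mirrors the example template in this issue. Candidates come back in the order auto would try them.

```rust
/// One place a tier's weights could be loaded from (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Candidate {
    LocalDir(String),
    RemoteRepo(String),
    SliceUniversal(String),
}

/// Default template, mirroring the example in this issue (assumption).
pub const DEFAULT_TIER_TEMPLATE: &str = "{repo}-tier{tier}";

/// Expand `{repo}` and `{tier}` in a tier repo template.
/// `{tier}` is replaced first, so a literal "{tier}" inside a repo name
/// is never treated as a placeholder.
pub fn expand_tier_template(template: &str, repo: &str, tier: u32) -> String {
    template
        .replace("{tier}", &tier.to_string())
        .replace("{repo}", repo)
}

/// Candidates in the order `auto` would try them: local tier dir, remote
/// tier repo, then (only if enabled) the universal repo sliced locally.
pub fn tier_candidates(
    repo: &str,
    tier: u32,
    template: Option<&str>,
    slice_fallback: bool,
) -> Vec<Candidate> {
    let tiered = expand_tier_template(template.unwrap_or(DEFAULT_TIER_TEMPLATE), repo, tier);
    // Simplification: the same expanded name is used for the local
    // directory and the remote repo id.
    let mut candidates = vec![
        Candidate::LocalDir(tiered.clone()),
        Candidate::RemoteRepo(tiered),
    ];
    if slice_fallback {
        candidates.push(Candidate::SliceUniversal(repo.to_string()));
    }
    candidates
}
```

With a custom template such as org/tiers-{tier}, tier 3 expands to org/tiers-3, while the universal repo name is still used for the opt-in slicing fallback.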

Acceptance Criteria

  • Tiered checkpoints are automatically preferred when available.
  • Clear behavior when tiered repos are missing (fallback or error).
  • Documentation update for recommended tiered repo layout.
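One way to make the "fallback or error" behavior explicit is a resolver that returns a Result. The Rust sketch below is an assumption-laden illustration, not the logic in init.rs: resolve_tier_source and CheckpointSource are hypothetical names, override_repo stands for the proposed (not yet existing) --matformer-tier-repo flag, and the local_exists / remote_exists closures stand in for a filesystem check and a Hub lookup. The order is: override, local tier directory, remote tier repo, sliced universal checkpoint (only if enabled), otherwise an error.

```rust
use std::path::Path;

/// Where a tier's weights would come from (hypothetical type, not from init.rs).
#[derive(Debug, PartialEq)]
pub enum CheckpointSource {
    /// Operator-supplied tier repo (the proposed --matformer-tier-repo).
    Override(String),
    /// Local checkpoint directory with a "-tier{N}" suffix.
    LocalTier(String),
    /// Remote repo named "{repo_id}-tier{N}".
    RemoteTier(String),
    /// Universal checkpoint, downloaded and sliced locally (opt-in).
    UniversalSliced(String),
}

/// Resolve the checkpoint for `tier` in order of preference.
/// `local_exists` / `remote_exists` are stand-ins (assumptions) for a
/// filesystem check and a remote repo lookup.
pub fn resolve_tier_source(
    repo_id: &str,
    tier: u32,
    override_repo: Option<&str>,
    allow_slice_fallback: bool,
    local_exists: impl Fn(&Path) -> bool,
    remote_exists: impl Fn(&str) -> bool,
) -> Result<CheckpointSource, String> {
    if let Some(repo) = override_repo {
        return Ok(CheckpointSource::Override(repo.to_string()));
    }

    // Simplification: the local dir and the remote repo share one name.
    let tiered = format!("{repo_id}-tier{tier}");

    if local_exists(Path::new(&tiered)) {
        return Ok(CheckpointSource::LocalTier(tiered));
    }
    if remote_exists(&tiered) {
        return Ok(CheckpointSource::RemoteTier(tiered));
    }
    if allow_slice_fallback {
        // Fallback is explicit and noisy, per the acceptance criteria.
        eprintln!(
            "warning: no tiered checkpoint for tier {tier}; \
             downloading universal '{repo_id}' and slicing locally"
        );
        return Ok(CheckpointSource::UniversalSliced(repo_id.to_string()));
    }
    Err(format!(
        "no tiered checkpoint '{tiered}' found locally or remotely, \
         and the slicing fallback is disabled"
    ))
}
```

Keeping the sliced fallback opt-in means a missing tiered repo fails loudly by default instead of silently downloading the universal checkpoint onto a small-tier node.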

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
