Fix MatFormer tier metadata when saving checkpoints #6

@plugyawn

Problem

Checkpoint saving currently writes matformer_tier from the effective training tier (effective_tier) rather than from the actual slicing level of the saved weights. When a full-width checkpoint is trained with --matformer-tier > 0 (runtime slicing), the saved config records that tier even though the weights are full-width. On a later load, auto-detection then treats the checkpoint as sliced, which can disable runtime slicing or cause incorrect tier handling.

Ref:

  • shared/client/src/state/cooldown.rs (writes matformer_tier from effective_tier)

Expected

matformer_tier should represent the actual slicing of the checkpoint weights. If weights are full-width, this should be 0. If weights are sliced, it should reflect the slice ratio.

Possible Approach

  • When writing config.json, compute actual_tier from the ratio of matformer_base_intermediate_size to intermediate_size (when that ratio is a power of two), and set matformer_tier to that value.
  • Store matformer_base_intermediate_size whenever it is not already present, so detection stays unambiguous.
  • Fall back to 0 when the sizes do not yield a valid power-of-two ratio.
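
The approach above could be sketched roughly as follows. This is only an illustration, not the code in cooldown.rs: the function name actual_matformer_tier is hypothetical, the u64 field types are assumed, and the mapping "tier t means intermediate_size = base / 2^t" is inferred from the power-of-two note rather than confirmed by the codebase.

```rust
/// Hypothetical helper (not an existing function in the repository).
/// Infers the slicing tier of the weights actually being saved.
///
/// Assumption: tier `t` means
/// `intermediate_size == matformer_base_intermediate_size / 2^t`.
/// Any input that does not fit that pattern falls back to tier 0.
fn actual_matformer_tier(base_intermediate_size: u64, intermediate_size: u64) -> u32 {
    // Missing or nonsensical sizes: treat the checkpoint as full-width.
    if base_intermediate_size == 0 || intermediate_size == 0 {
        return 0;
    }
    // The base size must be an exact multiple of the saved size.
    if base_intermediate_size % intermediate_size != 0 {
        return 0;
    }
    let ratio = base_intermediate_size / intermediate_size;
    if ratio.is_power_of_two() {
        // For a power of two, the number of trailing zero bits is log2(ratio).
        ratio.trailing_zeros()
    } else {
        0
    }
}
```

With this shape, a full-width checkpoint (both sizes equal) gives a ratio of 1 and therefore tier 0, regardless of which --matformer-tier was used at runtime.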

Acceptance Criteria

  • Full-width checkpoints always save matformer_tier: 0.
  • Sliced checkpoints save the correct tier and base size.
  • Auto-detection works for checkpoints saved during training.
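
One way to check these criteria in a test is a small consistency predicate over the saved metadata. The sketch below is hypothetical: the function name and argument types are assumptions, and it again assumes that tier t corresponds to dividing the base intermediate size by 2^t.

```rust
/// Hypothetical check (illustrative only): returns true when the saved
/// `matformer_tier` agrees with the saved sizes, assuming tier `t` means
/// `intermediate_size == matformer_base_intermediate_size / 2^t`.
fn metadata_is_consistent(tier: u32, base_intermediate_size: u64, intermediate_size: u64) -> bool {
    // Guard against shift amounts that are out of range for u64.
    tier < 64
        // Shifting the base right by `tier` must give the saved size...
        && (base_intermediate_size >> tier) == intermediate_size
        // ...and the division must have been exact (no bits dropped).
        && (intermediate_size << tier) == base_intermediate_size
}
```

Under these assumptions, the bug described in this issue would show up as a full-width checkpoint (equal sizes) saved with a non-zero tier, which this predicate rejects.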
