Open
Description
Problem
Checkpoint saving currently writes matformer_tier as the effective training tier (effective_tier), not the actual slicing level of the saved weights. When training a full-width checkpoint with --matformer-tier > 0 (runtime slicing), the saved config will set matformer_tier to that tier even though weights are full-width. This makes future auto-detection treat the checkpoint as sliced and can disable runtime slicing or cause incorrect tier handling.
Ref:
shared/client/src/state/cooldown.rs (writes matformer_tier from effective_tier)
Expected
matformer_tier should represent the actual slicing of the checkpoint weights. If weights are full-width, this should be 0. If weights are sliced, it should reflect the slice ratio.
Possible Approach
- When writing config.json, compute actual_tier from matformer_base_intermediate_size and intermediate_size (if the ratio is a power of two), and set matformer_tier to that value.
- Always store matformer_base_intermediate_size (if not present) to keep detection unambiguous.
- Fall back to 0 when ratios don't match.
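A minimal sketch of the tier computation described above. The function name actual_matformer_tier is hypothetical and does not exist in cooldown.rs, and the sketch assumes that tier t means intermediate_size equals matformer_base_intermediate_size divided by 2^t. If the project maps tiers to widths differently, the ratio logic would need to change.

```rust
/// Hypothetical helper (not part of the existing codebase): derive the tier
/// that is actually baked into the saved weights.
///
/// Assumption: tier `t` means
/// `intermediate_size == base_intermediate_size / 2^t`.
fn actual_matformer_tier(base_intermediate_size: Option<u64>, intermediate_size: u64) -> u32 {
    // Without a recorded base size there is nothing to compare against,
    // so treat the weights as full-width.
    let base = match base_intermediate_size {
        Some(b) => b,
        None => return 0,
    };

    // Reject zero sizes, a base smaller than the saved width, and
    // ratios that are not whole numbers.
    if intermediate_size == 0 || base < intermediate_size || base % intermediate_size != 0 {
        return 0;
    }

    let ratio = base / intermediate_size;
    if ratio.is_power_of_two() {
        // For a power of two, log2(ratio) equals the number of trailing zero bits.
        ratio.trailing_zeros()
    } else {
        // Fall back to 0 when the ratio does not match a valid tier.
        0
    }
}
```

Under these assumptions, a checkpoint with a base of 8192 and a saved intermediate_size of 8192 yields tier 0 regardless of the --matformer-tier used at runtime, while a saved intermediate_size of 2048 yields tier 2.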
Acceptance Criteria
- Full-width checkpoints always save matformer_tier: 0.
- Sliced checkpoints save the correct tier and base size.
- Auto-detection works for checkpoints saved during training.