Description
cm_rank and pp_rank are extremely confusing.
Currently, the cm rank is used for the cross-mapping pipeline rank ordering, while the pp rank is the rank within the pipeline model parallel group.
However, the APIs that query them mix the two:
- get_pipeline_model_parallel_rank -> cm rank
- get_pipeline_model_parallel_first/last_rank -> pp rank
- get_pipeline_model_parallel_prev/next_rank -> pp rank
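One possible direction for the refactor is to keep both rank spaces but give every query an explicit name, so no function silently returns one space while callers expect the other. This is a minimal sketch under stated assumptions: cm_order is a hypothetical tuple where cm_order[cm_rank] is the pp rank that executes that cm position (i.e. a permutation of the pp ranks), and prev/next are assumed to follow cm order with wraparound. PipelineRanks and its methods are illustrative names, not existing APIs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PipelineRanks:
    """Hypothetical bookkeeping that keeps cm ranks and pp ranks apart.

    ASSUMPTION: cm_order[cm_rank] is the pp rank (rank inside the
    pipeline model parallel group) that runs pipeline position cm_rank.
    """

    cm_order: Tuple[int, ...]

    def __post_init__(self) -> None:
        # The cross-mapping order must be a permutation of 0..pp_size-1.
        if sorted(self.cm_order) != list(range(len(self.cm_order))):
            raise ValueError("cm_order must be a permutation of 0..pp_size-1")

    @property
    def size(self) -> int:
        return len(self.cm_order)

    # Explicit translations in both directions.
    def cm_rank_to_pp_rank(self, cm_rank: int) -> int:
        return self.cm_order[cm_rank]

    def pp_rank_to_cm_rank(self, pp_rank: int) -> int:
        return self.cm_order.index(pp_rank)

    # Position queries take a pp rank and answer in cm order.
    def is_first_cm_rank(self, pp_rank: int) -> bool:
        return self.pp_rank_to_cm_rank(pp_rank) == 0

    def is_last_cm_rank(self, pp_rank: int) -> bool:
        return self.pp_rank_to_cm_rank(pp_rank) == self.size - 1

    # Communication peers: step in cm order, return a pp rank (wraps around).
    def next_pp_rank(self, pp_rank: int) -> int:
        cm = self.pp_rank_to_cm_rank(pp_rank)
        return self.cm_rank_to_pp_rank((cm + 1) % self.size)

    def prev_pp_rank(self, pp_rank: int) -> int:
        cm = self.pp_rank_to_cm_rank(pp_rank)
        return self.cm_rank_to_pp_rank((cm - 1) % self.size)
```

For example, with cm_order=(0, 2, 1, 3), pp rank 0 sends to pp rank 2 (the holder of cm rank 1), and only pp rank 3 answers True to is_last_cm_rank. Because every method name states which space it returns, a caller never needs a separate translate step at the communication site.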
We currently use translate_cm_rank_to_pp_rank to translate a cm rank into a pp rank (usually when issuing communication requests), but this should be refactored because it is super confusing. While refactoring, carefully check every call site, including training_log in training.py, where the lm loss is printed by the process holding the last cm rank.
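The training_log case shows why the distinction matters: a "last rank" check written in pp space and one written in cm space disagree whenever the last cm position is not held by the highest pp rank. A minimal sketch, assuming the same hypothetical cm_order convention (cm_order[cm_rank] is the pp rank running that cm position); should_log_lm_loss and naive_is_last_stage are illustrative names, not functions in training.py.

```python
from typing import Sequence


def should_log_lm_loss(pp_rank: int, cm_order: Sequence[int]) -> bool:
    """Hypothetical training_log check: is this process the last cm rank?

    ASSUMPTION: cm_order[cm_rank] is the pp rank that runs cm position
    cm_rank, so the last cm rank lives on pp rank cm_order[-1].
    """
    if sorted(cm_order) != list(range(len(cm_order))):
        raise ValueError("cm_order must be a permutation of 0..pp_size-1")
    return pp_rank == cm_order[-1]


def naive_is_last_stage(pp_rank: int, pp_size: int) -> bool:
    # A check done purely in pp space: correct only if cm_order[-1] == pp_size - 1.
    return pp_rank == pp_size - 1
```

With cm_order=(0, 2, 3, 1), the loss should come from pp rank 1, yet the pp-space check selects pp rank 3, which never computes the final loss. Keeping the cm-space check explicit avoids that mismatch.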