fix: Fix GPT-NeoX example copy collision during container startup inspired by Liam #6668
daranday wants to merge 1 commit into gh/daranday/3/base from
…pired by Liam

As Liam's investigation found, the GPT-NeoX demo currently has a copy collision. This change mitigates it. We might still need to convert to a shared_fs approach in the future to take advantage of larger-scale data.

[ghstack-poisoned]
LGTM for now, since this is merging to your own fork. We'll also need to update the Docker image for all experiment configs once my PR to update the gpt-neox image is merged into the environments repo. In particular, multinode dtrain might still break when not using a shared fs, unless the index-building logic is updated so that the index is built on local rank 0 in that case:
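To make the rank concern concrete, here is a minimal sketch of the decision being described. It is not the gpt-neox code: `should_build_index`, `ranks_from_env`, and the `use_shared_fs` flag are hypothetical names for illustration, and reading `RANK` and `LOCAL_RANK` from the environment assumes a launcher that sets them (as `torchrun` does).

```python
import os


def should_build_index(global_rank: int, local_rank: int, use_shared_fs: bool) -> bool:
    """Return True if this process should build the dataset index.

    Hypothetical helper, not part of gpt-neox.
    """
    if use_shared_fs:
        # All nodes see the same files, so a single global writer is enough.
        return global_rank == 0
    # Node-local storage: every node needs its own index files,
    # so the first process on each node builds them.
    return local_rank == 0


def ranks_from_env(env=os.environ):
    """Read (global_rank, local_rank), defaulting to 0 when unset.

    Assumes the launcher exports RANK and LOCAL_RANK.
    """
    return int(env.get("RANK", "0")), int(env.get("LOCAL_RANK", "0"))
```

With two nodes of four processes each and no shared fs, global ranks 0 and 4 (local rank 0 on each node) would both build the index. Gating only on global rank 0 would leave the second node without index files.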
Hey @liamcli, thanks for the review!

About the code: it looks like gpt2_dataset.py now uses the local index, so we should be good?

About the PR format: this is actually targeted at the main branch. The odd base branch (gh/daranday/3/head) is just Sapling's way of creating code reviews. When landing using
Stack from ghstack (oldest at bottom):
As Liam's investigation found, the GPT-NeoX demo currently has a copy collision. This change mitigates it. We might still need to convert to a shared_fs approach in the future to take advantage of larger-scale data.