Description
Hi! I noticed the following information in the paper:
We train the 160M model on 1B-token datasets using a single NVIDIA A100
40GB GPU. For experiments with the 160M, 470M, and 1B models on 10B and 50B-token datasets,
we utilize 8 NVIDIA A100 40GB GPUs. All data scoring steps, including proxy data annotation,
scorer training, and data scoring, are performed on a single NVIDIA A100 80GB GPU.
In the "Annotate proxy data" step, the PMP-Solver is trained. How long does this step take on an H800 or A100 80GB GPU? And can I speed it up with multi-GPU parallelism using the DeepSpeed framework? Thanks!
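For reference, here is a minimal sketch of the kind of multi-GPU DeepSpeed setup I have in mind. The script name `train_pmp_solver.py` and all config values are my assumptions, not taken from this repo:

```json
{
  "train_batch_size": 256,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```

saved as `ds_config.json` and launched with something like `deepspeed --num_gpus=8 train_pmp_solver.py --deepspeed_config ds_config.json`. Would a setup along these lines work for the annotation step, or does it assume a single GPU?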