OOM on 4B model training #23
Open
Hi,
I am getting OOM with 47 GB of GPU memory when trying to train the 4B model. The neat packing creates sequences of 40960 tokens, producing a massive attention matrix that can't fit in memory. But even after changing 40960 to 2048, I still get OOM.
Any suggestions? What GPUs did you use? Is there any way I can make it work with 47 GB of memory? (I also tried ZeRO 3.)
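For reference, here is the back-of-envelope math behind the claim that a 40960-token packed sequence blows up the attention matrix. The head count (32) and bf16 dtype are assumptions for illustration, not values from the actual 4B model config:

```python
# Estimate memory to materialize one layer's [heads, seq, seq] attention
# score matrix. Head count and dtype size are hypothetical placeholders.

def attn_scores_bytes(seq_len: int, num_heads: int = 32, dtype_bytes: int = 2) -> int:
    """Bytes for a fully materialized attention score matrix in one layer."""
    return seq_len * seq_len * num_heads * dtype_bytes

gib = 2 ** 30
print(f"seq 40960: {attn_scores_bytes(40960) / gib:.1f} GiB per layer")  # 100.0 GiB
print(f"seq  2048: {attn_scores_bytes(2048) / gib:.3f} GiB per layer")   # 0.250 GiB
```

At 2048 tokens the score matrix itself is tiny, which is why I suspect the remaining OOM comes from somewhere else (activations, optimizer states) rather than the attention matrix.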