
OOM on 4B model training #23

@armanakbari

Description


Hi,

I am getting OOM with 47 GB of GPU memory when trying to train the 4B model. The neat-packing step creates sequences of 40960 tokens, producing an attention matrix too large to fit in memory. But even after changing 40960 to 2048, I still get OOM.

Any suggestions? What GPUs did you use? Is there any way I can make it work with 47 GB of memory? (I also tried ZeRO-3.)
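For context, a back-of-the-envelope estimate of why the 40960-token packed sequences blow up if the attention scores are fully materialized (a sketch; the head count of 32 and fp16 scores are assumptions, not confirmed details of this model):

```python
def attn_scores_bytes(seq_len: int, num_heads: int, dtype_bytes: int = 2) -> int:
    """Memory for one layer's materialized attention score matrix.

    Each head holds a seq_len x seq_len score matrix; dtype_bytes=2
    assumes fp16/bf16. Ignores activations, weights, and optimizer state.
    """
    return seq_len * seq_len * num_heads * dtype_bytes

# Packed 40960-token sequence, assuming 32 heads, fp16:
print(attn_scores_bytes(40960, 32) / 2**30)  # 100.0 GiB per layer

# Same estimate at 2048 tokens:
print(attn_scores_bytes(2048, 32) / 2**30)   # 0.25 GiB per layer
```

If 2048-token sequences still OOM, the score matrices are no longer the bottleneck, which points at activations, weights, and optimizer state instead (where memory-efficient attention kernels, gradient checkpointing, or ZeRO offload would matter more than sequence length).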
