
Question about reproducing the experiment in the paper #6

@Kevinstone-199898


Hi, thanks for sharing the code. I have a question about reproducing the experiment in the paper, more specifically this figure:
[Image: figure from the paper]
I followed the code in this repo and set the parameters of the 1B model according to the paper:
[Image: 1B model parameters from the paper]
I also set the global batch size to 512, the same as in the paper.
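In standard data-parallel training, a global batch of 512 is reached as per-GPU micro-batch × number of GPUs × gradient-accumulation steps. A minimal sketch of that arithmetic, assuming 8 GPUs and a hypothetical per-GPU micro-batch of 16 (the function name and values are illustrative, not the repo's actual config keys or defaults):

```python
def grad_accum_steps(global_batch: int, num_gpus: int, micro_batch: int) -> int:
    """Return the gradient-accumulation steps needed so that
    micro_batch * num_gpus * steps == global_batch (hypothetical helper)."""
    per_step = micro_batch * num_gpus  # samples processed in one forward/backward pass across all GPUs
    if global_batch % per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by "
            f"micro_batch * num_gpus = {per_step}"
        )
    return global_batch // per_step


# 8 GPUs, global batch 512; the micro-batch of 16 is an assumed value
print(grad_accum_steps(512, 8, 16))  # 4
```

If the repo's config sets the micro-batch and accumulation steps separately, checking that their product with the GPU count equals 512 rules out one common source of mismatch with the paper's setup.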
I used 8 H800 GPUs (80 GB each). The validation loss I got during training is as follows:
[Image: validation loss curve from my run]
This looks very different from the figure in the paper. Am I doing something wrong, or is there another reason for the difference? I would appreciate your help. Thanks a lot!
