
Some problems encountered in the test_gpt2 assignment #1

@zhouyibetter

// Test class
class GPT2TrainingTest : public ::testing::Test {
protected:
    void SetUp() override {
        llmc_filepath = "../../Data/gpt2_124M.bin";
        input_bin = "../../Data/tinyshakespeare/tiny_shakespeare_train.bin";
        tokenizer_bin = "../../Data/gpt2_tokenizer.bin";
        logits_reference = "../../Data/gpt2_logits_reference.bin";

        device_flag = "cuda";
        model_name = "gpt2";
        batch_size = 2;
        sequence_length = 64;
        total_batch_size = 256;
        num_iteration = 10;    // number of iterations
        text_length = 64;    // length of the generated text
        learning_rate = 1e-4;    // learning rate

        Initialize();

        LOG(INFO)<< "Initialize() finished!";
    }

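The file referenced here, logits_reference = "../../Data/gpt2_logits_reference.bin", cannot be found, and I could not find a suitable download link for it in download_starter_pack.sh either. Is it supposed to be tiny_shakespeare_val.bin?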
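To make a missing data file show up before the test runs for minutes, one option is to check the paths set in SetUp() up front. The following is only a minimal sketch, assuming C++17 and `std::filesystem`; `MissingFiles` is a hypothetical helper and not part of TinyInfiniTrain.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper (not part of TinyInfiniTrain): returns every path in
// `paths` that does not refer to an existing regular file.
std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const auto& path : paths) {
        std::error_code ec;  // non-throwing overload; errors count as "missing"
        if (!std::filesystem::is_regular_file(path, ec)) {
            missing.push_back(path);
        }
    }
    return missing;
}
```

Calling `MissingFiles({llmc_filepath, input_bin, tokenizer_bin, logits_reference})` at the start of SetUp() and logging the result would show whether gpt2_logits_reference.bin is the only file that is absent. Relative paths are resolved against the current working directory, so the result depends on where the test binary is launched from.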

Second, in

        /* tokenizer */
        if ((step + 1) % freq_generate_txt == 0) {
            if (!tokenizer) {
                continue;
            }
            tokenizer->GenerateText(*model, batch_size, sequence_length, text_length, device);
        }

an overly long text generation length leads to an out-of-memory problem. The error output is as follows:

Breakpoint 6 at 0x5555556817ac: file /home/eq/TinyInfiniTrain/example/common/tokenizer.cc, line 170.
(gdb) c
Continuing.
The meaning of life is that deriving pleasure from things being desired: if two people voluntarily premise that an ideal related to an ideal is desirable, the ideal relates to the human enviroment.)<|endoftext|>Quickunknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.

[  FAILED  ] GPT2TrainingTest.LogitsConsistency (555723 ms)
[----------] 1 test from GPT2TrainingTest (555723 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (555723 ms total)
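If the generation length is the suspect, one way to test that is to cap text_length before it is passed to GenerateText. The sketch below assumes that generation cost grows with text_length and that the 1024-position context of the released GPT-2 checkpoints is a sensible upper bound; `ClampTextLength` and `kGpt2MaxSeqLen` are hypothetical names, not part of TinyInfiniTrain, and whether this affects the "invalid device ordinal" error above is not established.

```cpp
#include <algorithm>

// Context length of the released GPT-2 checkpoints (assumed upper bound here).
constexpr int kGpt2MaxSeqLen = 1024;

// Hypothetical helper (not part of TinyInfiniTrain): limits a requested
// generation length to the range [0, max_seq_len].
int ClampTextLength(int requested, int max_seq_len = kGpt2MaxSeqLen) {
    return std::clamp(requested, 0, max_seq_len);
}
```

For example, passing `ClampTextLength(text_length)` in place of `text_length` keeps the value used in the test above (64) unchanged and only limits larger requests.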
