
Qwen model import error #20

@mywang44

Description


Hello, and thank you for taking the time to read this! I seem to be hitting a model-loading failure.
While training my own base model I used the following settings:

### model

model_name_or_path: /train34/cog8/permanent/mywang71/Qwen2.5-1.5B

### method

stage: ddm
shift: true
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset

dataset: slimpajama
template: qwen
packing: true
cutoff_len: 2048
streaming: false
tokenized_path: output/qwen2-slimpajama-tokenized/
overwrite_cache: true
preprocessing_num_workers: 16

I then launched training with `FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_full_ddm.yaml`, but got the following error:

06/03/2025 10:13:37 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/03/2025 10:13:37 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
Traceback (most recent call last):
File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in
launch()
File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
run_exp()
File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/train/tuner.py", line 53, in run_exp
run_ddm(model_args, data_args, training_args, finetuning_args, callbacks)
File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/train/ddm/workflow.py", line 40, in run_ddm
model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/model/loader.py", line 279, in load_model
trainable_params, all_param, 100 * trainable_params / all_param
~~~~~~~~~~~~~~~~~~~~~~^
ZeroDivisionError: division by zero

[2025-06-03 10:13:40,092] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 25298) of binary: /home4/intern/mywang71/miniconda3/envs/DiffuLLAMA/bin/python3.1
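For reference, the failing line in loader.py logs the parameter counts, roughly like the simplified sketch below (my paraphrase, not the exact upstream code). If none of the `model_type` branches match, no backbone submodules get attached, so both counts come out as 0 and `100 * trainable_params / all_param` divides by zero:

```python
# Simplified paraphrase of the parameter tally around loader.py line 279;
# names and structure are illustrative, not the exact upstream code.
def count_parameters(params):
    """params: iterable of (numel, requires_grad) pairs, one per tensor."""
    trainable_params = sum(n for n, req in params if req)
    all_param = sum(n for n, _ in params)
    return trainable_params, all_param

# A model whose backbone was never attached exposes no parameters at all:
trainable_params, all_param = count_parameters([])
print(trainable_params, all_param)  # 0 0
# The log line then evaluates 100 * trainable_params / all_param,
# which raises ZeroDivisionError when all_param == 0.
```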

I noticed that the DiscreteDiffusionModel class in LLaMA-Factory/src/llamafactory/train/ddm/model.py has no qwen2 branch; currently only gpt2 and llama are handled. Is this the root cause of the problem? (I tried adding a qwen2 branch myself, and the same error no longer occurs, but I am not sure whether this actually fixes the root cause.)


if getattr(self.config, "model_type", None) == "gpt2":
    self.embed_tokens = self.model.transformer.wte
    self.denoise_model = self.model.transformer  # use inputs_embeds instead of input_ids in forward function
    for gpt2block in self.model.transformer.h:
        gpt2block.attn.bias.fill_(True)  # remove causal mask
    self.lm_head = self.model.lm_head
    del self.denoise_model.wte
elif getattr(self.config, "model_type", None) == "llama":
    self.embed_tokens = self.model.model.embed_tokens
    self.denoise_model = self.model.model
    self.lm_head = self.model.lm_head
    del self.denoise_model.embed_tokens
del self.model
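One possible qwen2 branch simply mirrors the existing llama branch, assuming Qwen2ForCausalLM exposes the same module layout as LlamaForCausalLM (`model.model.embed_tokens` plus `model.lm_head`). The sketch below factors the branch logic into a standalone helper with a dummy stand-in object so it can be smoke-tested without loading real weights; treat it as an illustration of the idea, not the actual upstream fix:

```python
# Hedged sketch: handle "qwen2" like "llama", on the assumption that
# Qwen2ForCausalLM shares LlamaForCausalLM's module layout.
from types import SimpleNamespace

def attach_backbone(wrapper):
    """Attach embed_tokens / denoise_model / lm_head from the wrapped model."""
    model_type = getattr(wrapper.config, "model_type", None)
    if model_type in ("llama", "qwen2"):
        wrapper.embed_tokens = wrapper.model.model.embed_tokens
        wrapper.denoise_model = wrapper.model.model
        wrapper.lm_head = wrapper.model.lm_head
        # Embeddings are applied manually and passed in via inputs_embeds,
        # so drop the backbone's own embedding module.
        del wrapper.denoise_model.embed_tokens
        del wrapper.model
    else:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return wrapper

# Dummy object mimicking the HF module layout, for a quick smoke test:
inner = SimpleNamespace(embed_tokens="embedding-table")
dummy = SimpleNamespace(
    config=SimpleNamespace(model_type="qwen2"),
    model=SimpleNamespace(model=inner, lm_head="lm-head"),
)
attach_backbone(dummy)
print(dummy.embed_tokens, dummy.lm_head)  # embedding-table lm-head
```

Note that, unlike the gpt2 branch, no attention-mask patch is applied here; whether Qwen2's causal mask also needs to be disabled for diffusion training is a separate question worth checking against how the llama branch handles it.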
