Description
Hello, and thanks for taking the time to read this! I seem to be hitting a model-loading failure. When training my own base model, I used the following settings:
```yaml
### model
model_name_or_path: /train34/cog8/permanent/mywang71/Qwen2.5-1.5B

### method
stage: ddm
shift: true
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: slimpajama
template: qwen
packing: true
cutoff_len: 2048
streaming: false
tokenized_path: output/qwen2-slimpajama-tokenized/
overwrite_cache: true
preprocessing_num_workers: 16
```
I then launched training with `FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_full_ddm.yaml`, but hit the following error:
```
06/03/2025 10:13:37 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/03/2025 10:13:37 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
Traceback (most recent call last):
  File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/train/tuner.py", line 53, in run_exp
    run_ddm(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/train/ddm/workflow.py", line 40, in run_ddm
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/train34/cog8/permanent/mywang71/diffusion/DiffuLLaMA-main/LLaMA-Factory/src/llamafactory/model/loader.py", line 279, in load_model
    trainable_params, all_param, 100 * trainable_params / all_param
                                 ~~~~~~~~~~~~~~~~~~~~~~^
ZeroDivisionError: division by zero
[2025-06-03 10:13:40,092] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 25298) of binary: /home4/intern/mywang71/miniconda3/envs/DiffuLLAMA/bin/python3.1
```
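For reference, the failing expression divides by `all_param`, so the crash means the parameter count over the loaded model came out as zero. A minimal sketch of a guarded formatter (a hypothetical helper named `count_parameters_summary`, not the repo's actual function) that would surface the symptom instead of crashing:

```python
def count_parameters_summary(trainable_params: int, all_param: int) -> str:
    """Format the summary logged after load_model, guarding all_param == 0
    (which happens when the wrapped model ends up with no submodules)."""
    ratio = 100 * trainable_params / all_param if all_param else 0.0
    return (
        f"trainable params: {trainable_params:,} || "
        f"all params: {all_param:,} || trainable%: {ratio:.4f}"
    )
```

A guard like this only hides the symptom, of course; the interesting question is why no parameters were counted in the first place.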
I noticed that the `DiscreteDiffusionModel` class in LLaMA-Factory/src/llamafactory/train/ddm/model.py has no branch for qwen2; only gpt2 and llama are handled. Since neither branch matches, the trailing `del self.model` presumably leaves the wrapper with no parameters at all, which would explain `all_param == 0`. Is this the root cause? (I tried adding a qwen2 branch myself and the error no longer occurs, but I'm not sure whether that fixes the problem at its source.)
```python
if getattr(self.config, "model_type", None) == "gpt2":
    self.embed_tokens = self.model.transformer.wte
    self.denoise_model = self.model.transformer  # use inputs_embeds instead of input_ids in forward function
    for gpt2block in self.model.transformer.h:
        gpt2block.attn.bias.fill_(True)  # remove causal mask
    self.lm_head = self.model.lm_head
    del self.denoise_model.wte
elif getattr(self.config, "model_type", None) == "llama":
    self.embed_tokens = self.model.model.embed_tokens
    self.denoise_model = self.model.model
    self.lm_head = self.model.lm_head
    del self.denoise_model.embed_tokens
del self.model
```
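A qwen2 branch could plausibly mirror the llama one, since `Qwen2ForCausalLM` in transformers exposes the same module layout as `LlamaForCausalLM` (`model.model.embed_tokens`, `model.lm_head`). A minimal, self-contained sketch of that wiring, using toy stand-in objects rather than the real model classes (`wire_denoise_model` is a hypothetical name, not part of the repo):

```python
from types import SimpleNamespace

def wire_denoise_model(model, config):
    """Wire up embed_tokens / denoise_model / lm_head, treating qwen2 like llama.

    Assumes Qwen2ForCausalLM shares LlamaForCausalLM's module layout:
    model.model.embed_tokens and model.lm_head.
    """
    model_type = getattr(config, "model_type", None)
    if model_type in ("llama", "qwen2"):
        embed_tokens = model.model.embed_tokens
        denoise_model = model.model
        lm_head = model.lm_head
        del denoise_model.embed_tokens  # denoise model consumes inputs_embeds
        return embed_tokens, denoise_model, lm_head
    raise ValueError(f"unsupported model_type: {model_type}")

# Toy stand-in mimicking Qwen2ForCausalLM's attribute layout:
inner = SimpleNamespace(embed_tokens="embed_tokens", layers=[])
toy_model = SimpleNamespace(model=inner, lm_head="lm_head")
config = SimpleNamespace(model_type="qwen2")
embed, denoise, head = wire_denoise_model(toy_model, config)
```

Note this only covers the attention-architecture wiring; whether removing the causal mask (as the gpt2 branch does via `attn.bias.fill_(True)`) needs a qwen2 equivalent is a separate question for the maintainers.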