Question when testing #8
Description
Great work! However, when I tried to reproduce the results with the script
bash eval_scripts/run_LLaDA_gsm8k_base.sh,
which runs the following command:
accelerate launch --config_file accelerate_config.yaml evaluation_script.py -m lm_eval \
  --model LLaDA --tasks gsm8k --batch_size 2 \
  --model_args "pretrained=models/LLaDA-8B-Base,prompt_interval_steps=-1,gen_interval_steps=-1,transfer_ratio=0.0,cache_order=0,is_feature_cache=False,is_cfg_cache=False" \
  --gen_kwargs "block_length=256,gen_length=256,steps=256,cfg_scale=0" \
  --num_fewshot 4 \
  --output_path ./logs/gsm8k_log \
  --log_samples
the following error occurs:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/user/new/luoxingyu/dLLM-cache/evaluation_script.py", line 10, in <module>
[rank0]: cli_evaluate()
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/lm_eval/__main__.py", line 474, in cli_evaluate
[rank0]: results = evaluator.simple_evaluate(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/lm_eval/utils.py", line 458, in _wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/lm_eval/evaluator.py", line 357, in simple_evaluate
[rank0]: results = evaluate(
[rank0]: ^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/lm_eval/utils.py", line 458, in _wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/lm_eval/evaluator.py", line 588, in evaluate
[rank0]: resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/new/luoxingyu/dLLM-cache/eval_model/LLaDA.py", line 782, in generate_until
[rank0]: out = generate(
[rank0]: ^^^^^^^^^
[rank0]: File "/home/user/new/luoxingyu/dLLM-cache/utils/generate_function.py", line 121, in generate
[rank0]: res = model(x, attention_mask=attention_mask)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/.cache/huggingface/modules/transformers_modules/LLaDA_hyphen_8B_hyphen_Base/modeling_llada.py", line 1432, in forward
[rank0]: outputs = self.model.forward(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/.cache/huggingface/modules/transformers_modules/LLaDA_hyphen_8B_hyphen_Base/modeling_llada.py", line 1329, in forward
[rank0]: x, cache = block(x, attention_bias=attention_bias, layer_past=layer_past, use_cache=use_cache)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/data/luoxingyu/miniconda3/envs/dllm_cache/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/.cache/huggingface/modules/transformers_modules/LLaDA_hyphen_8B_hyphen_Base/modeling_llada.py", line 911, in forward
[rank0]: att, cache = self.attention(q, k, v, attention_bias, layer_past=layer_past, use_cache=use_cache)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/.cache/huggingface/modules/transformers_modules/LLaDA_hyphen_8B_hyphen_Base/modeling_llada.py", line 711, in attention
[rank0]: att = self._scaled_dot_product_attention(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/user/.cache/huggingface/modules/transformers_modules/LLaDA_hyphen_8B_hyphen_Base/modeling_llada.py", line 653, in _scaled_dot_product_attention
[rank0]: return F.scaled_dot_product_attention(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: The size of tensor a (1212) must match the size of tensor b (956) at non-singleton dimension 3
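For reference, the failure appears to be a broadcasting mismatch between the attention scores and the attention bias passed to F.scaled_dot_product_attention. A minimal sketch reproducing the same class of error (the 1212 and 956 lengths are taken from the traceback above; batch size, head count, and head dim are arbitrary placeholders):

```python
import torch
import torch.nn.functional as F

# Query/key/value for a sequence of length 1212 (the size in the traceback).
q = torch.randn(1, 1, 1212, 8)
k = torch.randn(1, 1, 1212, 8)
v = torch.randn(1, 1, 1212, 8)

# Attention bias apparently built for a sequence of length 956 instead.
attn_mask = torch.zeros(1, 1, 1212, 956)

raised = False
try:
    # The attention scores have shape (1, 1, 1212, 1212); adding a
    # (1, 1, 1212, 956) bias cannot broadcast at dimension 3.
    F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
except RuntimeError as err:
    raised = True
    print(err)

print("raised:", raised)
```

This suggests the cached attention bias (or the padded prompt length) disagrees with the actual input length for this configuration.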
However, the evaluations for the other two modified versions complete normally. What might be causing this issue?
Thanks!