Bug and question about dLLM-cache applied to MMaDA #4
Description
`dLLM-cache/demo_MMada_t2i_cache.py`, line 168 in c943fc2:

```python
input_ids, attention_mask = uni_prompting((prompt_text, image_tokens), 't2i_gen')
```

`dLLM-cache/demo_MMada_t2i_cache.py`, line 194 in c943fc2:

```python
cache_instance.reset_cache(prompt_length=input_ids.shape[1])
```
Thanks for your awesome work! However, there seems to be a bug in demo_MMada_t2i_cache.py. In the code above, prompt_length is set to input_ids.shape[1], which is the length of the entire sequence (prompt + response) rather than just the prompt. This causes severe performance degradation.
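To illustrate the issue, here is a minimal runnable sketch. The shapes are made-up example values, not taken from the repo, and the suggested fix assumes the sequence returned by uni_prompting is simply the prompt tokens followed by the image (response) tokens:

```python
# Stand-in for a tensor, so the sketch runs without torch.
class FakeTensor:
    def __init__(self, shape):
        self.shape = shape

input_ids = FakeTensor((1, 1158))     # full sequence: prompt + image tokens (example values)
image_tokens = FakeTensor((1, 1024))  # image tokens to be generated (example values)

# Buggy: treats the whole sequence as the prompt.
buggy_prompt_length = input_ids.shape[1]  # 1158

# Suggested fix: subtract the response length to get the true prompt length.
prompt_length = input_ids.shape[1] - image_tokens.shape[1]  # 134
```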
Besides, after fixing it, in my preliminary attempts the acceleration for the MMaDA T2I task falls far short of the speedups reported for language tasks in the paper. For example, with prompt_interval_steps = 10, gen_interval_steps = 5, and transfer_ratio = 0.5, the output image is acceptable and the time drops from 5.2s to 3.4s. However, when I reduce transfer_ratio to 0.3, the output image becomes messy and the time is 3.0s. Is this expected? What configuration do you suggest for the MMaDA T2I task, and what is the corresponding speedup?
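For reference, this is roughly how I explored the configurations above. run_t2i is a placeholder for the actual cached T2I generation call in demo_MMada_t2i_cache.py (stubbed here so the loop runs standalone), and the grid values are just the ones I tried, not recommendations:

```python
import itertools
import time

def run_t2i(prompt_interval_steps, gen_interval_steps, transfer_ratio):
    # Placeholder: in practice this would run the cached MMaDA T2I pipeline
    # with the given dLLM-cache hyperparameters and return the image.
    return None

# Grid of cache hyperparameters to try (example values).
configs = list(itertools.product(
    [5, 10],          # prompt_interval_steps
    [2, 5],           # gen_interval_steps
    [0.3, 0.5, 0.7],  # transfer_ratio
))

results = []
for p, g, r in configs:
    t0 = time.perf_counter()
    run_t2i(p, g, r)
    elapsed = time.perf_counter() - t0
    results.append(((p, g, r), elapsed))
```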