Bug and question about dLLM-cache applied to MMaDA #4

@Kaiwen-Zhu

Description

input_ids, attention_mask = uni_prompting((prompt_text, image_tokens), 't2i_gen')

cache_instance.reset_cache(prompt_length=input_ids.shape[1])

Thanks for your awesome work! However, there seems to be a bug in demo_MMada_t2i_cache.py. In the code above, prompt_length is set to input_ids.shape[1], which is the length of the entire sequence (prompt + response) rather than the length of the prompt alone. This results in severe performance degradation.
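To illustrate the fix I have in mind: the prompt length passed to reset_cache should cover only the prompt tokens, not the appended response region. The sketch below is not the repository's code; the helper name, the packed layout (prompt followed by response), and the lengths are assumptions for illustration.

```python
def prompt_length_from_packed(total_len: int, response_len: int) -> int:
    """Recover the prompt length from a packed (prompt + response) sequence,
    assuming the response region has a known fixed length (e.g. the number
    of image tokens in T2I generation). Hypothetical helper, not repo code."""
    assert total_len > response_len, "packed sequence must contain a prompt"
    return total_len - response_len

# Example with assumed sizes: a 1024-token image region packed after the prompt.
total_len = 1152      # what input_ids.shape[1] would report (assumed)
response_len = 1024   # image-token region length (assumed)

prompt_len = prompt_length_from_packed(total_len, response_len)
# Buggy call:  cache_instance.reset_cache(prompt_length=total_len)
# Fixed call:  cache_instance.reset_cache(prompt_length=prompt_len)
```

With the buggy call, the cache treats the whole sequence, including the still-masked response region, as "prompt", which is why quality collapses.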

Also, after fixing this, in my preliminary experiments the acceleration of the MMaDA T2I task falls far short of the speedups reported for language tasks in the paper. For example, with prompt_interval_steps = 10, gen_interval_steps = 5, transfer_ratio = 0.5, the output image is acceptable and the time drops from 5.2s to 3.4s. However, when I reduce transfer_ratio to 0.3, the output image becomes messy and the time is 3.0s. Is this expected? What configuration do you suggest for the MMaDA T2I task, and what acceleration rate should it achieve?
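For reference, my understanding of how the two interval parameters interact is sketched below: prompt features are recomputed every prompt_interval_steps denoising steps and response features every gen_interval_steps, with cached features reused in between. This is my own reading of the scheme, not the repository's implementation.

```python
def refresh_schedule(step: int,
                     prompt_interval_steps: int,
                     gen_interval_steps: int) -> tuple[bool, bool]:
    """Sketch (not repo code): decide at a given denoising step whether the
    prompt-side and generation-side caches are refreshed. Steps that refresh
    neither cache reuse stored features, which is where the speedup comes from;
    transfer_ratio would then control how aggressively response features are
    reused between refreshes."""
    refresh_prompt = step % prompt_interval_steps == 0
    refresh_gen = step % gen_interval_steps == 0
    return refresh_prompt, refresh_gen
```

Under this reading, with prompt_interval_steps = 10 and gen_interval_steps = 5, only every fifth step recomputes response features, so lowering transfer_ratio further forces more reuse of stale features, which may explain the messy output at 0.3.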
