Hi, thank you for releasing dParallel! The idea of learnable parallel decoding for diffusion LLMs is very exciting!
I have one technical question:
Since the paper does not report FLOPs, I'm trying to estimate the computational cost of dParallel relative to baseline diffusion-LLM decoding.
May I ask:
1. How should FLOPs be counted for dParallel?
2. If possible, could you share the script / config you used to profile FLOPs?
I'd like to reproduce similar measurements; I've included a rough sketch of my current estimate below.
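For context, this is the back-of-the-envelope estimate I'm working with right now. It assumes a dense transformer where one full forward pass over a length-L sequence costs roughly 2·N·L FLOPs (ignoring the quadratic attention term), and that diffusion-style decoding repeats a full forward pass at every denoising step. The model size, sequence length, and step counts below are placeholders I made up, not numbers from the paper, so please correct me if this is not how you count FLOPs for dParallel.

```python
# Rough analytic FLOPs estimate for diffusion-LLM decoding.
# All concrete numbers below are placeholders, not values from the paper.

def forward_flops(n_params: float, seq_len: int) -> float:
    """~2 * params * tokens FLOPs for one full forward pass of a dense
    transformer (attention's L^2 term ignored for simplicity)."""
    return 2.0 * n_params * seq_len

def decoding_flops(n_params: float, seq_len: int, num_steps: int) -> float:
    """Diffusion-style decoding re-runs a full forward pass at every
    denoising step, so total cost ~= steps * per-step cost."""
    return num_steps * forward_flops(n_params, seq_len)

if __name__ == "__main__":
    n_params = 8e9           # placeholder: an 8B-parameter model
    seq_len = 1024           # placeholder: prompt + generated tokens

    baseline_steps = 1024    # placeholder: e.g. one token finalized per step
    dparallel_steps = 128    # placeholder: fewer steps with parallel decoding

    base = decoding_flops(n_params, seq_len, baseline_steps)
    dpar = decoding_flops(n_params, seq_len, dparallel_steps)
    print(f"baseline : {base:.3e} FLOPs")
    print(f"dParallel: {dpar:.3e} FLOPs ({base / dpar:.1f}x fewer)")
```

Under these assumptions the FLOPs ratio is just the ratio of decoding steps, so I mainly want to confirm whether that is the right way to think about it, or whether there are extra per-step costs I should be accounting for.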
Thanks again for the great work. Looking forward to your guidance!