A comprehensive list of papers about Large-Language-Diffusion-Models.
Important
Contributions welcome:
- If you have a relevant paper that is not included in the list, please contact us or submit a pull request directly.
- If you think your paper belongs in a different category, please contact us or submit a pull request.
- If your paper has been accepted, please consider updating its entry with the venue information.
- Thank you!
- 🔥🔥🔥 Awesome-LLDM is now open!
- Gemini Diffusion
- Dream-7B
- DreamOn
- What are Diffusion Language Models?
- Generative Modeling by Estimating Gradients of the Data Distribution
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025 | Arxiv | |
| Diffusion-based Large Language Models Survey | 2025 | Arxiv | |
| A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs | 2023 | NAACL | |
| Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | 2023 | Arxiv | |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025 | ACL | Adapted from Mistral-7B-v0.1 |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | 2025 | ICLR | 127M~7B (GPT2, LLaMA2) |
| Large Language Diffusion Models | 2025 | Arxiv | LLaDA-8B |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | 2025 | Arxiv | |
| Large Language Models to Diffusion Finetuning | 2025 | Arxiv | |
| LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | 2025 | Arxiv | Long context scaling |
| Dream 7B: Diffusion Large Language Models | 2025 | Arxiv | |
| UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | 2025 | ICLR | 127M~7B (GPT2, LLaMA2) |
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation | 2025 | Arxiv | |
| From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion | 2025 | Arxiv | |
| dKV-Cache: The Cache for Diffusion Language Models | 2025 | Arxiv | |
| Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | 2025 | Arxiv | |
| Fast-dLLM v2: Efficient Block-Diffusion LLM | 2025 | Arxiv | |
| d^2Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching | 2025 | Arxiv | |
| Attention Is All You Need for KV Cache in Diffusion LLMs | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | 2025 | ICLR | <7B |
| FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Model | 2025 | Arxiv | |
| CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Attention Sinks in Diffusion Language Models | 2025 | Arxiv | |
| SparseD: Sparse Attention for Diffusion Language Models | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs | 2025 | Arxiv | Quantization |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control | 2023 | ACL | <7B, Simplex, Blockwise |
| AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | 2023 | NeurIPS | <7B, AR-like noise |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | 2025 | ICLR | <7B |
| Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | 2025 | ICML | <7B |
| Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation | 2025 | NeurIPS | <7B |
| Any-Order Flexible Length Masked Diffusion | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Theoretical Benefit and Limitation of Diffusion Language Model | 2025 | NeurIPS | |
| Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models | 2025 | Arxiv | |
| What Makes Diffusion Language Models Super Data Learners? | 2025 | Arxiv | |
| Why mask diffusion does not work | 2025 | Arxiv | |
| Diffusion Language Models Know the Answer Before Decoding | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Diffusion Models Beat GANs on Image Synthesis | 2021 | NeurIPS | Image, Classifier Guidance |
| Classifier-Free Diffusion Guidance | 2021 | NeurIPS | Image, Classifier-free Guidance |
| Diffusion-LM Improves Controllable Text Generation | 2022 | NeurIPS | <7B |
| SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control | 2023 | ACL | <7B |
| Constrained Discrete Diffusion | 2025 | NeurIPS | <7B |
| Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation | 2025 | NeurIPS | <7B |
| DINGO: Constrained Inference for Diffusion LLMs | 2025 | Arxiv | Constrained decoding |
| CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation | 2025 | Arxiv | |
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Planning with Diffusion Models for Target-Oriented Dialogue Systems | 2025 | ACL | Dialogue |
| DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | 2025 | Arxiv | Code Generation |
| Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference | 2025 | Arxiv | Code Generation |
| Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation | 2025 | Arxiv | Code Generation |
| Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies | 2025 | Arxiv | VLA |
| LLaDA-VLA: Vision Language Diffusion Action Models | 2025 | Arxiv | VLA |
| dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought | 2025 | Arxiv | VLA |
We welcome all researchers to contribute to this repository.
If you have a related paper that has not been added to the list, please contact us.
Email: jake630@snu.ac.kr / wjk9904@snu.ac.kr / qicher@snu.ac.kr