[Paper Discussion] Multi-Token Prediction via Self-Distillation #64

@gqy20

Description

Paper Information

Title: Multi-Token Prediction via Self-Distillation
Authors: John Kirchenbauer, Abhimanyu Hans, Brian Bartoldson, Micah Goldblum, Ashwinee Panda, et al. (6 authors)
Published: 2026-02-05
Category: cs.CL
PDF: Download

Abstract

Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next-token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains the exact same implementation as the pretrained initial checkpoint and is deployable without the addition of any auxiliary verifier or other specialized inference code. On GSM8K, our method produces models that can decode more than $3\times$ faster on average with a $<5\%$ drop in accuracy relative to single-token decoding performance.
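
The issue reproduces only the abstract, so the exact training recipe is not given here. Purely as a sketch of what an online self-distillation objective for standalone multi-token prediction could look like, the snippet below finetunes a student copy of a checkpoint to emit K future tokens in a single forward pass, using greedy continuations from a frozen teacher copy of the same checkpoint as targets. Everything concrete in it is an assumption rather than the paper's method: the gpt2 stand-in checkpoint, K = 4, the EOS token used as a placeholder, and the greedy teacher targets.

```python
# Minimal sketch of an online self-distillation loop for multi-token
# prediction. Hypothetical recipe for illustration only; the paper's
# actual objective and placeholder scheme are not described in this issue.
import copy

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in checkpoint (assumption)
K = 4                # future tokens predicted per forward pass (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
student = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
teacher = copy.deepcopy(student).eval()  # frozen copy of the initial checkpoint
for p in teacher.parameters():
    p.requires_grad_(False)

mask_id = tokenizer.eos_token_id  # placeholder token id (assumption)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)


def distillation_step(prefix_ids: torch.Tensor) -> torch.Tensor:
    """One online distillation step on a batch of equal-length prefixes."""
    # Teacher decodes K tokens one at a time -- the slow but exact
    # next-token model whose behavior the student should compress.
    with torch.no_grad():
        targets = teacher.generate(
            prefix_ids,
            max_new_tokens=K,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )[:, -K:]  # (batch, K) greedy continuations

    # Student sees the prefix plus K placeholders and must predict all
    # K continuation tokens in a single forward pass.
    placeholders = torch.full(
        (prefix_ids.size(0), K), mask_id,
        dtype=torch.long, device=prefix_ids.device,
    )
    student_input = torch.cat([prefix_ids, placeholders], dim=1)
    logits = student(student_input).logits
    # Position i predicts token i + 1, so the K positions ending at the
    # second-to-last one are the ones that should emit the continuation.
    pred = logits[:, -K - 1:-1, :]

    loss = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

At inference time the same placeholder trick would let the finetuned model read out K tokens per forward pass, which is the kind of single-model speedup the abstract describes; the paper's actual decoding procedure may of course differ.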

Why Recommended

Discussion recommendation for paper 2: the 'standalone multi-token prediction' approach is implemented in a simple and effective way, and the trade-off of a 3× average speedup for a <5% accuracy drop is worth evaluating for its practical value.

Discussion

Please share your thoughts on this paper:

  • What are the paper's key innovations?
  • Is the method sound?
  • Are the experimental results credible?
  • What could be improved?

Automatically created by arXiv Monitor
