Paper Information
Title: Multi-Token Prediction via Self-Distillation
Authors: John Kirchenbauer, Abhimanyu Hans, Brian Bartoldson, Micah Goldblum, Ashwinee Panda, et al. (6 authors)
Published: 2026-02-05
Category: cs.CL
PDF: Download
Abstract
Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains the exact same implementation as the pretrained initial checkpoint and is deployable without the addition of any auxiliary verifier or other specialized inference code. On GSM8K, our method produces models that can decode more than […]
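The abstract describes the recipe only at a high level. As a rough illustration of what a "simple online distillation objective" for multi-token prediction could look like, here is a minimal PyTorch sketch. It is an assumption-laden stand-in, not the paper's method: the k-way prediction head, the Hugging Face-style `.logits` output convention, and the uniform KL weighting are all invented for illustration, and since the paper states the final model keeps the exact pretrained implementation, its real mechanism presumably differs.

```python
# Minimal sketch (NOT the paper's code) of an online self-distillation
# objective for multi-token prediction. Assumptions, flagged loudly:
#   - the student exposes k next-token distributions per position,
#     shaped [batch, seq, k, vocab] (the paper keeps the original
#     architecture unchanged, so its actual mechanism likely differs);
#   - both models return Hugging Face-style outputs with a .logits field;
#   - k is much smaller than the sequence length.
import torch
import torch.nn.functional as F

def multi_token_distill_loss(student, teacher, input_ids, k=4):
    """KL-distill the frozen teacher's next-token predictions into the
    student's k parallel future-token predictions."""
    with torch.no_grad():
        # Teacher: ordinary causal LM logits, [B, T, V]; position t
        # predicts token t+1.
        teacher_logits = teacher(input_ids).logits

    # Student: assumed [B, T, k, V]; at position t, slice o predicts
    # token t+1+o.
    student_logits = student(input_ids).logits

    T = input_ids.size(1)
    loss = 0.0
    for o in range(k):
        # The student's offset-o prediction at position t targets token
        # t+1+o, which the teacher predicts from position t+o.
        s = student_logits[:, : T - 1 - o, o, :]    # [B, T-1-o, V]
        t_ref = teacher_logits[:, o : T - 1, :]     # [B, T-1-o, V]
        loss = loss + F.kl_div(
            F.log_softmax(s, dim=-1),
            F.softmax(t_ref, dim=-1),
            reduction="batchmean",
        )
    return loss / k
```

Because the teacher is queried with a frozen forward pass per batch rather than pre-generating targets offline, this matches the "online distillation" framing; the "self-distillation" in the title suggests the teacher is simply the student's own pretrained initial checkpoint.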
Recommendation
Discussion note for paper 2: the "standalone multi-token prediction" approach is simple and effective; the trade-off of a 3x speedup against an accuracy loss under 5% is worth evaluating for practical deployment.
Discussion
Please share your views on this paper:
- What are the paper's novel contributions?
- Is the method sound?
- Are the experimental results credible?
- What could be improved?
Automatically created by arXiv Monitor