[Paper Discussion] Multi-Token Prediction via Self-Distillation #64

@gqy20

Description

Paper Information

Title: Multi-Token Prediction via Self-Distillation
Authors: John Kirchenbauer, Abhimanyu Hans, Brian Bartoldson, Micah Goldblum, Ashwinee Panda, et al. (6 authors)
Published: 2026-02-05
Category: cs.CL
PDF: Download

Abstract

Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next-token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains the exact same implementation as the pretrained initial checkpoint and is deployable without the addition of any auxiliary verifier or other specialized inference code. On GSM8K, our method produces models that can decode more than $3\times$ faster on average with a $<5\%$ drop in accuracy relative to single-token decoding performance.
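
The issue reproduces only the abstract, so the exact training recipe is not given here. Purely as a sketch of what an online self-distillation objective for standalone multi-token prediction could look like, the snippet below finetunes a student copy of a checkpoint to emit K future tokens in a single forward pass, using greedy continuations from a frozen teacher copy of the same checkpoint as targets. Everything concrete in it is an assumption rather than the paper's method: the gpt2 stand-in checkpoint, K = 4, the EOS token used as a placeholder, and the greedy teacher targets.

```python
# Minimal sketch of an online self-distillation loop for multi-token
# prediction. Hypothetical recipe for illustration only; the paper's
# actual objective and placeholder scheme are not described in this issue.
import copy

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in checkpoint (assumption)
K = 4                # future tokens predicted per forward pass (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
student = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
teacher = copy.deepcopy(student).eval()  # frozen copy of the initial checkpoint
for p in teacher.parameters():
    p.requires_grad_(False)

mask_id = tokenizer.eos_token_id  # placeholder token id (assumption)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)


def distillation_step(prefix_ids: torch.Tensor) -> torch.Tensor:
    """One online distillation step on a batch of equal-length prefixes."""
    # Teacher decodes K tokens one at a time -- the slow but exact
    # next-token model whose behavior the student should compress.
    with torch.no_grad():
        targets = teacher.generate(
            prefix_ids,
            max_new_tokens=K,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )[:, -K:]  # (batch, K) greedy continuations

    # Student sees the prefix plus K placeholders and must predict all
    # K continuation tokens in a single forward pass.
    placeholders = torch.full(
        (prefix_ids.size(0), K), mask_id,
        dtype=torch.long, device=prefix_ids.device,
    )
    student_input = torch.cat([prefix_ids, placeholders], dim=1)
    logits = student(student_input).logits
    # Position i predicts token i + 1, so the K positions ending at the
    # second-to-last one are the ones that should emit the continuation.
    pred = logits[:, -K - 1:-1, :]

    loss = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

At inference time the same placeholder trick would let the finetuned model read out K tokens per forward pass, which is the kind of single-model speedup the abstract describes; the paper's actual decoding procedure may of course differ.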

Why Recommended

Discussion recommendation for paper 2: the 'standalone multi-token prediction' approach is implemented in a simple and effective way, and the trade-off of a 3× average speedup for a <5% accuracy drop is worth evaluating for its practical value.

Discussion

Please share your thoughts on this paper:

  • What are the paper's key innovations?
  • Is the method sound?
  • Are the experimental results credible?
  • What could be improved?

Automatically created by arXiv Monitor
