
Non-greedy rejection sampling ignores the draft distribution (q); not lossless per Speculative Decoding #329

@ethantsliu

Summary

The non-greedy rejection sampling path does not compute the draft distribution $q$, so it cannot implement the
lossless acceptance/replacement rule from speculative decoding. It currently uses the target distribution $p$
only (with some candidate masking), which deviates from the referenced algorithms.

Expected (lossless) behavior

From the EAGLE-3 paper (end of Sec. 2.1), where $\hat{p}$ is the draft distribution, a rejected token is replaced by sampling from:

$$\mathrm{norm}(\max(0, p_{j+i} - \hat{p}_{j+i}))$$

Likewise, Fast Inference from Transformers via Speculative Decoding (2022), Sec. 2.3 and Algorithm 1, specifies:

  • Accept the drafted token $x$ with probability $\min(1, p(x)/q(x))$.
  • If it is rejected, sample a replacement from:
$$\mathrm{norm}(\max(0, p - q))$$
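The two rules above can be checked numerically. Below is a minimal, framework-free sketch for a single token $x$ drafted from $q$, where $\mathrm{norm}(f) = f / \sum f$. It is not code from this repo; `residual`, `accept_or_resample`, and `output_distribution` are hypothetical helper names, and $p$, $q$ are assumed to be already-processed probability vectors (e.g. after `logits_processor` and softmax). `output_distribution` computes the exact probability of each emitted token, which works out to $p$: that is the lossless property.

```python
import random
from typing import Sequence


def residual(p: Sequence[float], q: Sequence[float]) -> list[float]:
    """Replacement distribution norm(max(0, p - q))."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q, so rejection has probability 0; fall back to p.
        return list(p)
    return [d / total for d in diff]


def accept_or_resample(p, q, x, rng=random):
    """Lossless step for token x drafted from q. Returns (accepted, token)."""
    # Accept x with probability min(1, p(x) / q(x)).
    if q[x] > 0 and rng.random() < min(1.0, p[x] / q[x]):
        return True, x
    # Otherwise sample a replacement from norm(max(0, p - q)).
    r = residual(p, q)
    return False, rng.choices(range(len(r)), weights=r, k=1)[0]


def output_distribution(p, q):
    """Exact probability of each emitted token when x ~ q."""
    # P(draft y and accept it) = q(y) * min(1, p(y) / q(y)) = min(p(y), q(y)).
    accepted = [qi * min(1.0, pi / qi) if qi > 0 else 0.0 for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accepted)
    r = residual(p, q)
    return [a + reject_mass * ri for a, ri in zip(accepted, r)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical target distribution
    q = [0.2, 0.6, 0.2]  # hypothetical draft distribution
    print([round(v, 6) for v in output_distribution(p, q)])  # [0.5, 0.3, 0.2]
```

The guarantee depends on using the same $q$ the token was actually drawn from; substituting any other vector for $q$ (such as a constant 1.0) changes the emitted distribution.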

Current behavior in this repo

In eagle/model/utils.py, the non-greedy branch uses target logits only:

  • evaluate_posterior computes:

    ```python
    gt_logits = logits[fi, i - 1][None]
    gt_logits = logits_processor(None, gt_logits)[0]
    gtp = torch.softmax(gt_logits, dim=0)
    ...
    qx = 1.0
    acp = px / qx
    ```

    With qx fixed at 1.0, acp is simply px (the target probability alone), which is not $\min(1, p/q)$.

  • Replacement sampling uses sample_p = gtp (target-only), and
    update_inference_inputs samples from sample_p, not
    $\mathrm{norm}(\max(0, p - q))$.
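To make the acceptance gap concrete, take hypothetical values $p(x) = 0.3$ and $q(x) = 0.6$ for a drafted token $x$ (illustrative numbers, not measured from the repo):

$$\text{lossless: } \min\left(1, \frac{p(x)}{q(x)}\right) = \min\left(1, \frac{0.3}{0.6}\right) = 0.5, \qquad \text{current (qx = 1.0): } \frac{p(x)}{1.0} = 0.3$$

Since $q(x) \le 1$, we always have $p(x) \le \min(1, p(x)/q(x))$, so fixing qx at 1.0 never accepts $x$ more often than the lossless rule and accepts it strictly less often whenever $q(x) < 1$ and $p(x) < 1$.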

Code references

  • eagle/model/utils.py (non-greedy branch): evaluate_posterior(...) around lines 360–416
  • Replacement sampling: update_inference_inputs(...) around lines 460–466

As a result, the non-greedy path is lossy: its output distribution is not guaranteed to match the target model's sampling distribution, which is the guarantee both papers establish.

Suggested fix

At verification time:

  1. Compute the draft distribution $q$ for the same position (by running the draft model on the accepted prefix, or from cached KV if available).
  2. Accept the proposed token $x$ with probability $\min(1, p(x)/q(x))$.
  3. If it is rejected, sample the replacement from $\mathrm{norm}(\max(0, p - q))$.
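A minimal sketch of the suggested fix for a single chain of drafted tokens, following Algorithm 1 of the 2022 paper (including the extra target sample when every draft token is accepted). It assumes $q$ has already been obtained for each position and that all distributions are plain probability lists; `verify_chain`, `drafted`, `q_dists`, and `p_dists` are hypothetical names, not this repo's API. It does not handle EAGLE's token tree with multiple candidates per position, which needs a multi-candidate generalization of this rule.

```python
import random
from typing import Sequence

Dist = Sequence[float]


def residual(p: Dist, q: Dist) -> list[float]:
    """norm(max(0, p - q)); falls back to p if p == q (rejection impossible)."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    return [d / total for d in diff] if total > 0 else list(p)


def verify_chain(drafted, q_dists, p_dists, rng=random):
    """Verify one chain of drafted tokens and return the emitted tokens.

    drafted[i]: token proposed by the draft model at step i
    q_dists[i]: draft distribution drafted[i] was sampled from
    p_dists[i]: target distribution at step i; it has len(drafted) + 1
                entries, the last one used for the extra token
    """
    out = []
    for x, q, p in zip(drafted, q_dists, p_dists):
        # Accept the drafted token with probability min(1, p(x) / q(x)).
        if q[x] > 0 and rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # On rejection, resample from norm(max(0, p - q)) and stop.
        r = residual(p, q)
        out.append(rng.choices(range(len(r)), weights=r, k=1)[0])
        return out
    # Every drafted token was accepted: sample one more token from the target.
    extra = p_dists[len(drafted)]
    out.append(rng.choices(range(len(extra)), weights=extra, k=1)[0])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    p0 = [0.5, 0.3, 0.2]  # hypothetical target distribution
    q0 = [0.2, 0.6, 0.2]  # hypothetical draft distribution
    x = rng.choices(range(3), weights=q0, k=1)[0]
    print(verify_chain([x], [q0], [p0, p0], rng))
```

Each q_dists[i] must be the exact distribution used when drafting (same temperature and logits processing); a $q$ recomputed under different settings no longer matches the sampled token and breaks the guarantee.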
