Open
Description
When the temperature is set to 0, baseline decoding reduces to step-by-step greedy argmax, so the output should be fully deterministic. In speculative decoding, however, when tree decoding is enabled and top-k is greater than 1, the draft stage expands multiple candidate branches in advance. The large (target) model may then accept any candidate that is valid under its distribution, even if that candidate is not on the baseline argmax path. As a result, speculative decoding can produce semantically reasonable outputs that differ from the baseline greedy result. The discrepancy is more likely when candidate branches are close in probability.
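The behavior described above can be reproduced with a toy model. The sketch below is illustrative only and does not reflect this repository's actual verification code: `target_probs`, `strict_accept`, `lenient_accept`, `verify_tree`, the 3-token vocabulary, and the `margin` tolerance are all hypothetical. It assumes the verifier keeps the draft branch with the longest accepted prefix, and it compares an acceptance rule that requires the target argmax against one that tolerates near-ties.

```python
# Toy illustration; every name and number here is a hypothetical assumption.

def target_probs(prefix):
    """Hypothetical target model: next-token distribution over a 3-token vocab."""
    table = {
        (): [0.51, 0.49, 0.00],    # tokens 0 and 1 are nearly tied; 0 is the argmax
        (0,): [0.10, 0.20, 0.70],
        (1,): [0.80, 0.10, 0.10],
    }
    return table.get(tuple(prefix), [0.0, 0.0, 1.0])

def argmax(probs):
    return max(range(len(probs)), key=probs.__getitem__)

def greedy_decode(num_tokens):
    """Baseline at temperature 0: take the target argmax at every step."""
    out = []
    while len(out) < num_tokens:
        out.append(argmax(target_probs(out)))
    return out

def strict_accept(prefix, tok):
    """Accept a drafted token only if it is the target argmax."""
    return tok == argmax(target_probs(prefix))

def lenient_accept(prefix, tok, margin=0.05):
    """Assumed lenient rule: accept any token within `margin` of the top probability."""
    probs = target_probs(prefix)
    return probs[tok] >= max(probs) - margin

def verify_tree(prefix, draft_paths, accept):
    """Keep the branch with the longest accepted prefix, then append one target argmax token."""
    best = []
    for path in draft_paths:
        accepted = []
        for tok in path:
            if not accept(prefix + accepted, tok):
                break
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    return best + [argmax(target_probs(prefix + best))]

# Draft tree with two branches at the root (top-k = 2).
draft_paths = [[0, 1], [1, 0]]

baseline = greedy_decode(3)
strict_out = verify_tree([], draft_paths, strict_accept)
lenient_out = verify_tree([], draft_paths, lenient_accept)

print("baseline:", baseline)
print("strict:  ", strict_out)
print("lenient: ", lenient_out)
```

In this toy setup the strict rule only ever accepts a prefix of the greedy path, so its output agrees with the baseline. The lenient rule accepts the near-tied branch starting with token 1 because that branch verifies for more tokens, which is the kind of divergence reported here. With a single draft branch (`[[0, 1]]`), both rules agree with the baseline in this example.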