Some thoughts on tree decoding #301

@ggg-s

With the temperature set to 0, baseline decoding reduces to step-by-step greedy argmax, so its output should be fully deterministic. In speculative decoding with tree decoding enabled and top-k greater than 1, however, the draft stage expands several candidate branches ahead of time. The large (target) model may then accept any candidate that is valid under its distribution, so the accepted path can deviate from the baseline argmax path. As a result, speculative decoding can produce output that is semantically reasonable but differs from the baseline greedy result. Such discrepancies are most likely when candidate branches have nearly equal probability.
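To make the scenario concrete, here is a minimal toy sketch, not the project's actual verification code. It assumes a hypothetical `toy_target` lookup table in place of the large model, a draft tree given as nested dicts, and a hypothetical `tol` parameter on `verify_tree`. With `tol=0.0` a drafted token is accepted only if it matches the target argmax. A small positive `tol` stands in for anything that lets a near-tied candidate through, such as a looser acceptance rule or numerical differences between the tree pass and the sequential pass.

```python
from typing import Callable, Dict, List, Tuple

Logits = Dict[str, float]
Tree = Dict[str, "Tree"]  # drafted token -> subtree of drafted continuations


def toy_target(prefix: Tuple[str, ...]) -> Logits:
    """Hypothetical target model: fixed logits per prefix, near-tie at the root."""
    table = {
        (): {"a": 2.000, "b": 1.999, "c": 0.1},
        ("a",): {"x": 1.5, "y": 0.2},
        ("b",): {"y": 1.5, "x": 0.2},
    }
    return table.get(prefix, {"<eos>": 1.0})


def greedy_baseline(target: Callable[[Tuple[str, ...]], Logits], steps: int) -> List[str]:
    """Plain temperature-0 decoding: take the argmax token at every step."""
    out: List[str] = []
    for _ in range(steps):
        logits = target(tuple(out))
        out.append(max(logits, key=logits.get))
    return out


def verify_tree(target: Callable[[Tuple[str, ...]], Logits], tree: Tree, tol: float = 0.0) -> List[str]:
    """Walk the draft tree, accepting the first drafted child whose logit is
    within `tol` of the target's best logit (tol=0.0 means strict argmax match).
    If no child is accepted, emit the target argmax and stop."""
    out: List[str] = []
    node = tree
    while True:
        logits = target(tuple(out))
        best = max(logits, key=logits.get)
        accepted = None
        for tok in node:  # children are visited in draft order
            if tok in logits and logits[best] - logits[tok] <= tol:
                accepted = tok
                break
        if accepted is None:
            out.append(best)
            return out
        out.append(accepted)
        node = node[accepted]


# Assumed draft tree: the draft model ranks "b" ahead of "a" at the root.
draft_tree: Tree = {"b": {"y": {}}, "a": {"x": {}}}

baseline = greedy_baseline(toy_target, steps=3)
strict = verify_tree(toy_target, draft_tree, tol=0.0)
lenient = verify_tree(toy_target, draft_tree, tol=0.01)
print(baseline, strict, lenient)
```

In this toy setup the strict rule reproduces the greedy baseline, while the lenient rule follows the drafted `"b"` branch because its root logit is within 0.001 of the argmax. This suggests that keeping outputs identical to greedy decoding requires the acceptance check to compare against the target argmax exactly, and that near-ties are where any slack shows up first.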
