Feature: Speculative Decoding Kernel #6

@ShlokVFX

Description

Implement speculative decoding support: a smaller draft model proposes
several tokens ahead, and the target model verifies them in a single pass.

The focus is on kernel-level optimization of the verification and rollback
steps, since their overhead directly reduces the throughput gained from
accepted draft tokens.
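
To make the verification and rollback steps concrete, here is a minimal CPU reference sketch in Python, not the planned kernel. It assumes greedy acceptance (a draft token is accepted only if it equals the target model's argmax at that position); the issue does not say which acceptance rule is intended. The names `verify_greedy` and `rollback_kv_length` are hypothetical, and so is the cache layout: the verification pass is assumed to have appended one KV entry per draft token.

```python
from typing import List, Tuple


def verify_greedy(draft_tokens: List[int],
                  target_tokens: List[int]) -> Tuple[List[int], int]:
    """Accept the longest draft prefix that matches the target's choices.

    target_tokens must hold len(draft_tokens) + 1 entries: the target's
    greedy token at every draft position, plus one extra token for the
    position after the last draft token.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    n_accepted = 0
    for draft_tok, target_tok in zip(draft_tokens, target_tokens):
        if draft_tok != target_tok:
            break  # first mismatch: everything after it is discarded
        n_accepted += 1

    # Emit the accepted prefix plus the target's own token at the first
    # mismatch (or the extra token if every draft token was accepted).
    emitted = draft_tokens[:n_accepted] + [target_tokens[n_accepted]]
    return emitted, n_accepted


def rollback_kv_length(committed_len: int, n_accepted: int) -> int:
    """Return the KV-cache length to keep after verification.

    Assumes the cache held committed_len entries before the verification
    pass; entries for rejected draft tokens are dropped by truncation.
    """
    return committed_len + n_accepted


emitted, n_accepted = verify_greedy([5, 7, 9], [5, 7, 2, 4])
print(emitted, n_accepted)                 # [5, 7, 2] 2
print(rollback_kv_length(10, n_accepted))  # 12
```

Every step emits at least one token, so in this sketch the output matches what greedy decoding with the target model alone would produce; only the number of target passes changes.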

Planned Benchmarks

  • Speedup vs. standard autoregressive decoding
  • Verification overhead
  • Impact of token acceptance rate on speedup
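
As a rough guide to how these three metrics relate, the sketch below uses the commonly cited analytical model for speculative decoding. It assumes each draft token is accepted independently with probability `alpha`, that `gamma` tokens are drafted per step, and that `cost_ratio` is the cost of one draft-model step relative to one target-model step. The function names are hypothetical, and the model ignores any extra verification overhead, so measured speedups would be expected to come out lower.

```python
def acceptance_rate(accepted: int, proposed: int) -> float:
    """Fraction of proposed draft tokens that the target model accepted."""
    if proposed <= 0:
        raise ValueError("proposed must be positive")
    return accepted / proposed


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target pass: (1 - alpha^(gamma+1)) / (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft token accepted, plus one extra
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over standard decoding.

    One speculative step costs gamma draft steps plus one target pass,
    measured in units of a single target-model step.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


print(acceptance_rate(30, 40))            # 0.75
print(expected_tokens_per_step(0.5, 1))   # 1.5
print(estimated_speedup(0.5, 1, 0.5))     # 1.0
```

The last line shows the break-even case: with a 50% acceptance rate and a draft model half as expensive as the target, drafting one token per step gives no net gain under this model.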

Learning Objectives

  • Speculative execution principles
  • Verification kernel design
  • Control flow on GPU
