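Implement speculative decoding with a draft-model/target-model execution
strategy: a small draft model proposes several tokens, and the larger target
model verifies them in a single forward pass.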
The focus is on kernel-level optimizations for the verification and rollback
steps, since the overhead of these steps limits the achievable throughput gain.
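As a rough illustration of the verification and rollback steps, the Python sketch below shows host-side greedy verification for one sequence. It assumes greedy (argmax) decoding, that the target model has already produced one prediction per draft position plus one extra, and that rollback means truncating the KV cache to a given length. The names `verify_greedy` and `rollback_length` are hypothetical and not part of any existing library; a real kernel would perform this per sequence on the GPU.

```python
def verify_greedy(draft_tokens, target_tokens):
    """Accept the longest prefix of draft tokens that matches the target.

    Assumption: target_tokens[i] is the target model's argmax prediction at
    draft position i, and the final entry is the prediction that follows the
    last draft token, so len(target_tokens) == len(draft_tokens) + 1.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    n_accepted = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break  # first mismatch: reject this and all later draft tokens
        n_accepted += 1

    # Emit the accepted prefix plus one token from the target model: either
    # the correction at the mismatch or the extra token when all were accepted.
    emitted = list(draft_tokens[:n_accepted]) + [target_tokens[n_accepted]]
    return n_accepted, emitted


def rollback_length(base_len, n_accepted):
    """KV-cache length to keep after verification (assumed layout).

    Assumption: the cache held base_len entries before the step and the
    speculative entries were appended after them. Entries for rejected draft
    tokens are dropped by truncating to this length.
    """
    return base_len + n_accepted


if __name__ == "__main__":
    n, out = verify_greedy([5, 7, 9], [5, 7, 2, 4])
    print(n, out, rollback_length(10, n))
```

Truncating to a length rather than clearing entries keeps the rollback cheap, which matters when it runs on every decoding step.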
Planned Benchmarks
- Speedup vs. standard autoregressive decoding
- Verification overhead per decoding step
- Impact of token acceptance rate on speedup
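To relate the three benchmarks above, a simplified analytical model can serve as a baseline for comparison with measured numbers. The sketch below assumes each draft token is accepted independently with probability `alpha`, that `k` tokens are drafted per step, and that costs are expressed relative to one target-model forward pass. The function names and the `verify_overhead` parameter are illustrative assumptions, not measured quantities.

```python
def expected_tokens_per_step(alpha, k):
    """Expected tokens emitted per step under i.i.d. acceptance.

    With acceptance probability alpha and k draft tokens, the expected count
    is 1 + alpha + ... + alpha**k = (1 - alpha**(k + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft token accepted, plus one extra
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha, k, cost_ratio, verify_overhead=0.0):
    """Estimated speedup over standard decoding (one token per target pass).

    cost_ratio: cost of one draft-model pass relative to one target pass.
    verify_overhead: extra per-step cost of verification and rollback, in
    units of one target pass (assumed parameter for this sketch).
    """
    step_cost = k * cost_ratio + 1.0 + verify_overhead
    return expected_tokens_per_step(alpha, k) / step_cost


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=4, cost_ratio=0.05), 3))
```

Measured speedups that fall well below this estimate would point to verification overhead or to acceptance that is not independent across positions.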
Learning Objectives
- Speculative execution principles
- Verification kernel design
- Control flow on GPU
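One way to connect verification kernel design with GPU control flow is to replace the early-exit loop over draft tokens with a data-parallel formulation. The sketch below is a CPU-side Python illustration of that idea, not a GPU kernel: it computes per-position match flags, takes a running product (a prefix AND), and sums the result. The function name `accepted_prefix_length` is hypothetical, and the mapping to a parallel scan on the GPU is an assumption about how such a kernel might be structured.

```python
from itertools import accumulate
from operator import mul


def accepted_prefix_length(draft_tokens, target_tokens):
    """Count leading positions where draft and target agree, without a break.

    Each step is uniform across positions, so no position takes a different
    branch from its neighbors:
      1. match[i] = 1 if the tokens agree at position i, else 0
      2. prefix[i] = match[0] * ... * match[i]  (1 only while all have matched)
      3. the accepted length is the sum of prefix
    """
    match = [int(d == t) for d, t in zip(draft_tokens, target_tokens)]
    prefix = list(accumulate(match, mul))
    return sum(prefix)


if __name__ == "__main__":
    # The match at position 3 does not count because position 2 mismatched.
    print(accepted_prefix_length([5, 7, 9, 1], [5, 7, 2, 1]))
```

The running product is what prevents a late match from being counted after an earlier mismatch, which is the same guarantee the sequential loop gets from its `break`.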