Feature: Custom Sampling Kernels #4

Implement custom GPU kernels for token sampling during inference, covering
temperature scaling, top-k, and top-p (nucleus) sampling.

The goal is to reduce CPU-GPU synchronization during decoding and to avoid
the overhead of framework-level sampling.
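
As a point of reference for what the kernels would need to compute, here is a minimal CPU sketch of temperature, top-k, and top-p filtering followed by inverse-CDF sampling. It is not the planned kernel: the function names (`filtered_probs`, `sample`), the parameter conventions (`top_k=0` meaning "disabled", `top_p=1.0` meaning "keep everything"), and the order of the filters are assumptions made for illustration. A reference like this could serve as a correctness oracle when testing a GPU implementation.

```python
import math


def filtered_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized distribution after temperature, top-k, and top-p.

    Hypothetical conventions: top_k=0 disables top-k, top_p=1.0 keeps all tokens.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ordered by descending probability (ties broken by id).
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))

    if top_k > 0:
        order = order[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens; everything else gets zero mass.
    mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out


def sample(probs, u):
    """Inverse-CDF sampling: first index whose running sum exceeds u in [0, 1)."""
    cum = 0.0
    last = 0
    for i, p in enumerate(probs):
        if p > 0.0:
            cum += p
            last = i
            if u < cum:
                return i
    return last  # guard against rounding leaving the total slightly below 1
```

Taking the uniform random number `u` as an argument, rather than drawing it inside `sample`, keeps the function deterministic and mirrors a kernel that receives its randomness from a separate generator.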

Planned Benchmarks
- Sampling latency per token
- End-to-end decode latency
- Comparison against framework-level sampling as the baseline
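
The benchmarks listed above could be collected with a small timing harness along the following lines. This is a sketch under assumptions: `benchmark` and its `sync` hook are hypothetical names, and the choice of median, p99, and minimum as summary statistics is illustrative. The `sync` hook matters for GPU work, because kernel launches are asynchronous and a timer that stops before the device finishes will under-report latency.

```python
import statistics
import time


def benchmark(fn, *, warmup=10, iters=100, sync=None):
    """Time fn() and return latency statistics in microseconds.

    sync is a hypothetical hook: for GPU work, pass a callable that blocks
    until the device is idle so asynchronous launches are fully measured.
    """
    # Warm-up runs absorb one-time costs (caches, lazy initialization).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e6)

    samples.sort()
    return {
        "median_us": statistics.median(samples),
        "p99_us": samples[min(len(samples) - 1, int(0.99 * len(samples)))],
        "min_us": samples[0],
    }
```

Running the same harness over the custom kernel and over the framework's sampling call would give directly comparable per-token numbers; end-to-end decode latency would need the full decode step passed in as `fn`.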

Learning Objectives
- Parallel prefix sums
- Warp-level reductions
- Sampling stability and numerical behavior
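
To make the first two objectives concrete, the sketch below simulates their access patterns on the CPU. It is a teaching aid written under assumptions, not kernel code: `inclusive_scan` follows the Hillis-Steele scheme (one of several parallel scan algorithms), and `warp_reduce_max` imitates the lane-to-lane pattern of a shuffle-down reduction over a power-of-two lane count. Both function names are hypothetical. In a sampling kernel, the scan would typically build the cumulative distribution and the reduction would find the maximum logit for a stable softmax.

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive scan: about log2(n) rounds of data-parallel adds."""
    out = list(values)
    offset = 1
    while offset < len(out):
        # Every element reads from the previous round's buffer, as parallel
        # threads would between two barriers.
        prev = out
        out = [prev[i] + prev[i - offset] if i >= offset else prev[i]
               for i in range(len(prev))]
        offset *= 2
    return out


def warp_reduce_max(lanes):
    """Tree reduction over a power-of-two number of lanes (32 for a CUDA warp).

    Each round, lane i combines its value with lane i + offset, which is the
    access pattern of a shuffle-down reduction; lane 0 ends with the result.
    """
    vals = list(lanes)
    n = len(vals)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    offset = n // 2
    while offset > 0:
        prev = vals
        # Lanes whose partner is out of range keep their own value.
        vals = [max(prev[i], prev[i + offset]) if i + offset < n else prev[i]
                for i in range(n)]
        offset //= 2
    return vals[0]
```

For floating-point inputs, the scan adds values in a different order than a sequential loop, so results can differ in the last bits; that difference is one of the numerical behaviors worth measuring against the reference.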
