Feature: RoPE (Rotary Positional Embedding) Kernels #3

@ShlokVFX

Description

Implement optimized RoPE kernels for attention, supporting both prefill
and decode paths.

Focus on minimizing compute overhead and improving data locality during
positional embedding application.
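
As a rough starting point, here is a minimal sketch of the application step, written to serve both paths: in prefill, `positions` holds 0..n-1 for the whole prompt; in decode, it holds the single running position of the new token. The interleaved even/odd pair layout, the `positions` array, and `theta_base = 10000` are assumptions for the sketch, not decisions made in this issue.

```cuda
// rope_inplace.cu — illustrative sketch, not the final kernel.
// Assumes interleaved pair layout (x[2i], x[2i+1]) per head.
#include <cuda_runtime.h>

// Applies RoPE in place to a [num_tokens, num_heads, head_dim] tensor.
// positions[t] is the absolute position of token t, so the same kernel
// covers prefill (positions 0..n-1) and decode (one token, one position).
__global__ void rope_inplace(float* __restrict__ x,
                             const int* __restrict__ positions,
                             int num_tokens, int num_heads, int head_dim,
                             float theta_base) {
    int token = blockIdx.x;
    int head  = blockIdx.y;
    if (token >= num_tokens || head >= num_heads) return;

    float* v  = x + ((size_t)token * num_heads + head) * head_dim;
    float pos = (float)positions[token];

    // Each thread rotates one (even, odd) dimension pair.
    for (int i = threadIdx.x; i < head_dim / 2; i += blockDim.x) {
        float freq  = powf(theta_base, -2.0f * i / (float)head_dim);
        float angle = pos * freq;
        float c = cosf(angle), s = sinf(angle);
        float x0 = v[2 * i];
        float x1 = v[2 * i + 1];
        v[2 * i]     = x0 * c - x1 * s;
        v[2 * i + 1] = x0 * s + x1 * c;
    }
}

// Example launch: one block per (token, head) pair.
// dim3 grid(num_tokens, num_heads);
// rope_inplace<<<grid, 128>>>(d_q, d_pos, num_tokens, num_heads, head_dim, 10000.0f);
```

The sketch recomputes `cosf`/`sinf` per element only for clarity; caching the cos/sin table per sequence, or fusing the rotation directly into the attention QK kernel, is where the compute-overhead and data-locality wins this issue targets would come from.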

Planned Benchmarks

  • RoPE overhead relative to attention
  • Prefill vs decode performance
  • Kernel fusion opportunities

Learning Objectives

  • Positional encoding math (see the rotation formula sketched after this list)
  • Kernel fusion with attention
  • Register and shared memory usage patterns
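
For reference on the first objective, this is the standard rotation from the RoFormer paper (Su et al.), written for a single head of dimension d; the base 10000 is the common default, not something fixed by this issue:

```latex
% RoPE rotation for dimension pair (2i, 2i+1) at position p, head dim d.
\begin{aligned}
\theta_i &= 10000^{-2i/d}, \qquad i = 0, \dots, d/2 - 1 \\
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
&=
\begin{pmatrix}
\cos(p\,\theta_i) & -\sin(p\,\theta_i) \\
\sin(p\,\theta_i) & \phantom{-}\cos(p\,\theta_i)
\end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix}
\end{aligned}
```

The property that matters for attention is that the dot product between a query rotated at position m and a key rotated at position n depends on the positions only through the offset m - n, which is why the rotation can be applied to q and k independently in both prefill and decode.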

Metadata

Assignees

No one assigned

Labels

feature (Planned feature request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
