Feature: FP8 GEMM Kernel #7

@ShlokVFX

Description

Implement an FP8 GEMM kernel optimized for inference workloads.

The kernel should support scaling, accumulation, and output conversion
while maximizing tensor core utilization.
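Not part of the original request, but as a minimal correctness-reference sketch of the intended dataflow (FP8 inputs, dequantization scaling, FP32 accumulation, conversion to a higher-precision output): the kernel below assumes E4M3 inputs, row-major layouts, per-tensor scales, and an FP16 output. The parameter names (`scale_a`, `scale_b`) are illustrative. A production kernel would replace the scalar inner loop with tensor-core MMA instructions plus shared-memory tiling.

```cuda
#include <cuda_fp8.h>
#include <cuda_fp16.h>

__global__ void fp8_gemm_reference(const __nv_fp8_e4m3* A,  // M x K, row-major
                                   const __nv_fp8_e4m3* B,  // K x N, row-major
                                   __half* C,               // M x N, row-major
                                   int M, int N, int K,
                                   float scale_a, float scale_b) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    // Accumulate in FP32: accumulating directly in FP8 would overflow and
    // lose precision quickly, which is why higher-precision accumulation
    // is part of the requirement.
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        acc += static_cast<float>(A[row * K + k]) *
               static_cast<float>(B[k * N + col]);
    }

    // Apply the per-tensor dequantization scales once per output element,
    // then convert the FP32 result to the FP16 output format.
    C[row * N + col] = __float2half(acc * scale_a * scale_b);
}

// Example launch configuration (hypothetical sizes):
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (M + 15) / 16);
//   fp8_gemm_reference<<<grid, block>>>(dA, dB, dC, M, N, K, scale_a, scale_b);
```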

Planned Benchmarks

  • TFLOPs achieved vs FP16 (see the throughput formula after this list)
  • Accuracy impact
  • Memory bandwidth utilization
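For reference (not stated in the issue): a GEMM of shape M×N×K performs roughly 2MNK floating-point operations, so achieved throughput can be reported as

$$\text{TFLOP/s} = \frac{2\,M\,N\,K}{t_{\text{kernel}} \times 10^{12}}$$

where $t_{\text{kernel}}$ is the measured kernel time in seconds, computed identically for the FP8 and FP16 runs being compared.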

Learning Objectives

  • FP8 formats and scaling (see the scaling example after this list)
  • Tensor core programming
  • Numerical stability tradeoffs
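As an illustration of per-tensor scaling (assumptions mine, not specified in the issue): the E4M3 format's largest finite value is 448, so a tensor $x$ can be quantized as

$$s_x = \frac{\max_i |x_i|}{448}, \qquad \hat{x}_i = \operatorname{cast}_{\text{E4M3}}\!\left(\frac{x_i}{s_x}\right), \qquad x_i \approx s_x\,\hat{x}_i$$

The GEMM then accumulates $\hat{A}\hat{B}$ in FP32 and multiplies by $s_A s_B$ before converting the output; this is the role of the `scale_a`/`scale_b` factors in the sketch above. Per-channel or per-block scales follow the same pattern at finer granularity.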
