Enhancement: Explore FP8 (float8) training support for Hopper architectures #119

@aniruddhaadak80

Description

Problem

The script currently uses bfloat16 via torch.amp.autocast. However, on H100 and newer architectures (Compute Capability 9.0+), FP8 tensor cores offer up to 2x the throughput of BF16.

Proposal

Introduce a pathway for the AI agent to experiment with FP8 training, for example via torch.float8_e4m3fn or NVIDIA's transformer_engine. FP8's narrow dynamic range means this requires careful handling of scaling factors. This issue tracks the agent's overall goal of upgrading the linear layers and attention projections to FP8, which could substantially increase the current baseline MFU.
