
Flash-Attention wheel compatibility and fallback build #5

@anselmotalotta


Summary

Installation fails for flash-attn because the wheel referenced in requirements.txt does not match:

  • the ABI used by publicly available PyTorch wheels,
  • the CUDA / Torch version combinations available to external users.

Public wheels exist only for specific builds (e.g., torch 2.5.1 + cu121 + cxx11abi=FALSE), while the provided Flash-Attention wheel targets torch 2.6 + cu12 + cxx11abi=TRUE, which is unavailable publicly.
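The mismatch is mechanical enough to detect up front. As a minimal sketch (function names are illustrative, not from this repo), the build tags baked into a flash-attn release wheel filename can be parsed so a setup script can compare them against the local torch build before attempting installation:

```python
import re

# The filename layout below follows the wheels published on the
# flash-attention GitHub releases page, e.g.
# flash_attn-2.7.3+cu12torch2.6cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[0-9.]+(?:\.post[0-9]+)?)"
    r"\+cu(?P<cuda>[0-9]+)"
    r"torch(?P<torch>[0-9.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<python>[0-9]+)-"
)

def parse_flash_attn_wheel(filename: str) -> dict:
    """Return the torch/CUDA/ABI/Python tags encoded in the wheel name."""
    m = WHEEL_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized flash-attn wheel name: {filename}")
    return m.groupdict()
```

A setup script could compare these tags against `torch.__version__`, `torch.version.cuda`, and `torch.compiled_with_cxx11_abi()` and fail early with a clear message instead of an opaque pip/uv resolution error.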

What happens

  • Pip/uv cannot install the wheel.
  • Building from source fails unless the system has:
    • full CUDA toolkit installed,
    • gcc ≤ 12,
    • correct ABI matching.
  • Fresh installs fail unless users manually replace the flash-attn dependency with a different wheel.
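The source-build prerequisites listed above can also be checked before pip even runs. A hypothetical pre-flight check (the gcc ≤ 12 bound mirrors the constraint above; it does not cover ABI matching, which requires inspecting the torch build itself):

```python
import re
import shutil
import subprocess

def gcc_major_version():
    """Return the major version of the gcc on PATH, or None if absent."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    out = subprocess.run([gcc, "-dumpversion"],
                         capture_output=True, text=True).stdout.strip()
    m = re.match(r"(\d+)", out)
    return int(m.group(1)) if m else None

def can_build_flash_attn_from_source():
    """True only if nvcc (full CUDA toolkit) and gcc <= 12 are available."""
    if shutil.which("nvcc") is None:  # no full CUDA toolkit installed
        return False
    ver = gcc_major_version()
    return ver is not None and ver <= 12
```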

Why this matters

Users cannot set up the environment from requirements.txt as written; doing so currently requires NVIDIA-internal wheels or manual guesswork about compatible versions.

Requested Fixes

  • Provide Flash-Attention wheels built against public torch binaries (e.g., torch 2.5.1 cu121).
  • OR pin torch to a version that does match existing Flash-Attention wheels.
  • OR make Flash-Attention optional with a graceful fallback.
  • Whichever option is chosen, update the README to document the required CUDA/GCC toolchain for users who must build from source.
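The graceful-fallback option could follow the standard optional-dependency pattern: import flash-attn if present, otherwise route attention through the fused `scaled_dot_product_attention` that ships with PyTorch itself, so no extra wheel is needed. A sketch (the module wiring and function names are hypothetical, not from this repo):

```python
import importlib

def optional_import(name):
    """Import a module if available; return None instead of raising."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Prefer the compiled flash-attn kernel when its wheel installed cleanly;
# otherwise fall back to PyTorch's built-in fused attention.
_flash_attn = optional_import("flash_attn")

def attention_backend() -> str:
    """Name of the attention implementation that will actually be used."""
    if _flash_attn is not None:
        return "flash_attn"
    return "torch_sdpa"  # torch.nn.functional.scaled_dot_product_attention
```

Note that a real fallback also has to reconcile tensor layouts: flash-attn's `flash_attn_func` expects `(batch, seqlen, heads, dim)` while SDPA expects `(batch, heads, seqlen, dim)`, so the dispatch layer would transpose accordingly.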
