
Flash-Attention wheel compatibility and fallback build #5

@anselmotalotta


Summary

Installation fails for flash-attn because the wheel referenced in requirements.txt does not match:

  • the ABI used by publicly available PyTorch wheels,
  • the CUDA / Torch version combinations available to external users.

Public wheels exist only for specific builds (e.g., torch 2.5.1 + cu121 + cxx11abi=FALSE), while the provided Flash-Attention wheel targets torch 2.6 + cu12 + cxx11abi=TRUE, which is unavailable publicly.
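The mismatch is mechanical enough to detect up front. As a minimal sketch (function names are illustrative, not from this repo), the build tags baked into a flash-attn release wheel filename can be parsed so a setup script can compare them against the local torch build before attempting installation:

```python
import re

# The filename layout below follows the wheels published on the
# flash-attention GitHub releases page, e.g.
# flash_attn-2.7.3+cu12torch2.6cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[0-9.]+(?:\.post[0-9]+)?)"
    r"\+cu(?P<cuda>[0-9]+)"
    r"torch(?P<torch>[0-9.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-cp(?P<python>[0-9]+)-"
)

def parse_flash_attn_wheel(filename: str) -> dict:
    """Return the torch/CUDA/ABI/Python tags encoded in the wheel name."""
    m = WHEEL_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized flash-attn wheel name: {filename}")
    return m.groupdict()
```

A setup script could compare these tags against `torch.__version__`, `torch.version.cuda`, and `torch.compiled_with_cxx11_abi()` and fail early with a clear message instead of an opaque pip/uv resolution error.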

What happens

  • Pip/uv cannot install the wheel.
  • Building from source fails unless the system has:
    • full CUDA toolkit installed,
    • gcc ≤ 12,
    • correct ABI matching.
  • Fresh installs fail unless users manually replace the flash-attn dependency with a different wheel.
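The source-build prerequisites listed above can also be checked before pip even runs. A hypothetical pre-flight check (the gcc ≤ 12 bound mirrors the constraint above; it does not cover ABI matching, which requires inspecting the torch build itself):

```python
import re
import shutil
import subprocess

def gcc_major_version():
    """Return the major version of the gcc on PATH, or None if absent."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    out = subprocess.run([gcc, "-dumpversion"],
                         capture_output=True, text=True).stdout.strip()
    m = re.match(r"(\d+)", out)
    return int(m.group(1)) if m else None

def can_build_flash_attn_from_source():
    """True only if nvcc (full CUDA toolkit) and gcc <= 12 are available."""
    if shutil.which("nvcc") is None:  # no full CUDA toolkit installed
        return False
    ver = gcc_major_version()
    return ver is not None and ver <= 12
```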

Why this matters

Users cannot set up the environment from requirements.txt as written; doing so currently requires NVIDIA-internal wheels or manual guesswork about compatible versions.

Requested Fixes

  • Provide Flash-Attention wheels built against public torch binaries (e.g., torch 2.5.1 cu121).
  • OR pin torch to a version that does match existing Flash-Attention wheels.
  • OR make Flash-Attention optional with a graceful fallback.
  • Whichever option is chosen, update the README to document the required CUDA/GCC toolchain for users who must build from source.
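The graceful-fallback option could follow the standard optional-dependency pattern: import flash-attn if present, otherwise route attention through the fused `scaled_dot_product_attention` that ships with PyTorch itself, so no extra wheel is needed. A sketch (the module wiring and function names are hypothetical, not from this repo):

```python
import importlib

def optional_import(name):
    """Import a module if available; return None instead of raising."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Prefer the compiled flash-attn kernel when its wheel installed cleanly;
# otherwise fall back to PyTorch's built-in fused attention.
_flash_attn = optional_import("flash_attn")

def attention_backend() -> str:
    """Name of the attention implementation that will actually be used."""
    if _flash_attn is not None:
        return "flash_attn"
    return "torch_sdpa"  # torch.nn.functional.scaled_dot_product_attention
```

Note that a real fallback also has to reconcile tensor layouts: flash-attn's `flash_attn_func` expects `(batch, seqlen, heads, dim)` while SDPA expects `(batch, heads, seqlen, dim)`, so the dispatch layer would transpose accordingly.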
