@jammm commented Dec 30, 2025

Use rocWMMA for the GEMM kernels, and use triton-windows and SpargeAttn, modified to support AMD GPUs on Windows.
See README_AMD_WINDOWS.md for setup steps.

Video generated with the default Wan2.1 1.4b 480p command from README.md:

generated_video.mp4

Limitations:

  • Currently supports only RDNA3/3.5, though it may work on RDNA4 with minor modifications.
  • Multi-CTA/distributed execution has not been tested.
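
To make the architecture limitation concrete, here is a minimal sketch of a guard that accepts only RDNA3/3.5 targets. It is not taken from this PR: `is_supported_arch` and `SUPPORTED_PREFIXES` are hypothetical names, and the sketch assumes the architecture is available as an LLVM target string such as `gfx1100` (RDNA3 targets are `gfx110x`, RDNA3.5 targets are `gfx115x`, and RDNA4 targets are `gfx120x`).

```python
# Hypothetical helper, not part of the PR: checks whether a ROCm target
# string names an RDNA3 or RDNA3.5 GPU.
SUPPORTED_PREFIXES = ("gfx110", "gfx115")  # RDNA3 (gfx110x), RDNA3.5 (gfx115x)


def is_supported_arch(arch_name: str) -> bool:
    """Return True if arch_name is an RDNA3/3.5 target.

    Target strings may carry feature suffixes after a colon
    (for example "gfx90a:sramecc+:xnack-"), so only the part
    before the first colon is compared.
    """
    base = arch_name.split(":", 1)[0].strip().lower()
    return base.startswith(SUPPORTED_PREFIXES)


if __name__ == "__main__":
    for name in ("gfx1100", "gfx1151", "gfx1201", "gfx90a:sramecc+:xnack-"):
        print(name, is_supported_arch(name))
```

On a ROCm build of PyTorch, the string could plausibly come from `torch.cuda.get_device_properties(0).gcnArchName`; extending support to RDNA4 would then amount to adding `gfx120` to the tuple once the kernels are validated there.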

- Add HIP kernels for GEMM, LayerNorm, RMSNorm, and quantization ops
- Integrate rocWMMA for matrix operations on AMD GPUs
- Update setup.py for Windows ROCm builds with clang-cl
- Add platform detection (CUDA/HIP) with common abstractions
- Optimize SLA kernel config for ROCm (BLKK=16)
- Update .gitignore to exclude build artifacts and IDE files
- Fix distributed utils and network files for ROCm compatibility
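
As an illustration of the CUDA/HIP platform detection mentioned in the commit list, the following is a minimal sketch rather than the PR's actual code. `detect_backend` is a hypothetical name; it assumes the caller passes in the values of `torch.version.hip` and `torch.version.cuda`, which are version strings when the corresponding toolkit was used to build PyTorch and `None` otherwise.

```python
from typing import Optional


def detect_backend(hip_version: Optional[str], cuda_version: Optional[str]) -> str:
    """Pick a backend label from PyTorch build metadata.

    Hypothetical helper: HIP is checked first because ROCm builds of
    PyTorch report torch.version.hip as a string and torch.version.cuda
    as None.
    """
    if hip_version:
        return "hip"
    if cuda_version:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # Example inputs standing in for torch.version.hip / torch.version.cuda.
    print(detect_backend("6.2", None))
    print(detect_backend(None, "12.4"))
    print(detect_backend(None, None))
```

Keeping the decision in one small function means build scripts and runtime code can share it, for example `detect_backend(torch.version.hip, torch.version.cuda)` in a setup script that selects between HIP and CUDA sources.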