Add Windows/clang-cl support for AMD HIP backend #179
Conversation
- Use `LoadLibrary`/`GetProcAddress` on Windows instead of `dlopen`/`dlsym`
- Use `rocm_sdk.find_libraries()` to locate `amdhip64`
- Add platform-specific macros for dynamic library loading
- Escape Windows paths for C string embedding
- Treat clang-cl as an MSVC-compatible compiler in `build.py`
- Fix `NamedTemporaryFile` handling on Windows in `compiler.py`
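Two of the changes above (path escaping and the temp-file fix) follow well-known Windows patterns. A minimal Python sketch of both, with hypothetical helper names not taken from the actual PR:

```python
import tempfile

def escape_path_for_c(path: str) -> str:
    # Hypothetical helper: backslashes in a Windows path must be doubled
    # (and double quotes escaped) before the path is embedded in a C
    # string literal, or the generated source will not compile.
    return path.replace("\\", "\\\\").replace('"', '\\"')

def write_temp_source(code: str) -> str:
    # On Windows, an open NamedTemporaryFile cannot be reopened by another
    # process (such as the compiler), so create it with delete=False and
    # close it before handing the path over; the caller removes it later.
    f = tempfile.NamedTemporaryFile(mode="w", suffix=".c", delete=False)
    try:
        f.write(code)
    finally:
        f.close()
    return f.name
```

For example, `escape_path_for_c(r"C:\ROCm\bin")` yields a string that can be pasted inside `"..."` in generated C source without breaking the literal.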
Looks good to me! I haven't followed the modern AMD toolchain for a while, but if this is enough to make it work, then it will not add much maintenance cost. Maybe you can also tell people at https://github.com/patientx/ComfyUI-Zluda and https://github.com/lshqqytiger/triton about this.
I've cherry-picked this onto the upcoming
Also, can you test the Triton 3.5 wheel at https://github.com/Comfy-Org/wheels/actions/runs/20599014618 , in the way that users would install it? If it works, I'll publish it to PyPI.
Just tried it: generated_video.mp4
The Python test examples from Triton finally work with my gfx1100. But when I try torch.compile via Inductor in ComfyUI, it fails:

But this seems to be a bug with LLVM.
How did you achieve this? I have an RX 7600; can I do this too? Can you share your ComfyUI-run.bat, or the args and env variables you are using? And is it official ComfyUI or ZLUDA?
First of all, thanks for working on this, much appreciated. Is installing it like this OK, since it seems like you added it: "pip install triton-windows", which installs triton-windows 3.5.1.post23? I then installed sage-attention with "pip install sageattention==1.0.6" and flash-attention as well, but in the end both gave errors. I set up the parameters like this in the starter batch for Comfy:

set CC=clang-cl
set CXX=clang-cl
set DISTUTILS_USE_SDK=1
for /f "delims=" %%i in ('python -c "import rocm; print(rocm.__path__[0])"') do set ROCM_HOME=%%i

Since rocm is installed as a package, that last one should work, right?
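The `ROCM_HOME` one-liner in that batch file works because the installed `rocm` wheel exposes its on-disk location as `__path__[0]`. A sketch of the same lookup in plain Python; the package name is kept generic here, since `rocm` is only importable once the ROCm SDK wheels are installed:

```python
import importlib.util

def package_root(name: str):
    # Return the on-disk directory of an installed package, mirroring
    # python -c "import rocm; print(rocm.__path__[0])" from the batch file.
    # Returns None when the package is not installed.
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]
```

On a machine with the SDK wheels installed, `package_root("rocm")` would give the directory to assign to `ROCM_HOME`.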
Haven't tried building sage attention, but you could follow the environment variable setup here: https://github.com/jammm/SpargeAttn/blob/jam/amd_windows/README_AMD_WINDOWS.md#initialize-rocm-sdk
I'm curious about Sage too...
It turns out I only needed to run `rocm-sdk init` after activating the venv. It works if this is done on the command line only, and with batch files too. torch.compile now works but generates black output; also, sageattention doesn't work, as deluxa says. Edit: sage-attention works with the patches on my comfyui-zluda fork.
I also compiled SpargeAttn, but how exactly am I supposed to use it with ComfyUI? Any ideas?
I don't know what changes your fork has, but why don't you make a PR here so everyone can use SageAttention? I'll try Sage myself soon too. How is the performance compared to SDPA flash?
On my RX 6800, SDPA is the slowest, slower than quad-cross, which I was using by default. The SageAttention "patches" were made by someone on the SD.Next Discord, and I was applying them on every install with the ZLUDA setup; interestingly, I didn't need them with lee's torch builds from May 2024 (those were using HIP 6.5, though). Now it seems the same patches also work here. Just replace these three files; actually, here are curl commands to apply them directly once the venv is activated and you are inside the ComfyUI directory.
Int8? I think these are for RDNA2 or 3; I don't think RDNA4 needs them. Will try soon though. Edit: Yes, it does. I'm getting a lot of errors coming from
Yes, that works. It did do a nasty crash the first time, which I am saving here not as a complaint, but as a reference for a possible side project: "Write a program to force an AMD driver crash in order to free up all that VRAM that dwm never gives back." FYI, the ZLUDA sageattn is basically just a patch to change the parameters to

Otherwise it uses too much "shared memory" and produces black screens. See also https://raw.githubusercontent.com/sfinktah/amd-torch/refs/heads/main/patches/sageattention-1.0.6+sfinktah+env-py3-none-any.patch , which is an environment-variable-adjustable version of the same thing.
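The linked patch makes those tuning parameters adjustable through environment variables. The variable names below are hypothetical (the real ones are in the patch, not quoted in this thread); this only illustrates the override pattern:

```python
import os

def tuning_param(name: str, default: int) -> int:
    # Read an integer override from the environment, falling back to the
    # built-in default when the variable is unset or malformed.
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default

# Hypothetical kernel tile sizes, NOT the real SageAttention names:
BLOCK_M = tuning_param("SAGEATTN_BLOCK_M", 64)
BLOCK_N = tuning_param("SAGEATTN_BLOCK_N", 64)
```

Shrinking such tile sizes is the usual way to reduce per-workgroup shared-memory (LDS) pressure on GPUs where the defaults overflow it.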
I have tried many, many combinations besides these, and none of them worked. Most of the time I got garbled noise instead of an image, and sometimes I got a black-and-white frame of what was supposed to be the subject. I have an RX 7600.
I was able to use this to get Flash Attention 2 running in ComfyUI on Windows. I ran some SDXL performance tests with pretty decent results, using an SDXL fine-tune with a DMD2 LoRA and an upscaler KSampler step. I compared flash attention against pytorch cross attention across 10 image generations, listing the average it/s for both the base and upscaler samplers at the end.
Edit: I also just tested with sage attention 1, but the results seem to be the same as cross attention.
How did you do that? I have also compiled Sparge and Sage but haven't tried Flash yet. Flash 2 seems even better, so how can I?
I was able to as well. Based on my results, Flash 2 is slower than AOTriton SDPA Flash on RDNA4.
This allows Triton to run on AMD GPUs on Windows via TheRock wheels: https://github.com/ROCm/TheRock/blob/main/RELEASES.md
It should build as-is with the same build process as @woct0rdho's, as it only modifies .py files and a .c file that's compiled at runtime.
Whenever you run a program that requires Triton, make sure to set the following environment variables:

- `CC` and `CXX` to `clang-cl`
- `ROCM_HOME` to the output of `rocm-sdk path --root` (added to `$PATH` as well)
- `DISTUTILS_USE_SDK=1`

Summary of changes:
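The variables listed above can also be set from Python before launching anything that JIT-compiles Triton kernels. A sketch under the assumption that the ROCm root has already been obtained (normally by running `rocm-sdk path --root`); it is passed in as a string here so the example stays self-contained:

```python
import os

def configure_triton_env(rocm_home: str) -> dict:
    # Build a copy of the current environment with the settings above:
    # clang-cl as the (MSVC-compatible) compiler, the distutils SDK flag,
    # and the ROCm bin directory prepended to PATH.
    env = dict(os.environ)
    env["CC"] = "clang-cl"
    env["CXX"] = "clang-cl"
    env["DISTUTILS_USE_SDK"] = "1"
    env["ROCM_HOME"] = rocm_home
    env["PATH"] = os.path.join(rocm_home, "bin") + os.pathsep + env.get("PATH", "")
    return env
```

The resulting dict can be passed as `env=` to `subprocess.run` when launching ComfyUI, or applied in-process with `os.environ.update(...)`.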