
Conversation

@jammm

@jammm jammm commented Dec 30, 2025

Use rocWMMA instead of CUTLASS.
See README_AMD_WINDOWS.md for setup steps.

Heavily inspired by thu-ml/SageAttention#332

Used Claude Opus 4.5 to assist.
Tested with TurboDiffusion.

Currently only supports RDNA3/3.5
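
A quick way to check whether a GPU falls in that family, as a rough sketch. It assumes a ROCm build of PyTorch (which exposes gcnArchName in the device properties); the prefix list below is an assumption covering gfx1100-1103 and gfx1150/1151, not something defined by this PR:

import torch

# RDNA3 (gfx1100-1103) and RDNA3.5 (gfx1150/1151); assumed prefix list for illustration
SUPPORTED_PREFIXES = ("gfx110", "gfx1150", "gfx1151")

arch = torch.cuda.get_device_properties(0).gcnArchName.split(":")[0]
if arch.startswith(SUPPORTED_PREFIXES):
    print(f"{arch}: covered by this PR (RDNA3/3.5)")
else:
    print(f"{arch}: not RDNA3/3.5, the rocWMMA kernels in this branch don't target it yet")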

@jammm jammm force-pushed the jam/amd_windows branch 4 times, most recently from 15a0f4c to 8e7f363 on December 30, 2025 13:52
@0xDELUXA

Great to see this!
Have you run any benchmarks compared to flash?

@0xDELUXA

A small nitpick, but for me,
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk[devel]
as mentioned in the readme, doesn't work. I think we need:
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk-devel

@jammm
Author

jammm commented Dec 30, 2025

Great to see this! Have you run any benchmarks compared to flash?

Yeah. IIRC it's slower than aotriton SDPA at larger top-k values, but at around 0.25 sparsity or below this SLA path does better. Note that SDPA FA is always dense attention, as it doesn't support SLA.
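
For reference, a rough timing sketch of that kind of comparison. This is not the benchmark described above: the shapes and dtype are arbitrary, the sparsity thresholds are left at whatever defaults spas_sage_attn_meansim_cuda ships with, and the top-k tuning mentioned above lives in the calling framework (e.g. TurboDiffusion), not in this snippet.

import time
import torch
import torch.nn.functional as F
from spas_sage_attn.core import spas_sage_attn_meansim_cuda

# arbitrary illustrative shapes: (batch, heads, seq_len, head_dim), fp16 for the RDNA3 path
q, k, v = (torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16) for _ in range(3))

def bench(fn, iters=20):
    for _ in range(3):            # warmup
        fn()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3   # average ms per call

print("dense SDPA      :", bench(lambda: F.scaled_dot_product_attention(q, k, v)), "ms")
print("SpargeAttn (SLA):", bench(lambda: spas_sage_attn_meansim_cuda(q, k, v)), "ms")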

@jammm
Author

jammm commented Dec 30, 2025

A small nitpick, but for me, pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk[devel] as mentioned in the readme, doesn't work. I think we need: pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk-devel

Hmm that's strange. It should work. Can you create an issue in https://github.com/ROCm/TheRock with repro steps?

@0xDELUXA

Hmm that's strange. It should work. Can you create an issue in https://github.com/ROCm/TheRock with repro steps?

Ready: ROCm/TheRock#2726

@0xDELUXA

0xDELUXA commented Dec 30, 2025

For me, the test script from README_AMD_WINDOWS.md prints:

(venv) PS D:\sd> python test_spargeattn.py

Warning: Sage2++ NOT enabled
Traceback (most recent call last):
  File "C:\Users\djern\Downloads\test_spargeattn.py", line 17, in <module>
    sparge = spas_sage_attn_meansim_cuda(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\sdnext\venv\Lib\site-packages\torch\_dynamo\eval_frame.py", line 1191, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\AI\sdnext\venv\Lib\site-packages\spas_sage_attn\core.py", line 366, in spas_sage_attn_meansim_cuda
    qattn.qk_int8_sv_f16_accum_f16_block_sparse_attn_inst_buf_with_pv_threshold(q_int8, k_int8, v, o, lut, valid_block_num, pvthreshd, q_scale, k_scale, 1, _is_causal, 1, scale, 0)
RuntimeError: output must have same dtype as value

If I change the dtype from torch.bfloat16 to torch.float16, it prints:

Warning: Sage2++ NOT enabled
Cosine similarity: 0.467271

Is this behavior expected on the gfx1200, or is there a problem with my build?
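
For context, a minimal stand-in for the kind of check the README test performs. This is not the actual test_spargeattn.py; the shapes are arbitrary and the sparsity thresholds are left at the package defaults. float16 inputs are used since that is the path that currently works on RDNA3, as discussed below.

import torch
import torch.nn.functional as F
from spas_sage_attn.core import spas_sage_attn_meansim_cuda

# arbitrary shapes: (batch, heads, seq_len, head_dim); float16 as discussed above
q, k, v = (torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.float16) for _ in range(3))

ref = F.scaled_dot_product_attention(q, k, v)       # dense reference
out = spas_sage_attn_meansim_cuda(q, k, v)          # sparsity thresholds left at defaults

cos = F.cosine_similarity(out.float().flatten(), ref.float().flatten(), dim=0)
print(f"Cosine similarity: {cos.item():.6f}")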

@jammm
Author

jammm commented Dec 30, 2025

If I change the dtype from torch.bfloat16 to torch.float16, it prints:

Warning: Sage2++ NOT enabled
Cosine similarity: 0.467271

Is this behavior expected on the gfx1200, or is there a problem with my build?

Thanks for the catch! I've fixed the dtype issue. However, this code isn't implemented for RDNA4 yet; it's RDNA3 only for now. Some modifications need to be made specifically for RDNA4 to work with rocWMMA, as the matrix fragment layouts for the individual elements are different. But I don't have access to an RDNA4 GPU at the moment.

@0xDELUXA

Thanks for the catch! I've fixed the dtype issue. However, this code isn't implemented for RDNA4 yet; it's RDNA3 only for now. Some modifications need to be made specifically for RDNA4 to work with rocWMMA, as the matrix fragment layouts for the individual elements are different. But I don't have access to an RDNA4 GPU at the moment.

I see. No worries, I can wait until you have an RDNA4 GPU available. It was exactly the same with PyTorch support for gfx1200 XD

@0xDELUXA

0xDELUXA commented Dec 30, 2025

Correct me if I'm wrong, but theoretically RDNA4 can use fp8, so we would need sgattn_f8.cu and launch_sgattn_f8.cu in SpargeAttn\csrc\qattn\rocm. Then SAGE2PP_ENABLED = True. This would allow it to be on par with SDPA, or potentially even better.

Based on:

#if (defined(__gfx942__) || defined(__gfx1200__) || defined(__gfx1201__) ||                        \
     defined(__gfx950__)) &&                                                                       \
    __HIP_DEVICE_COMPILE__
#define HIP_FP8_CVT_FAST_PATH 1
#else
#define HIP_FP8_CVT_FAST_PATH 0
#endif

#if defined(__gfx942__) && __HIP_DEVICE_COMPILE__
#define HIP_FP8_TYPE_OCP 0
#define HIP_FP8_TYPE_FNUZ 1
#elif (defined(__gfx1200__) || defined(__gfx1201__) || defined(__gfx950__)) &&                     \
    __HIP_DEVICE_COMPILE__
#define HIP_FP8_TYPE_OCP 1
#define HIP_FP8_TYPE_FNUZ 0
#else
#define HIP_FP8_TYPE_FNUZ 1
#define HIP_FP8_TYPE_OCP 1
#endif

in venv\Lib\site-packages\_rocm_sdk_devel\include\hip\amd_detail\amd_hip_fp8.h
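
A rough Python analogue of that preprocessor logic, for checking which FP8 flavor a given device would get. It assumes a ROCm build of PyTorch (where device properties expose gcnArchName); the arch lists simply mirror the #if branches above and nothing else.

import torch

OCP_FP8_ARCHES = ("gfx1200", "gfx1201", "gfx950")    # per the header: OCP fp8, fast conversion path
FNUZ_FP8_ARCHES = ("gfx942",)                        # per the header: FNUZ fp8, fast conversion path

def fp8_flavor(device: int = 0) -> str:
    arch = torch.cuda.get_device_properties(device).gcnArchName.split(":")[0]
    if arch in OCP_FP8_ARCHES:
        return "ocp"
    if arch in FNUZ_FP8_ARCHES:
        return "fnuz"
    return "none"    # e.g. RDNA3 (gfx11xx): no fast fp8 path, so Sage2++ stays disabled

print(fp8_flavor())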

@jammm
Author

jammm commented Dec 31, 2025

Correct me if I'm wrong, but theoretically RDNA4 can use fp8, so we would need sgattn_f8.cu and launch_sgattn_f8.cu in SpargeAttn\csrc\qattn\rocm. Then SAGE2PP_ENABLED = True. This would allow it to be on par with SDPA, or potentially even better.

Yup. Not sure about perf vs. SDPA as aotriton should have fp8 kernels too, I think?

@jammm
Author

jammm commented Dec 31, 2025

Hmm that's strange. It should work. Can you create an issue in https://github.com/ROCm/TheRock with repro steps?

Ready: ROCm/TheRock#2726

Just checked. My bad, I forgot it was deprecated. I'll modify the READMEs to only specify the torch installation, as that should automatically download the corresponding ROCm wheels as a dependency.

@0xDELUXA

0xDELUXA commented Dec 31, 2025

Yup. Not sure about perf vs. SDPA as aotriton should have fp8 kernels too, I think?

I don't really understand these things, so I think you're right. All I know is that over at Nvidia, Sparge is better than SDPA Flash in these types of workloads.

@rwfsmith

rwfsmith commented Jan 5, 2026

A small nitpick, but for me, pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk[devel] as mentioned in the readme, doesn't work. I think we need: pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre rocm-sdk-devel

rocm-sdk-devel didn't include the tar file for me, but this worked:

pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ "rocm[libraries,devel]"

@rwfsmith

rwfsmith commented Jan 5, 2026

ran this to build on linux:

export ROCM_ROOT="/home/ryan/venv/lib/python3.12/site-packages/_rocm_sdk_core"
export ROCM_HOME="$ROCM_ROOT"
export PATH="$ROCM_ROOT/lib/llvm/bin:$ROCM_ROOT/bin:$PATH"

# Compiler settings
export CC="$ROCM_ROOT/lib/llvm/bin/clang"
export CXX="$ROCM_ROOT/lib/llvm/bin/clang++"
export HIP_CLANG_PATH="$ROCM_ROOT/lib/llvm/bin"

# Include and library paths (for thrust, etc. from _rocm_sdk_devel)
export CPLUS_INCLUDE_PATH="/home/ryan/venv/lib/python3.12/site-packages/_rocm_sdk_devel/include:$CPLUS_INCLUDE_PATH"
export C_INCLUDE_PATH="/home/ryan/venv/lib/python3.12/site-packages/_rocm_sdk_devel/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="/home/ryan/venv/lib/python3.12/site-packages/_rocm_sdk_devel/lib:$LIBRARY_PATH"
export LD_LIBRARY_PATH="/home/ryan/venv/lib/python3.12/site-packages/_rocm_sdk_devel/lib:$LD_LIBRARY_PATH"

# Enable experimental features
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL="1"

pip install --no-build-isolation -v .

it builds successfully, but my cosine similarity is low

Cosine similarity: 0.998755

@0xDELUXA

0xDELUXA commented Jan 5, 2026

rocm-sdk-devel didn't include the tar file for me, but this worked:

pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ "rocm[libraries,devel]"

You mean on Linux?

@jammm
Author

jammm commented Jan 5, 2026

ran this to build on linux:

it builds successfully, but my cosine similarity is low

Cosine similarity: 0.998755

this is fine I believe. Give it a try on some image/video model to confirm.

@0xDELUXA

0xDELUXA commented Jan 5, 2026

ran this to build on linux:

it builds successfully, but my cosine similarity is low

Cosine similarity: 0.998755

this is fine I believe. Give it a try on some image/video model to confirm.

What I wanted to say: 0.998... should be fine, as it's very close to 1.0. Unlike the value I got: 0.467271 (because of RDNA4). I hope this PR will be optimized for the RX 9000 series in the near future.

@IxMxAMAR

IxMxAMAR commented Jan 5, 2026

Hello, I have an RX 7600 and built this with a similar cosine similarity of around 0.998. I'm currently using it in ComfyUI and it works like a charm. Top-k values below 0.2 mess things up, while a top-k of 0.25 gives results similar to sageattn 1.0.6 in quality and speed, so it's a great alternative to sageattn on AMD, I guess.

@rwfsmith

rwfsmith commented Jan 5, 2026

rocm-sdk-devel didn't include the tar file for me, but this worked:
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ "rocm[libraries,devel]"

You mean on Linux?

That was initially on Linux, but I ran it again after rebooting into Windows, and "rocm-sdk-devel" works there; rocm[devel] worked on both.

Funny enough, though, I was able to build this pretty easily on Linux but am having trouble on Windows :P

@rwfsmith

rwfsmith commented Jan 5, 2026

ran this to build on linux:
it builds successfully, but my cosine similarity is low
Cosine similarity: 0.998755

this is fine I believe. Give it a try on some image/video model to confirm.

How would I enable it for use in ComfyUI?

@ouco1986

ouco1986 commented Jan 6, 2026

Hello. I have successfully compiled the SpargeAttn AMD build you provided on Linux. My graphics card is a 9070 XT. After selecting SpargeAttn in the ComfyUI workflow and choosing sparse_stage as the parameter, the K-sampler errors out as follows. The same workflow with the same configuration does run on Windows. I don't know whether this problem can be solved. Thanks @jammm

ComfyUI Error Report

Error Details

  • Node ID: 5
  • Node Type: WanVideoSampler
  • Exception Type: RuntimeError
  • Exception Message: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

Stack Trace

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 516, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 330, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 304, in _async_map_node_over_list
    await process_inputs(input_dict, i)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 292, in process_inputs
    result = f(**inputs)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 3251, in process
    raise e

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 2237, in process
    noise_pred_context, _, new_teacache = predict_with_cfg(
                                          ~~~~~~~~~~~~~~~~^
        partial_latent_model_input,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        wananim_face_pixels=partial_wananim_face_pixels, wananim_pose_latents=partial_wananim_pose_latents, multitalk_audio_embeds=multitalk_audio_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        uni3c_data=uni3c_data, flashvsr_LQ_latent=partial_flashvsr_LQ_latent)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1755, in predict_with_cfg
    raise e

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1602, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 3173, in forward
    x, x_ip, lynx_ref_feature, x_ovi = block(x, x_ip=x_ip, lynx_ref_feature=lynx_ref_feature, x_ovi=x_ovi, x_onetoall_ref=x_onetoall_ref, onetoall_freqs=onetoall_freqs, **kwargs)
                                       ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 1171, in forward
    y = self.self_attn.forward_radial(q, k, v, dense_step=False)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 516, in forward_radial
    x = RadialSpargeSageAttn(q, k, v, self.mask_map, decay_factor=self.decay_factor)

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)

  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/radial_attention/attn_mask.py", line 168, in RadialSpargeSageAttn
    return sparse_attn_func(
           ~~~~~~~~~~~~~~~~^
        query[:, :, :mask_map.video_token_num, :],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        tensor_layout="NHD"
        ^^^^^^^^^^^^^^^^^^^
    ).contiguous()
    ^

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)

  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/spas_sage_attn/core.py", line 273, in block_sparse_sage2_attn_cuda
    raise RuntimeError(
    ...<2 lines>...
    )

System Information

  • ComfyUI Version: 0.6.0
  • Arguments: /media/wr/LTW/linux/ROCm7/ComfyUI/main.py --use-pytorch-cross-attention --fast
  • OS: linux
  • Python Version: 3.13.5 (main, Jun 25 2025, 18:55:22) [GCC 14.2.0]
  • Embedded Python: false
  • PyTorch Version: 2.11.0.dev20251224+rocm7.1

Devices

  • Name: cuda:0 AMD Radeon RX 9070 XT : native
    • Type: cuda
    • VRAM Total: 17095983104
    • VRAM Free: 17026777088
    • Torch VRAM Total: 0
    • Torch VRAM Free: 0

Logs

2026-01-06T16:15:11.015601 - ComfyUI frontend version: 1.35.9
2026-01-06T16:15:11.017276 - [Prompt Server] web root: /media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/comfyui_frontend_package/static
2026-01-06T16:15:11.761535 - Total VRAM 16304 MB, total RAM 80335 MB
2026-01-06T16:15:11.761669 - pytorch version: 2.11.0.dev20251224+rocm7.1
2026-01-06T16:15:11.762603 - Set: torch.backends.cudnn.enabled = False for better AMD performance.
2026-01-06T16:15:11.762686 - AMD arch: gfx1201
2026-01-06T16:15:11.762753 - ROCm version: (7, 1)
2026-01-06T16:15:11.762837 - Enabled fp16 accumulation.
2026-01-06T16:15:11.762922 - Set vram state to: NORMAL_VRAM
2026-01-06T16:15:11.763009 - Device: cuda:0 AMD Radeon RX 9070 XT : native
2026-01-06T16:15:11.763195 - Enabled pinned memory 76318.0
2026-01-06T16:15:12.321069 - # # #
AMD GO FAST
# # #2026-01-06T16:15:12.321127 - 
2026-01-06T16:15:12.343678 - [Crystools INFO] Crystools version: 1.27.4
2026-01-06T16:15:12.376358 - [Crystools INFO] Platform release: 6.12.57+deb13-amd64
2026-01-06T16:15:12.376446 - [Crystools INFO] JETSON: Not detected.
2026-01-06T16:15:12.378675 - [Crystools INFO] CPU: AMD Ryzen 9 5900X 12-Core Processor - Arch: x86_64 - OS: Linux 6.12.57+deb13-amd64
2026-01-06T16:15:12.382797 - /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Crystools/general/gpu.py:67: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml
2026-01-06T16:15:12.391348 - [Crystools ERROR] Could not init pynvml (NVIDIA). NVML Shared Library Not Found
2026-01-06T16:15:12.391520 - [Crystools WARNING] No GPU monitoring libraries available.
2026-01-06T16:15:16.282651 - [ComfyUI-Easy-Use] server: v1.3.4 Loaded2026-01-06T16:15:16.282705 - 
2026-01-06T16:15:16.282747 - [ComfyUI-Easy-Use] web root: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-easy-use/web_version/v2 Loaded2026-01-06T16:15:16.282785 - 
2026-01-06T16:15:16.297184 - ComfyUI-GGUF: Allowing full torch compile
2026-01-06T16:15:16.365041 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/nodes.py", line 2149, in load_custom_node
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1026, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/__init__.py", line 1, in <module>
    from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/nodes.py", line 13, in <module>
    from .gimmvfi.generalizable_INR.gimmvfi_r import GIMMVFI_R
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/gimmvfi/generalizable_INR/gimmvfi_r.py", line 31, in <module>
    from .modules.softsplat import softsplat
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/gimmvfi/generalizable_INR/modules/softsplat.py", line 12, in <module>
    import cupy
ModuleNotFoundError: No module named 'cupy'

2026-01-06T16:15:16.365259 - Cannot import /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI module for custom nodes: No module named 'cupy'
2026-01-06T16:15:16.396261 - ### Loading: ComfyUI-Manager (V3.37.1)
2026-01-06T16:15:16.396742 - [ComfyUI-Manager] network_mode: public
2026-01-06T16:15:16.447423 - ### ComfyUI Version: v0.6.0-3-g532e2850 | Released on '2025-12-24'
2026-01-06T16:15:16.499003 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/nodes.py", line 2149, in load_custom_node
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1026, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/__init__.py", line 1, in <module>
    from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/__init__.py", line 3, in <module>
    from .save_surreal import SaveJsonToSurreal, SaveTextToSurreal
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/save_surreal.py", line 1, in <module>
    from .surreal import surreal_connect
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/surreal.py", line 1, in <module>
    from surrealist import Surreal
ModuleNotFoundError: No module named 'surrealist'

2026-01-06T16:15:16.499158 - Cannot import /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops module for custom nodes: No module named 'surrealist'
2026-01-06T16:15:17.000042 - Warning: Sage2++ NOT enabled2026-01-06T16:15:17.000104 - 
2026-01-06T16:15:17.450179 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using ckpts path: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux/ckpts
2026-01-06T16:15:17.450645 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using symlinks: False
2026-01-06T16:15:17.451091 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'MIGraphXExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
2026-01-06T16:15:17.486896 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
2026-01-06T16:15:17.488448 - /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux/node_wrappers/dwpose.py:26: UserWarning: DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device. DWPose might run very slowly
  warnings.warn("DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device. DWPose might run very slowly")
2026-01-06T16:15:17.552620 - 
============================================================2026-01-06T16:15:17.552686 - 
2026-01-06T16:15:17.552740 - ERROR: Could not import vendored TurboDiffusion code!2026-01-06T16:15:17.552788 - 
2026-01-06T16:15:17.552839 - ============================================================2026-01-06T16:15:17.552888 - 
2026-01-06T16:15:17.552938 - Import error: No module named 'loguru'2026-01-06T16:15:17.552990 - 
2026-01-06T16:15:17.553040 - 
This should not happen as TurboDiffusion code is vendored in the package.2026-01-06T16:15:17.553093 - 
2026-01-06T16:15:17.553140 - Please report this issue at: https://github.com/anveshane/Comfyui_turbodiffusion/issues2026-01-06T16:15:17.553185 - 
2026-01-06T16:15:17.553232 - ============================================================
2026-01-06T16:15:17.553278 - 
2026-01-06T16:15:17.560090 - ERROR: Could not import TurboDiffusion modules: No module named 'loguru'2026-01-06T16:15:17.560138 - 
2026-01-06T16:15:17.563094 - ERROR: Could not import Wan2pt1VAEInterface: No module named 'loguru'2026-01-06T16:15:17.563140 - 
2026-01-06T16:15:17.563224 - 
============================================================2026-01-06T16:15:17.563267 - 
2026-01-06T16:15:17.563307 - ComfyUI TurboDiffusion I2V Node2026-01-06T16:15:17.563347 - 
2026-01-06T16:15:17.563389 - ============================================================2026-01-06T16:15:17.563428 - 
2026-01-06T16:15:17.563467 - Version: 0.1.02026-01-06T16:15:17.563506 - 
2026-01-06T16:15:17.563546 - Loaded 5 nodes:2026-01-06T16:15:17.563584 - 
2026-01-06T16:15:17.563624 -   - TurboWan I2V Sampler (TurboWanSampler)2026-01-06T16:15:17.563663 - 
2026-01-06T16:15:17.563716 -   - Save Video (TurboDiffusionSaveVideo)2026-01-06T16:15:17.563756 - 
2026-01-06T16:15:17.563796 -   - TurboWan Model Loader (Quantized) (TurboWanModelLoader)2026-01-06T16:15:17.563835 - 
2026-01-06T16:15:17.563875 -   - TurboDiffusion I2V Sampler (TurboDiffusionI2VSampler)2026-01-06T16:15:17.563915 - 
2026-01-06T16:15:17.563957 -   - TurboWan VAE Loader (TurboWanVAELoader)2026-01-06T16:15:17.563996 - 
2026-01-06T16:15:17.564035 - 
Features:2026-01-06T16:15:17.564073 - 
2026-01-06T16:15:17.564111 -   - TurboWan Model Loader: Official TurboDiffusion model loading2026-01-06T16:15:17.564149 - 
2026-01-06T16:15:17.564187 -   - Supports int8 block-wise quantized .pth models2026-01-06T16:15:17.564228 - 
2026-01-06T16:15:17.564267 -   - SageSLA/SLA attention optimization for faster inference2026-01-06T16:15:17.564305 - 
2026-01-06T16:15:17.564343 -   - Attention top-k tuning (0.01-1.0)2026-01-06T16:15:17.564380 - 
2026-01-06T16:15:17.564418 - 
Requires:2026-01-06T16:15:17.564456 - 
2026-01-06T16:15:17.564494 -   - TurboDiffusion Python package (manual install)2026-01-06T16:15:17.564532 - 
2026-01-06T16:15:17.564570 -   - Quantized .pth models from HuggingFace2026-01-06T16:15:17.564607 - 
2026-01-06T16:15:17.564646 - ============================================================
2026-01-06T16:15:17.564684 - 
2026-01-06T16:15:17.589803 - 
2026-01-06T16:15:17.589859 - [rgthree-comfy] Loaded 48 exciting nodes. 🎉2026-01-06T16:15:17.589900 - 
2026-01-06T16:15:17.589938 - 
2026-01-06T16:15:17.589987 - [rgthree-comfy] ComfyUI's new Node 2.0 rendering may be incompatible with some rgthree-comfy nodes and features, breaking some rendering as well as losing the ability to access a node's properties (a vital part of many nodes). It also appears to run MUCH more slowly spiking CPU usage and causing jankiness and unresponsiveness, especially with large workflows. Personally I am not planning to use the new Nodes 2.0 and, unfortunately, am not able to invest the time to investigate and overhaul rgthree-comfy where needed. If you have issues when Nodes 2.0 is enabled, I'd urge you to switch it off as well and join me in hoping ComfyUI is not planning to deprecate the existing, stable canvas rendering all together.
2026-01-06T16:15:17.590028 - 
2026-01-06T16:15:17.601844 - [INFO] ComfyUI-GGUF not found, using our implementation2026-01-06T16:15:17.601902 - 
2026-01-06T16:15:17.602019 - [ROCm Ninodes] Successfully loaded from rocm_nodes package2026-01-06T16:15:17.602071 - 
2026-01-06T16:15:17.604813 - [通用块交换]: node '通用模型 块交换' loaded successfully.2026-01-06T16:15:17.604858 - 
2026-01-06T16:15:17.605093 - 
Import times for custom nodes:
2026-01-06T16:15:17.605168 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/websocket_image_save.py
2026-01-06T16:15:17.605227 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_vae_loader.py
2026-01-06T16:15:17.605282 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_cudnn.toggle.py
2026-01-06T16:15:17.605336 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_patcher.py
2026-01-06T16:15:17.605391 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/通用交换块(应该)
2026-01-06T16:15:17.605446 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-amd-go-fast
2026-01-06T16:15:17.605499 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-DD-Translation
2026-01-06T16:15:17.605578 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GGUF
2026-01-06T16:15:17.605631 -    0.0 seconds (IMPORT FAILED): /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops
2026-01-06T16:15:17.605686 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-custom-scripts
2026-01-06T16:15:17.605738 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_essentials
2026-01-06T16:15:17.605790 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/rocm-ninodes
2026-01-06T16:15:17.605845 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
2026-01-06T16:15:17.605897 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-segment-anything-2
2026-01-06T16:15:17.605948 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-KJNodes
2026-01-06T16:15:17.605998 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/rgthree-comfy
2026-01-06T16:15:17.606049 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/Comfyui_turbodiffusion
2026-01-06T16:15:17.606100 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux
2026-01-06T16:15:17.606150 -    0.1 seconds (IMPORT FAILED): /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI
2026-01-06T16:15:17.606202 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Crystools
2026-01-06T16:15:17.606253 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Manager
2026-01-06T16:15:17.606304 -    0.2 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanAnimatePreprocess
2026-01-06T16:15:17.606355 -    0.7 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper
2026-01-06T16:15:17.606408 -    3.9 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-easy-use
2026-01-06T16:15:17.606460 - 
2026-01-06T16:15:17.637704 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
2026-01-06T16:15:17.888252 - Context impl SQLiteImpl.
2026-01-06T16:15:17.888377 - Will assume non-transactional DDL.
2026-01-06T16:15:17.889351 - No target revision found.
2026-01-06T16:15:18.024939 - Starting server

2026-01-06T16:15:18.025211 - To see the GUI go to: http://127.0.0.1:8188
2026-01-06T16:15:18.456509 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
2026-01-06T16:15:18.572711 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
2026-01-06T16:15:18.902674 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
2026-01-06T16:15:28.103106 - FETCH ComfyRegistry Data: 5/1172026-01-06T16:15:28.103180 - 
2026-01-06T16:15:37.363753 - FETCH ComfyRegistry Data: 10/1172026-01-06T16:15:37.363821 - 
2026-01-06T16:15:46.887170 - FETCH ComfyRegistry Data: 15/1172026-01-06T16:15:46.887256 - 
2026-01-06T16:15:49.326948 - got prompt
2026-01-06T16:15:53.278930 - 
T5Encoder:   4%|█▍                               | 1/24 [00:01<00:23,  1.01s/it]2026-01-06T16:15:53.301711 - 
T5Encoder: 100%|████████████████████████████████| 24/24 [00:01<00:00, 23.20it/s]2026-01-06T16:15:53.301773 - 
2026-01-06T16:15:53.301926 - prompt token count:2026-01-06T16:15:53.301975 -  2026-01-06T16:15:53.425458 - tensor([246], device='cuda:0')2026-01-06T16:15:53.425528 - 
2026-01-06T16:15:53.426979 - 
T5Encoder:   0%|                                         | 0/24 [00:00<?, ?it/s]2026-01-06T16:15:53.454267 - 
T5Encoder: 100%|███████████████████████████████| 24/24 [00:00<00:00, 884.55it/s]2026-01-06T16:15:53.454328 - 
2026-01-06T16:15:53.454494 - prompt token count:2026-01-06T16:15:53.454543 -  2026-01-06T16:15:53.578413 - tensor([44], device='cuda:0')2026-01-06T16:15:53.578482 - 
2026-01-06T16:15:57.442469 - 
WanVAE encoding frames:  19%|███▊                | 4/21 [00:01<00:09,  1.87it/s]2026-01-06T16:15:57.869437 - FETCH ComfyRegistry Data: 20/1172026-01-06T16:15:57.869501 - 
2026-01-06T16:16:07.025178 - 
WanVAE encoding frames: 100%|███████████████████| 21/21 [00:11<00:00,  1.78it/s]2026-01-06T16:16:07.025356 - 
WanVAE encoding frames: 100%|███████████████████| 21/21 [00:11<00:00,  1.83it/s]2026-01-06T16:16:07.025411 - 
2026-01-06T16:16:07.046215 - WanVAE encoded input:torch.Size([1, 3, 81, 896, 512]) to torch.Size([1, 32, 21, 112, 64])
2026-01-06T16:16:07.047606 - [WanVAE encode] Allocated memory: memory=0.764 GB
2026-01-06T16:16:07.047727 - [WanVAE encode] Max allocated memory: max_memory=12.483 GB
2026-01-06T16:16:07.047838 - [WanVAE encode] Max reserved memory: max_reserved=15.357 GB
2026-01-06T16:16:07.073083 - FETCH ComfyRegistry Data: 25/1172026-01-06T16:16:07.073167 - 
2026-01-06T16:16:07.384869 - CUDA Compute Capability: 12.0
2026-01-06T16:16:07.385468 - Detected model in_channels: 36
2026-01-06T16:16:07.385576 - Model cross attention type: t2v, num_heads: 40, num_layers: 40
2026-01-06T16:16:07.385715 - Model variant detected: i2v_14B_2.2
2026-01-06T16:16:07.449057 - model_type FLOW
2026-01-06T16:16:07.466943 - Loading LoRA: wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise with strength: 1.0
2026-01-06T16:16:07.577072 - Using accelerate to load and assign model weights to device...
2026-01-06T16:16:07.727471 - 
Loading transformer parameters to cuda:0:   1%| | 11/1095 [00:00<00:14, 73.43it/2026-01-06T16:16:07.758470 - 
Loading transformer parameters to cuda:0: 100%|█| 1095/1095 [00:00<00:00, 6053.22026-01-06T16:16:07.758527 - 
2026-01-06T16:16:07.758641 - Using 400 LoRA weight patches for WanVideo model
2026-01-06T16:16:07.898659 - ------- Scheduler info -------
2026-01-06T16:16:07.899205 - Total timesteps: tensor([1000,  937,  833,  625], device='cuda:0')
2026-01-06T16:16:07.899459 - Using timesteps: tensor([1000,  937], device='cuda:0')
2026-01-06T16:16:08.153706 - Using sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:16:08.153948 - ------------------------------
2026-01-06T16:16:08.154791 - sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:16:08.156559 - image_cond shape: torch.Size([20, 21, 112, 64])
2026-01-06T16:16:08.164334 - Number of prompts: 1
2026-01-06T16:16:08.164463 - Section size: 21.0
2026-01-06T16:16:08.164555 - context window seq len: 37632
2026-01-06T16:16:08.164626 - Context schedule enabled: 21 frames, 1 stride, 1 overlap
2026-01-06T16:16:08.422435 - TeaCache: Using cache device: cpu
2026-01-06T16:16:08.423067 - Radial attention mode enabled.
2026-01-06T16:16:08.423192 - dense_attention_mode: sageattn, dense_timesteps: 1, decay_factor: 0.2
2026-01-06T16:16:08.423274 - dense_blocks: [0])
2026-01-06T16:16:08.423487 - Rope function: comfy
2026-01-06T16:16:08.424357 - Input sequence length: 37632
2026-01-06T16:16:08.424449 - Sampling 81 frames at 512x896 with 2 steps
2026-01-06T16:16:08.679019 - 
  0%|                                                     | 0/2 [00:00<?, ?it/s]2026-01-06T16:16:08.778429 - Generated new RoPE frequencies
2026-01-06T16:16:17.107405 - FETCH ComfyRegistry Data: 30/1172026-01-06T16:16:17.107474 - 
2026-01-06T16:16:28.151897 - FETCH ComfyRegistry Data: 35/1172026-01-06T16:16:28.151989 - 
2026-01-06T16:16:43.110141 - FETCH ComfyRegistry Data: 40/1172026-01-06T16:16:43.110206 - 
2026-01-06T16:16:53.074928 - FETCH ComfyRegistry Data: 45/1172026-01-06T16:16:53.075010 - 
2026-01-06T16:16:56.201486 - 
 50%|██████████████████████▌                      | 1/2 [00:47<00:47, 47.52s/it]2026-01-06T16:16:57.407779 - Radial Attention: Generating block mask2026-01-06T16:16:57.407848 - 
2026-01-06T16:16:57.408105 - 
2026-01-06T16:16:57.408256 - 
Frames (i):   0%|                                        | 0/21 [00:00<?, ?it/s]2026-01-06T16:16:57.408334 - 2026-01-06T16:16:57.510494 - 
2026-01-06T16:16:57.510611 - 
Frames (i):  52%|███████████████▋              | 11/21 [00:00<00:00, 107.89it/s]2026-01-06T16:16:57.510671 - 2026-01-06T16:16:57.559466 - 
Frames (i): 100%|██████████████████████████████| 21/21 [00:00<00:00, 139.13it/s]2026-01-06T16:16:57.559529 - 
2026-01-06T16:16:57.560260 - Error during model prediction: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:57.906480 - 
 50%|██████████████████████▌                      | 1/2 [00:49<00:49, 49.23s/it]2026-01-06T16:16:57.906908 - 
2026-01-06T16:16:57.907036 - Error during sampling: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:58.200813 - !!! Exception during processing !!! block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:58.211418 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 516, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 330, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 304, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 292, in process_inputs
    result = f(**inputs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 3251, in process
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 2237, in process
    noise_pred_context, _, new_teacache = predict_with_cfg(
                                          ~~~~~~~~~~~~~~~~^
        partial_latent_model_input,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        wananim_face_pixels=partial_wananim_face_pixels, wananim_pose_latents=partial_wananim_pose_latents, multitalk_audio_embeds=multitalk_audio_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        uni3c_data=uni3c_data, flashvsr_LQ_latent=partial_flashvsr_LQ_latent)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1755, in predict_with_cfg
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1602, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 3173, in forward
    x, x_ip, lynx_ref_feature, x_ovi = block(x, x_ip=x_ip, lynx_ref_feature=lynx_ref_feature, x_ovi=x_ovi, x_onetoall_ref=x_onetoall_ref, onetoall_freqs=onetoall_freqs, **kwargs)
                                       ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 1171, in forward
    y = self.self_attn.forward_radial(q, k, v, dense_step=False)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 516, in forward_radial
    x = RadialSpargeSageAttn(q, k, v, self.mask_map, decay_factor=self.decay_factor)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/radial_attention/attn_mask.py", line 168, in RadialSpargeSageAttn
    return sparse_attn_func(
           ~~~~~~~~~~~~~~~~^
        query[:, :, :mask_map.video_token_num, :],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        tensor_layout="NHD"
        ^^^^^^^^^^^^^^^^^^^
    ).contiguous()
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/spas_sage_attn/core.py", line 273, in block_sparse_sage2_attn_cuda
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

2026-01-06T16:16:58.219619 - Prompt executed in 68.88 seconds
2026-01-06T16:17:03.815249 - FETCH ComfyRegistry Data: 50/1172026-01-06T16:17:03.815326 - 
2026-01-06T16:17:13.116115 - FETCH ComfyRegistry Data: 55/1172026-01-06T16:17:13.116182 - 
2026-01-06T16:17:23.614580 - FETCH ComfyRegistry Data: 60/1172026-01-06T16:17:23.614650 - 
2026-01-06T16:17:33.446622 - FETCH ComfyRegistry Data: 65/1172026-01-06T16:17:33.446694 - 
2026-01-06T16:17:44.334776 - FETCH ComfyRegistry Data: 70/1172026-01-06T16:17:44.334874 - 
2026-01-06T16:17:53.855379 - FETCH ComfyRegistry Data: 75/1172026-01-06T16:17:53.855450 - 
2026-01-06T16:18:05.232362 - FETCH ComfyRegistry Data: 80/1172026-01-06T16:18:05.232454 - 
2026-01-06T16:18:14.906355 - FETCH ComfyRegistry Data: 85/1172026-01-06T16:18:14.906435 - 
2026-01-06T16:18:24.707934 - FETCH ComfyRegistry Data: 90/1172026-01-06T16:18:24.708002 - 
2026-01-06T16:18:35.798195 - FETCH ComfyRegistry Data: 95/1172026-01-06T16:18:35.798266 - 
2026-01-06T16:18:45.058420 - FETCH ComfyRegistry Data: 100/1172026-01-06T16:18:45.058489 - 
2026-01-06T16:18:54.404433 - FETCH ComfyRegistry Data: 105/1172026-01-06T16:18:54.404498 - 
2026-01-06T16:19:03.967834 - FETCH ComfyRegistry Data: 110/1172026-01-06T16:19:03.967930 - 
2026-01-06T16:19:13.379911 - FETCH ComfyRegistry Data: 115/1172026-01-06T16:19:13.380002 - 
2026-01-06T16:19:18.843197 - FETCH ComfyRegistry Data [DONE]2026-01-06T16:19:18.843284 - 
2026-01-06T16:19:18.987055 - [ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
2026-01-06T16:19:19.008230 - FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json2026-01-06T16:19:19.008252 - 2026-01-06T16:19:21.484115 -  [DONE]2026-01-06T16:19:21.484188 - 
2026-01-06T16:19:21.536130 - [ComfyUI-Manager] All startup tasks have been completed.
2026-01-06T16:22:26.971167 - got prompt
2026-01-06T16:22:26.992706 - Using accelerate to load and assign model weights to device...
2026-01-06T16:22:26.993040 - 
Loading transformer parameters to cuda:0:   0%|        | 0/1095 [00:00<?, ?it/s]2026-01-06T16:22:27.049217 - 
Loading transformer parameters to cuda:0: 100%|█| 1095/1095 [00:00<00:00, 19542.2026-01-06T16:22:27.049270 - 
2026-01-06T16:22:27.049361 - Using 400 LoRA weight patches for WanVideo model
2026-01-06T16:22:27.188776 - ------- Scheduler info -------
2026-01-06T16:22:27.189422 - Total timesteps: tensor([1000,  937,  833,  625], device='cuda:0')
2026-01-06T16:22:27.189759 - Using timesteps: tensor([1000,  937], device='cuda:0')
2026-01-06T16:22:27.190605 - Using sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:22:27.190689 - ------------------------------
2026-01-06T16:22:27.191375 - sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:22:27.193144 - image_cond shape: torch.Size([20, 21, 112, 64])
2026-01-06T16:22:27.199926 - Number of prompts: 1
2026-01-06T16:22:27.200037 - Section size: 21.0
2026-01-06T16:22:27.200106 - context window seq len: 37632
2026-01-06T16:22:27.200167 - Context schedule enabled: 21 frames, 1 stride, 1 overlap
2026-01-06T16:22:27.476490 - TeaCache: Using cache device: cpu
2026-01-06T16:22:27.477180 - Radial attention mode enabled.
2026-01-06T16:22:27.477252 - dense_attention_mode: sparse_sage_attention, dense_timesteps: 0, decay_factor: 0.2
2026-01-06T16:22:27.477342 - dense_blocks: [0, 1])
2026-01-06T16:22:27.477426 - Rope function: comfy
2026-01-06T16:22:27.477591 - Input sequence length: 37632
2026-01-06T16:22:27.477677 - Sampling 81 frames at 512x896 with 2 steps
2026-01-06T16:22:27.751259 - 
  0%|                                                     | 0/2 [00:00<?, ?it/s]2026-01-06T16:22:29.963418 - Error during model prediction: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.313092 - 
  0%|                                                     | 0/2 [00:02<?, ?it/s]2026-01-06T16:22:30.313156 - 
2026-01-06T16:22:30.313278 - Error during sampling: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.617677 - !!! Exception during processing !!! block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.625946 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 516, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 330, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 304, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 292, in process_inputs
    result = f(**inputs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 3251, in process
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 2237, in process
    noise_pred_context, _, new_teacache = predict_with_cfg(
                                          ~~~~~~~~~~~~~~~~^
        partial_latent_model_input,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        wananim_face_pixels=partial_wananim_face_pixels, wananim_pose_latents=partial_wananim_pose_latents, multitalk_audio_embeds=multitalk_audio_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        uni3c_data=uni3c_data, flashvsr_LQ_latent=partial_flashvsr_LQ_latent)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1755, in predict_with_cfg
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1602, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 3173, in forward
    x, x_ip, lynx_ref_feature, x_ovi = block(x, x_ip=x_ip, lynx_ref_feature=lynx_ref_feature, x_ovi=x_ovi, x_onetoall_ref=x_onetoall_ref, onetoall_freqs=onetoall_freqs, **kwargs)
                                       ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 1171, in forward
    y = self.self_attn.forward_radial(q, k, v, dense_step=False)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 516, in forward_radial
    x = RadialSpargeSageAttn(q, k, v, self.mask_map, decay_factor=self.decay_factor)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/radial_attention/attn_mask.py", line 168, in RadialSpargeSageAttn
    return sparse_attn_func(
           ~~~~~~~~~~~~~~~~^
        query[:, :, :mask_map.video_token_num, :],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        tensor_layout="NHD"
        ^^^^^^^^^^^^^^^^^^^
    ).contiguous()
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/spas_sage_attn/core.py", line 273, in block_sparse_sage2_attn_cuda
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

2026-01-06T16:22:30.633981 - Prompt executed in 3.65 seconds

Attached Workflow

Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

Workflow too large. Please manually upload the workflow from local file system.

Additional Context

(Please add any additional context or steps to reproduce the error here)
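
As a side note, the fallback that the error message itself suggests can be approximated with a small wrapper. This is only a hedged sketch: the function names come from the tracebacks above, but the blanket gfx10/gfx11/gfx12 routing, the kwargs pass-through, and the cast to float16 (required on the current RDNA3 path) are assumptions, not the package's actual dispatch logic.

import torch
from spas_sage_attn.core import (
    spas_sage_attn_meansim_cuda,
    block_sparse_sage2_attn_cuda,
)

def sparse_attn_auto(q, k, v, **kwargs):
    # Illustrative routing only: pick the kernel the current GPU can actually run.
    arch = torch.cuda.get_device_properties(q.device).gcnArchName.split(":")[0]
    if arch.startswith(("gfx10", "gfx11", "gfx12")):
        # RDNA path: this PR has no fp8 kernels yet, so use the fp16 meansim kernel
        return spas_sage_attn_meansim_cuda(q.half(), k.half(), v.half(), **kwargs)
    # fp8-capable path (e.g. gfx942/gfx950)
    return block_sparse_sage2_attn_cuda(q, k, v, **kwargs)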

@0xDELUXA

0xDELUXA commented Jan 6, 2026

Hello. I have successfully compiled the SpargeAttn AMD build you provided on Linux. My graphics card is a 9070 XT.

We can't really use Sparge on RDNA4 yet. I mean, we can, but the results will be bad (based on my cosine result). We need to wait for Jam to optimize this PR for RDNA4.

@ouco1986

ouco1986 commented Jan 6, 2026

Hello. I have successfully compiled the SpargeAttn AMD build you provided on Linux. My graphics card is a 9070 XT.

We can't really use Sparge on RDNA4 yet. I mean, we can, but the results will be bad (based on my cosine result). We need to wait for Jam to optimize this PR for RDNA4.

Indeed. However, ComfyUI can run it on Windows, and it's faster than SageAttn, although the image quality has decreased.

@githust66

Hello. I have successfully compiled the SpargeAttn AMD build you provided on Linux. My graphics card is a 9070 XT. After selecting SpargeAttn in the ComfyUI workflow and choosing sparse_stage as the parameter, the K-sampler errors out as follows. The same workflow with the same configuration does run on Windows. I don't know whether this problem can be solved. Thanks @jammm

2026-01-06T16:15:11.762603 - Set: torch.backends.cudnn.enabled = False for better AMD performance.
2026-01-06T16:15:11.762686 - AMD arch: gfx1201
2026-01-06T16:15:11.762753 - ROCm version: (7, 1)
2026-01-06T16:15:11.762837 - Enabled fp16 accumulation.
2026-01-06T16:15:11.762922 - Set vram state to: NORMAL_VRAM
2026-01-06T16:15:11.763009 - Device: cuda:0 AMD Radeon RX 9070 XT : native
2026-01-06T16:15:11.763195 - Enabled pinned memory 76318.0
2026-01-06T16:15:12.321069 - # # #
AMD GO FAST
# # #2026-01-06T16:15:12.321127 - 
2026-01-06T16:15:12.343678 - [Crystools INFO] Crystools version: 1.27.4
2026-01-06T16:15:12.376358 - [Crystools INFO] Platform release: 6.12.57+deb13-amd64
2026-01-06T16:15:12.376446 - [Crystools INFO] JETSON: Not detected.
2026-01-06T16:15:12.378675 - [Crystools INFO] CPU: AMD Ryzen 9 5900X 12-Core Processor - Arch: x86_64 - OS: Linux 6.12.57+deb13-amd64
2026-01-06T16:15:12.382797 - /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Crystools/general/gpu.py:67: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml
2026-01-06T16:15:12.391348 - [Crystools ERROR] Could not init pynvml (NVIDIA). NVML Shared Library Not Found
2026-01-06T16:15:12.391520 - [Crystools WARNING] No GPU monitoring libraries available.
2026-01-06T16:15:16.282651 - [ComfyUI-Easy-Use] server: v1.3.4 Loaded
2026-01-06T16:15:16.282747 - [ComfyUI-Easy-Use] web root: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-easy-use/web_version/v2 Loaded
2026-01-06T16:15:16.297184 - ComfyUI-GGUF: Allowing full torch compile
2026-01-06T16:15:16.365041 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/nodes.py", line 2149, in load_custom_node
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1026, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/__init__.py", line 1, in <module>
    from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/nodes.py", line 13, in <module>
    from .gimmvfi.generalizable_INR.gimmvfi_r import GIMMVFI_R
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/gimmvfi/generalizable_INR/gimmvfi_r.py", line 31, in <module>
    from .modules.softsplat import softsplat
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI/gimmvfi/generalizable_INR/modules/softsplat.py", line 12, in <module>
    import cupy
ModuleNotFoundError: No module named 'cupy'

2026-01-06T16:15:16.365259 - Cannot import /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI module for custom nodes: No module named 'cupy'
2026-01-06T16:15:16.396261 - ### Loading: ComfyUI-Manager (V3.37.1)
2026-01-06T16:15:16.396742 - [ComfyUI-Manager] network_mode: public
2026-01-06T16:15:16.447423 - ### ComfyUI Version: v0.6.0-3-g532e2850 | Released on '2025-12-24'
2026-01-06T16:15:16.499003 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/nodes.py", line 2149, in load_custom_node
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1026, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/__init__.py", line 1, in <module>
    from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/__init__.py", line 3, in <module>
    from .save_surreal import SaveJsonToSurreal, SaveTextToSurreal
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/save_surreal.py", line 1, in <module>
    from .surreal import surreal_connect
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops/nodes/surreal.py", line 1, in <module>
    from surrealist import Surreal
ModuleNotFoundError: No module named 'surrealist'

2026-01-06T16:15:16.499158 - Cannot import /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops module for custom nodes: No module named 'surrealist'
2026-01-06T16:15:17.000042 - Warning: Sage2++ NOT enabled2026-01-06T16:15:17.000104 - 
2026-01-06T16:15:17.450179 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using ckpts path: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux/ckpts
2026-01-06T16:15:17.450645 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using symlinks: False
2026-01-06T16:15:17.451091 - [/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'MIGraphXExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
2026-01-06T16:15:17.486896 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
2026-01-06T16:15:17.488448 - /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux/node_wrappers/dwpose.py:26: UserWarning: DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device. DWPose might run very slowly
  warnings.warn("DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device. DWPose might run very slowly")
2026-01-06T16:15:17.552620 - 
============================================================2026-01-06T16:15:17.552686 - 
2026-01-06T16:15:17.552740 - ERROR: Could not import vendored TurboDiffusion code!2026-01-06T16:15:17.552788 - 
2026-01-06T16:15:17.552839 - ============================================================2026-01-06T16:15:17.552888 - 
2026-01-06T16:15:17.552938 - Import error: No module named 'loguru'2026-01-06T16:15:17.552990 - 
2026-01-06T16:15:17.553040 - 
This should not happen as TurboDiffusion code is vendored in the package.2026-01-06T16:15:17.553093 - 
2026-01-06T16:15:17.553140 - Please report this issue at: https://github.com/anveshane/Comfyui_turbodiffusion/issues2026-01-06T16:15:17.553185 - 
2026-01-06T16:15:17.553232 - ============================================================
2026-01-06T16:15:17.553278 - 
2026-01-06T16:15:17.560090 - ERROR: Could not import TurboDiffusion modules: No module named 'loguru'2026-01-06T16:15:17.560138 - 
2026-01-06T16:15:17.563094 - ERROR: Could not import Wan2pt1VAEInterface: No module named 'loguru'2026-01-06T16:15:17.563140 - 
2026-01-06T16:15:17.563224 - 
============================================================2026-01-06T16:15:17.563267 - 
2026-01-06T16:15:17.563307 - ComfyUI TurboDiffusion I2V Node2026-01-06T16:15:17.563347 - 
2026-01-06T16:15:17.563389 - ============================================================2026-01-06T16:15:17.563428 - 
2026-01-06T16:15:17.563467 - Version: 0.1.02026-01-06T16:15:17.563506 - 
2026-01-06T16:15:17.563546 - Loaded 5 nodes:2026-01-06T16:15:17.563584 - 
2026-01-06T16:15:17.563624 -   - TurboWan I2V Sampler (TurboWanSampler)2026-01-06T16:15:17.563663 - 
2026-01-06T16:15:17.563716 -   - Save Video (TurboDiffusionSaveVideo)2026-01-06T16:15:17.563756 - 
2026-01-06T16:15:17.563796 -   - TurboWan Model Loader (Quantized) (TurboWanModelLoader)2026-01-06T16:15:17.563835 - 
2026-01-06T16:15:17.563875 -   - TurboDiffusion I2V Sampler (TurboDiffusionI2VSampler)2026-01-06T16:15:17.563915 - 
2026-01-06T16:15:17.563957 -   - TurboWan VAE Loader (TurboWanVAELoader)2026-01-06T16:15:17.563996 - 
2026-01-06T16:15:17.564035 - 
Features:2026-01-06T16:15:17.564073 - 
2026-01-06T16:15:17.564111 -   - TurboWan Model Loader: Official TurboDiffusion model loading2026-01-06T16:15:17.564149 - 
2026-01-06T16:15:17.564187 -   - Supports int8 block-wise quantized .pth models2026-01-06T16:15:17.564228 - 
2026-01-06T16:15:17.564267 -   - SageSLA/SLA attention optimization for faster inference2026-01-06T16:15:17.564305 - 
2026-01-06T16:15:17.564343 -   - Attention top-k tuning (0.01-1.0)2026-01-06T16:15:17.564380 - 
2026-01-06T16:15:17.564418 - 
Requires:2026-01-06T16:15:17.564456 - 
2026-01-06T16:15:17.564494 -   - TurboDiffusion Python package (manual install)2026-01-06T16:15:17.564532 - 
2026-01-06T16:15:17.564570 -   - Quantized .pth models from HuggingFace2026-01-06T16:15:17.564607 - 
2026-01-06T16:15:17.564646 - ============================================================
2026-01-06T16:15:17.564684 - 
2026-01-06T16:15:17.589803 - 
2026-01-06T16:15:17.589859 - [rgthree-comfy] Loaded 48 exciting nodes. 🎉
2026-01-06T16:15:17.589938 - 
2026-01-06T16:15:17.589987 - [rgthree-comfy] ComfyUI's new Node 2.0 rendering may be incompatible with some rgthree-comfy nodes and features, breaking some rendering as well as losing the ability to access a node's properties (a vital part of many nodes). It also appears to run MUCH more slowly spiking CPU usage and causing jankiness and unresponsiveness, especially with large workflows. Personally I am not planning to use the new Nodes 2.0 and, unfortunately, am not able to invest the time to investigate and overhaul rgthree-comfy where needed. If you have issues when Nodes 2.0 is enabled, I'd urge you to switch it off as well and join me in hoping ComfyUI is not planning to deprecate the existing, stable canvas rendering all together.
2026-01-06T16:15:17.601844 - [INFO] ComfyUI-GGUF not found, using our implementation2026-01-06T16:15:17.601902 - 
2026-01-06T16:15:17.602019 - [ROCm Ninodes] Successfully loaded from rocm_nodes package2026-01-06T16:15:17.602071 - 
2026-01-06T16:15:17.604813 - [通用块交换]: Node '通用模型 块交换' loaded successfully.
2026-01-06T16:15:17.605093 - 
Import times for custom nodes:
2026-01-06T16:15:17.605168 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/websocket_image_save.py
2026-01-06T16:15:17.605227 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_vae_loader.py
2026-01-06T16:15:17.605282 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_cudnn.toggle.py
2026-01-06T16:15:17.605336 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/cfz_patcher.py
2026-01-06T16:15:17.605391 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/通用交换块(应该)
2026-01-06T16:15:17.605446 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-amd-go-fast
2026-01-06T16:15:17.605499 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-DD-Translation
2026-01-06T16:15:17.605578 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GGUF
2026-01-06T16:15:17.605631 -    0.0 seconds (IMPORT FAILED): /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-tensorops
2026-01-06T16:15:17.605686 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-custom-scripts
2026-01-06T16:15:17.605738 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_essentials
2026-01-06T16:15:17.605790 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/rocm-ninodes
2026-01-06T16:15:17.605845 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
2026-01-06T16:15:17.605897 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-segment-anything-2
2026-01-06T16:15:17.605948 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-KJNodes
2026-01-06T16:15:17.605998 -    0.0 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/rgthree-comfy
2026-01-06T16:15:17.606049 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/Comfyui_turbodiffusion
2026-01-06T16:15:17.606100 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui_controlnet_aux
2026-01-06T16:15:17.606150 -    0.1 seconds (IMPORT FAILED): /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-GIMM-VFI
2026-01-06T16:15:17.606202 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Crystools
2026-01-06T16:15:17.606253 -    0.1 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-Manager
2026-01-06T16:15:17.606304 -    0.2 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanAnimatePreprocess
2026-01-06T16:15:17.606355 -    0.7 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper
2026-01-06T16:15:17.606408 -    3.9 seconds: /media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/comfyui-easy-use
2026-01-06T16:15:17.606460 - 
2026-01-06T16:15:17.637704 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
2026-01-06T16:15:17.888252 - Context impl SQLiteImpl.
2026-01-06T16:15:17.888377 - Will assume non-transactional DDL.
2026-01-06T16:15:17.889351 - No target revision found.
2026-01-06T16:15:18.024939 - Starting server

2026-01-06T16:15:18.025211 - To see the GUI go to: http://127.0.0.1:8188
2026-01-06T16:15:18.456509 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
2026-01-06T16:15:18.572711 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
2026-01-06T16:15:18.902674 - [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
2026-01-06T16:15:28.103106 - FETCH ComfyRegistry Data: 5/1172026-01-06T16:15:28.103180 - 
2026-01-06T16:15:37.363753 - FETCH ComfyRegistry Data: 10/1172026-01-06T16:15:37.363821 - 
2026-01-06T16:15:46.887170 - FETCH ComfyRegistry Data: 15/1172026-01-06T16:15:46.887256 - 
2026-01-06T16:15:49.326948 - got prompt
2026-01-06T16:15:53.278930 - 
T5Encoder:   4%|█▍                               | 1/24 [00:01<00:23,  1.01s/it]2026-01-06T16:15:53.301711 - 
T5Encoder: 100%|████████████████████████████████| 24/24 [00:01<00:00, 23.20it/s]2026-01-06T16:15:53.301773 - 
2026-01-06T16:15:53.301926 - prompt token count:2026-01-06T16:15:53.301975 -  2026-01-06T16:15:53.425458 - tensor([246], device='cuda:0')2026-01-06T16:15:53.425528 - 
2026-01-06T16:15:53.426979 - 
T5Encoder:   0%|                                         | 0/24 [00:00<?, ?it/s]2026-01-06T16:15:53.454267 - 
T5Encoder: 100%|███████████████████████████████| 24/24 [00:00<00:00, 884.55it/s]2026-01-06T16:15:53.454328 - 
2026-01-06T16:15:53.454494 - prompt token count:2026-01-06T16:15:53.454543 -  2026-01-06T16:15:53.578413 - tensor([44], device='cuda:0')2026-01-06T16:15:53.578482 - 
2026-01-06T16:15:57.442469 - 
WanVAE encoding frames:  19%|███▊                | 4/21 [00:01<00:09,  1.87it/s]2026-01-06T16:15:57.869437 - FETCH ComfyRegistry Data: 20/1172026-01-06T16:15:57.869501 - 
2026-01-06T16:16:07.025178 - 
WanVAE encoding frames: 100%|███████████████████| 21/21 [00:11<00:00,  1.78it/s]2026-01-06T16:16:07.025356 - 
WanVAE encoding frames: 100%|███████████████████| 21/21 [00:11<00:00,  1.83it/s]2026-01-06T16:16:07.025411 - 
2026-01-06T16:16:07.046215 - WanVAE encoded input:torch.Size([1, 3, 81, 896, 512]) to torch.Size([1, 32, 21, 112, 64])
2026-01-06T16:16:07.047606 - [WanVAE encode] Allocated memory: memory=0.764 GB
2026-01-06T16:16:07.047727 - [WanVAE encode] Max allocated memory: max_memory=12.483 GB
2026-01-06T16:16:07.047838 - [WanVAE encode] Max reserved memory: max_reserved=15.357 GB
2026-01-06T16:16:07.073083 - FETCH ComfyRegistry Data: 25/1172026-01-06T16:16:07.073167 - 
2026-01-06T16:16:07.384869 - CUDA Compute Capability: 12.0
2026-01-06T16:16:07.385468 - Detected model in_channels: 36
2026-01-06T16:16:07.385576 - Model cross attention type: t2v, num_heads: 40, num_layers: 40
2026-01-06T16:16:07.385715 - Model variant detected: i2v_14B_2.2
2026-01-06T16:16:07.449057 - model_type FLOW
2026-01-06T16:16:07.466943 - Loading LoRA: wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise with strength: 1.0
2026-01-06T16:16:07.577072 - Using accelerate to load and assign model weights to device...
2026-01-06T16:16:07.727471 - 
Loading transformer parameters to cuda:0:   1%| | 11/1095 [00:00<00:14, 73.43it/2026-01-06T16:16:07.758470 - 
Loading transformer parameters to cuda:0: 100%|█| 1095/1095 [00:00<00:00, 6053.22026-01-06T16:16:07.758527 - 
2026-01-06T16:16:07.758641 - Using 400 LoRA weight patches for WanVideo model
2026-01-06T16:16:07.898659 - ------- Scheduler info -------
2026-01-06T16:16:07.899205 - Total timesteps: tensor([1000,  937,  833,  625], device='cuda:0')
2026-01-06T16:16:07.899459 - Using timesteps: tensor([1000,  937], device='cuda:0')
2026-01-06T16:16:08.153706 - Using sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:16:08.153948 - ------------------------------
2026-01-06T16:16:08.154791 - sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:16:08.156559 - image_cond shape: torch.Size([20, 21, 112, 64])
2026-01-06T16:16:08.164334 - Number of prompts: 1
2026-01-06T16:16:08.164463 - Section size: 21.0
2026-01-06T16:16:08.164555 - context window seq len: 37632
2026-01-06T16:16:08.164626 - Context schedule enabled: 21 frames, 1 stride, 1 overlap
2026-01-06T16:16:08.422435 - TeaCache: Using cache device: cpu
2026-01-06T16:16:08.423067 - Radial attention mode enabled.
2026-01-06T16:16:08.423192 - dense_attention_mode: sageattn, dense_timesteps: 1, decay_factor: 0.2
2026-01-06T16:16:08.423274 - dense_blocks: [0])
2026-01-06T16:16:08.423487 - Rope function: comfy
2026-01-06T16:16:08.424357 - Input sequence length: 37632
2026-01-06T16:16:08.424449 - Sampling 81 frames at 512x896 with 2 steps
2026-01-06T16:16:08.679019 - 
  0%|                                                     | 0/2 [00:00<?, ?it/s]2026-01-06T16:16:08.778429 - Generated new RoPE frequencies
2026-01-06T16:16:17.107405 - FETCH ComfyRegistry Data: 30/1172026-01-06T16:16:17.107474 - 
2026-01-06T16:16:28.151897 - FETCH ComfyRegistry Data: 35/1172026-01-06T16:16:28.151989 - 
2026-01-06T16:16:43.110141 - FETCH ComfyRegistry Data: 40/1172026-01-06T16:16:43.110206 - 
2026-01-06T16:16:53.074928 - FETCH ComfyRegistry Data: 45/1172026-01-06T16:16:53.075010 - 
2026-01-06T16:16:56.201486 - 
 50%|██████████████████████▌                      | 1/2 [00:47<00:47, 47.52s/it]2026-01-06T16:16:57.407779 - Radial Attention: Generating block mask2026-01-06T16:16:57.407848 - 
2026-01-06T16:16:57.408105 - 
2026-01-06T16:16:57.408256 - 
Frames (i):   0%|                                        | 0/21 [00:00<?, ?it/s]2026-01-06T16:16:57.408334 - 
2026-01-06T16:16:57.510611 - 
Frames (i):  52%|███████████████▋              | 11/21 [00:00<00:00, 107.89it/s]2026-01-06T16:16:57.510671 - 
Frames (i): 100%|██████████████████████████████| 21/21 [00:00<00:00, 139.13it/s]2026-01-06T16:16:57.559529 - 
2026-01-06T16:16:57.560260 - Error during model prediction: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:57.906480 - 
 50%|██████████████████████▌                      | 1/2 [00:49<00:49, 49.23s/it]2026-01-06T16:16:57.906908 - 
2026-01-06T16:16:57.907036 - Error during sampling: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:58.200813 - !!! Exception during processing !!! block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:16:58.211418 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 516, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 330, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 304, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 292, in process_inputs
    result = f(**inputs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 3251, in process
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 2237, in process
    noise_pred_context, _, new_teacache = predict_with_cfg(
                                          ~~~~~~~~~~~~~~~~^
        partial_latent_model_input,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        wananim_face_pixels=partial_wananim_face_pixels, wananim_pose_latents=partial_wananim_pose_latents, multitalk_audio_embeds=multitalk_audio_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        uni3c_data=uni3c_data, flashvsr_LQ_latent=partial_flashvsr_LQ_latent)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1755, in predict_with_cfg
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1602, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 3173, in forward
    x, x_ip, lynx_ref_feature, x_ovi = block(x, x_ip=x_ip, lynx_ref_feature=lynx_ref_feature, x_ovi=x_ovi, x_onetoall_ref=x_onetoall_ref, onetoall_freqs=onetoall_freqs, **kwargs)
                                       ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 1171, in forward
    y = self.self_attn.forward_radial(q, k, v, dense_step=False)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 516, in forward_radial
    x = RadialSpargeSageAttn(q, k, v, self.mask_map, decay_factor=self.decay_factor)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/radial_attention/attn_mask.py", line 168, in RadialSpargeSageAttn
    return sparse_attn_func(
           ~~~~~~~~~~~~~~~~^
        query[:, :, :mask_map.video_token_num, :],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        tensor_layout="NHD"
        ^^^^^^^^^^^^^^^^^^^
    ).contiguous()
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/spas_sage_attn/core.py", line 273, in block_sparse_sage2_attn_cuda
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

2026-01-06T16:16:58.219619 - Prompt executed in 68.88 seconds
2026-01-06T16:17:03.815249 - FETCH ComfyRegistry Data: 50/1172026-01-06T16:17:03.815326 - 
2026-01-06T16:17:13.116115 - FETCH ComfyRegistry Data: 55/1172026-01-06T16:17:13.116182 - 
2026-01-06T16:17:23.614580 - FETCH ComfyRegistry Data: 60/1172026-01-06T16:17:23.614650 - 
2026-01-06T16:17:33.446622 - FETCH ComfyRegistry Data: 65/1172026-01-06T16:17:33.446694 - 
2026-01-06T16:17:44.334776 - FETCH ComfyRegistry Data: 70/1172026-01-06T16:17:44.334874 - 
2026-01-06T16:17:53.855379 - FETCH ComfyRegistry Data: 75/1172026-01-06T16:17:53.855450 - 
2026-01-06T16:18:05.232362 - FETCH ComfyRegistry Data: 80/1172026-01-06T16:18:05.232454 - 
2026-01-06T16:18:14.906355 - FETCH ComfyRegistry Data: 85/1172026-01-06T16:18:14.906435 - 
2026-01-06T16:18:24.707934 - FETCH ComfyRegistry Data: 90/1172026-01-06T16:18:24.708002 - 
2026-01-06T16:18:35.798195 - FETCH ComfyRegistry Data: 95/1172026-01-06T16:18:35.798266 - 
2026-01-06T16:18:45.058420 - FETCH ComfyRegistry Data: 100/1172026-01-06T16:18:45.058489 - 
2026-01-06T16:18:54.404433 - FETCH ComfyRegistry Data: 105/1172026-01-06T16:18:54.404498 - 
2026-01-06T16:19:03.967834 - FETCH ComfyRegistry Data: 110/1172026-01-06T16:19:03.967930 - 
2026-01-06T16:19:13.379911 - FETCH ComfyRegistry Data: 115/1172026-01-06T16:19:13.380002 - 
2026-01-06T16:19:18.843197 - FETCH ComfyRegistry Data [DONE]2026-01-06T16:19:18.843284 - 
2026-01-06T16:19:18.987055 - [ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
2026-01-06T16:19:19.008230 - FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json2026-01-06T16:19:19.008252 - 2026-01-06T16:19:21.484115 -  [DONE]2026-01-06T16:19:21.484188 - 
2026-01-06T16:19:21.536130 - [ComfyUI-Manager] All startup tasks have been completed.
2026-01-06T16:22:26.971167 - got prompt
2026-01-06T16:22:26.992706 - Using accelerate to load and assign model weights to device...
2026-01-06T16:22:26.993040 - 
Loading transformer parameters to cuda:0:   0%|        | 0/1095 [00:00<?, ?it/s]2026-01-06T16:22:27.049217 - 
Loading transformer parameters to cuda:0: 100%|█| 1095/1095 [00:00<00:00, 19542.2026-01-06T16:22:27.049270 - 
2026-01-06T16:22:27.049361 - Using 400 LoRA weight patches for WanVideo model
2026-01-06T16:22:27.188776 - ------- Scheduler info -------
2026-01-06T16:22:27.189422 - Total timesteps: tensor([1000,  937,  833,  625], device='cuda:0')
2026-01-06T16:22:27.189759 - Using timesteps: tensor([1000,  937], device='cuda:0')
2026-01-06T16:22:27.190605 - Using sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:22:27.190689 - ------------------------------
2026-01-06T16:22:27.191375 - sigmas: tensor([1.0000, 0.9375, 0.8333], device='cuda:0')
2026-01-06T16:22:27.193144 - image_cond shape: torch.Size([20, 21, 112, 64])
2026-01-06T16:22:27.199926 - Number of prompts: 1
2026-01-06T16:22:27.200037 - Section size: 21.0
2026-01-06T16:22:27.200106 - context window seq len: 37632
2026-01-06T16:22:27.200167 - Context schedule enabled: 21 frames, 1 stride, 1 overlap
2026-01-06T16:22:27.476490 - TeaCache: Using cache device: cpu
2026-01-06T16:22:27.477180 - Radial attention mode enabled.
2026-01-06T16:22:27.477252 - dense_attention_mode: sparse_sage_attention, dense_timesteps: 0, decay_factor: 0.2
2026-01-06T16:22:27.477342 - dense_blocks: [0, 1])
2026-01-06T16:22:27.477426 - Rope function: comfy
2026-01-06T16:22:27.477591 - Input sequence length: 37632
2026-01-06T16:22:27.477677 - Sampling 81 frames at 512x896 with 2 steps
2026-01-06T16:22:27.751259 - 
  0%|                                                     | 0/2 [00:00<?, ?it/s]2026-01-06T16:22:29.963418 - Error during model prediction: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.313092 - 
  0%|                                                     | 0/2 [00:02<?, ?it/s]2026-01-06T16:22:30.313156 - 
2026-01-06T16:22:30.313278 - Error during sampling: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.617677 - !!! Exception during processing !!! block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.
2026-01-06T16:22:30.625946 - Traceback (most recent call last):
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 516, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 330, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 304, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/execution.py", line 292, in process_inputs
    result = f(**inputs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 3251, in process
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 2237, in process
    noise_pred_context, _, new_teacache = predict_with_cfg(
                                          ~~~~~~~~~~~~~~~~^
        partial_latent_model_input,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        wananim_face_pixels=partial_wananim_face_pixels, wananim_pose_latents=partial_wananim_pose_latents, multitalk_audio_embeds=multitalk_audio_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        uni3c_data=uni3c_data, flashvsr_LQ_latent=partial_flashvsr_LQ_latent)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1755, in predict_with_cfg
    raise e
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/nodes_sampler.py", line 1602, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 3173, in forward
    x, x_ip, lynx_ref_feature, x_ovi = block(x, x_ip=x_ip, lynx_ref_feature=lynx_ref_feature, x_ovi=x_ovi, x_onetoall_ref=x_onetoall_ref, onetoall_freqs=onetoall_freqs, **kwargs)
                                       ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 1171, in forward
    y = self.self_attn.forward_radial(q, k, v, dense_step=False)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/modules/model.py", line 516, in forward_radial
    x = RadialSpargeSageAttn(q, k, v, self.mask_map, decay_factor=self.decay_factor)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/wanvideo/radial_attention/attn_mask.py", line 168, in RadialSpargeSageAttn
    return sparse_attn_func(
           ~~~~~~~~~~~~~~~~^
        query[:, :, :mask_map.video_token_num, :],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        tensor_layout="NHD"
        ^^^^^^^^^^^^^^^^^^^
    ).contiguous()
    ^
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1209, in _fn
    return fn(*args, **kwargs)
  File "/media/wr/LTW/linux/ROCm7/venvl/lib/python3.13/site-packages/spas_sage_attn/core.py", line 273, in block_sparse_sage2_attn_cuda
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

2026-01-06T16:22:30.633981 - Prompt executed in 3.65 seconds

Attached Workflow

Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

Workflow too large. Please manually upload the workflow from local file system.

Additional Context

(Please add any additional context or steps to reproduce the error here)

My RX 7900 XT also shows the same error on Windows.

@0xDELUXA
Copy link

0xDELUXA commented Jan 6, 2026

RuntimeError: block_sparse_sage2_attn_cuda requires FP8 support, but gfx1201 does not support FP8. On RDNA GPUs (gfx10xx/gfx11xx), use spas_sage_attn_meansim_cuda instead.

Indeed, the RX 7900 XT doesn't support FP8, but @ouco1986's RX 9070 XT does. The error message specifically says "RDNA GPUs (gfx10xx/gfx11xx)", so it doesn't cover gfx12xx (yet). This is why Jam wrote:

Currently only supports RDNA3/3.5

Also, README_AMD_WINDOWS.md says:

Supported Hardware

SpargeAttn on Windows has been tested with RDNA3/RDNA3.5 GPUs (gfx1100, gfx1101, gfx1102, gfx1103, gfx1151).

So in theory gfx1100 (RX 7900 XT) should work in TurboDiffusion, where this PR was mainly tested.

I think that in the future there could be a conditional in the code: if the GPU isn't RDNA4 (or CDNA), use the FP16 path; otherwise, use FP8.
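
A minimal sketch of what that dispatch could look like (my assumption, not code from this PR; the gcnArchName check and which architectures actually get a working FP8 kernel here are guesses):

```python
# Hypothetical arch-based dispatch between the two SpargeAttn kernels.
# Assumes a ROCm build of PyTorch, where get_device_properties() exposes gcnArchName.
import torch
from spas_sage_attn.core import (
    spas_sage_attn_meansim_cuda,   # FP16-accumulate kernel (the RDNA3/3.5 path in this PR)
    block_sparse_sage2_attn_cuda,  # FP8 kernel (needs hardware FP8 support)
)

def pick_sparse_attention(device_index: int = 0):
    """Pick FP8 on FP8-capable parts (RDNA4 / CDNA), FP16 everywhere else."""
    arch = torch.cuda.get_device_properties(device_index).gcnArchName  # e.g. "gfx1100", "gfx1201"
    fp8_capable = arch.startswith(("gfx12", "gfx9"))  # crude check: RDNA4 or CDNA/MI-series
    return block_sparse_sage2_attn_cuda if fp8_capable else spas_sage_attn_meansim_cuda
```

Even with such a switch, gfx12xx would still need the RDNA4 rocWMMA fragment-layout work Jam mentioned before either kernel is actually usable there.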

@jammm
Copy link
Author

jammm commented Jan 6, 2026

Unfortunately I have not worked on getting this running in ComfyUI yet, but it seems like the wrapper needs to be refactored a little to use the FP16-specific code path instead of the FP8 one. And yes, only RDNA3/3.5 is supported, as @0xDELUXA rightly pointed out.

@IxMxAMAR
Copy link

IxMxAMAR commented Jan 6, 2026

@githust66 even if you have sageattention installed (which I don't think is needed either way), just install KJNodes and replace model_optimization_nodes.py in C:\ComfyUI\custom_nodes\comfyui-kjnodes\nodes with the file attached below. You will then have the option to use this in ComfyUI via the Patch Sage Attention node: select spas_sage2_attn from the list and enter the topk value you want.
model_optimization_nodes.py

edit: I removed the code for the other sage attention patches, but that shouldn't matter since they don't work on AMD either way, I guess.
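
For anyone curious how the patch hooks in, it boils down to an attention override along these lines (illustrative only; the function name, the ComfyUI-style signature, and the tensor layout expected by spas_sage_attn_meansim_cuda are my assumptions, the real code is in the attached model_optimization_nodes.py):

```python
# Sketch of a ComfyUI-style attention override routed through SpargeAttn.
import torch
from spas_sage_attn.core import spas_sage_attn_meansim_cuda

def spas_sage2_attention(q, k, v, heads, mask=None, attn_precision=None):
    # ComfyUI-style layout on entry: (batch, seq, heads * head_dim); split out the heads.
    b, s, _ = q.shape
    q, k, v = (t.view(b, s, heads, -1).transpose(1, 2) for t in (q, k, v))
    # Assumes the kernel's default layout is (batch, heads, seq, head_dim).
    out = spas_sage_attn_meansim_cuda(q, k, v)
    # Back to (batch, seq, heads * head_dim) for the rest of the model.
    return out.transpose(1, 2).reshape(b, s, -1)
```

The Patch Sage Attention node then swaps a function like this in for the model's attention calls; the node's topk value would be forwarded to the kernel under whatever parameter name the real file uses.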

@0xDELUXA
Copy link

0xDELUXA commented Jan 6, 2026

Theoretically, ROCm/rocm-libraries#3579 also affects SpargeAttn with ROCm 7, which could mean it can be even faster than it is now.

@jammm
Copy link
Author

jammm commented Jan 8, 2026

Where can I find this comfyui wrapper for SpargeAttn? Is it on GH?

@0xDELUXA
Copy link

0xDELUXA commented Jan 8, 2026

Where can I find this comfyui wrapper for SpargeAttn? Is it on GH?

Would be good to know, so you could work on getting this running with Comfy, now that AMD support in ComfyUI is official.

@githust66
Copy link

Where can I find this comfyui wrapper for SpargeAttn? Is it on GH?

https://github.com/kijai/ComfyUI-WanVideoWrapper/tree/main/wanvideo/radial_attention
