
TypeError: flex_flash_attn_func() got an unexpected keyword argument 'max_seqlen_q' #193

@hengrui0516

Description


Hi, I just came here from the Magi-1 repo. When I run a sample test, it fails:

(magi) kanghengrui@x86_64-conda-linux-gnu [/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1] git:(main) ✗ ➜ bash example/4.5B/test_sample.sh [20:50:34]
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/magi_attention/__init__.py:26: UserWarning: You are using magi_attention without installing it. This may cause some unexpected errors.
warnings.warn(
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
[W1216 20:50:57.975591094 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[2025-12-16 20:50:57,458 - INFO] Initialize torch distribution and model parallel successfully
[2025-12-16 20:50:57,458 - INFO] MagiConfig(model_config=ModelConfig(model_name='videodit_ardf', num_layers=34, hidden_size=3072, ffn_hidden_size=12288, num_attention_heads=24, num_query_groups=8, kv_channels=128, layernorm_epsilon=1e-06, apply_layernorm_1p=True, x_rescale_factor=1, half_channel_vae=Fa
lse, params_dtype=torch.bfloat16, patch_size=2, t_patch_size=1, in_channels=16, out_channels=16, cond_hidden_ratio=0.25, caption_channels=4096, caption_max_length=800, xattn_cond_hidden_ratio=1.0, cond_gating_ratio=1.0, gated_linear_unit=False), runtime_config=RuntimeConfig(cfg_number=3, cfg_t_range=[0
.0, 0.0217, 0.1, 0.3, 0.999], prev_chunk_scales=[1.5, 1.5, 1.5, 1.0, 1.0], text_scales=[7.5, 7.5, 7.5, 0.0, 0.0], noise2clean_kvrange=[5, 4, 3, 2], clean_chunk_kvrange=1, clean_t=0.9999, seed=1234, num_frames=96, video_size_h=720, video_size_w=720, num_steps=64, window_size=4, fps=24, chunk_width=6, t5
_pretrained='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/t5', t5_device='cpu', vae_pretrained='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/vae', scale_factor=0.18215, temporal_downsample_factor=4, lo
ad='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/magi/4.5B_base'), engine_config=EngineConfig(distributed_backend='nccl', distributed_timeout_minutes=15, pp_size=1, cp_size=1, cp_strategy='cp_ulysses', ulysses_overlap_degree=1, fp8_quant=False, distill
_nearly_clean_chunk_threshold=0.3, shortcut_mode='8,16,16', distill=False, kv_offload=True, enable_cuda_graph=False))
[2025-12-16 20:50:57,458 - INFO] Precompute validation prompt embeddings
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be
set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Loading checkpoint shards: 100%|██████████| 2/2 [00:15<00:00, 7.84s/it]
[2025-12-16 20:51:22,098 - INFO] VideoDiTModel(
(x_embedder): Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2), bias=False)
(t_embedder): TimestepEmbedder(
(mlp): Sequential(
(0): Linear(in_features=256, out_features=768, bias=True)
(1): SiLU()
(2): Linear(in_features=768, out_features=768, bias=True)
)
)
(y_embedder): CaptionEmbedder(
(y_proj_xattn): Sequential(
(0): Linear(in_features=4096, out_features=3072, bias=True)
(1): SiLU()
)
(y_proj_adaln): Sequential(
(0): Linear(in_features=4096, out_features=768, bias=True)
)
)
(rope): LearnableRotaryEmbeddingCat()
(videodit_blocks): TransformerBlock(
(layers): ModuleList(
(0-33): 34 x TransformerLayer(
(ada_modulate_layer): AdaModulateLayer(
(act): SiLU()
(proj): Sequential(
(0): Linear(in_features=768, out_features=6144, bias=True)
)
)
(self_attention): FullyParallelAttention(
(linear_qkv): CustomLayerNormLinear(
(layer_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
(q): Linear(in_features=3072, out_features=3072, bias=False)
(qx): Linear(in_features=3072, out_features=3072, bias=False)
(k): Linear(in_features=3072, out_features=1024, bias=False)
(v): Linear(in_features=3072, out_features=1024, bias=False)
)
(linear_kv_xattn): Linear(in_features=3072, out_features=2048, bias=False)
(linear_proj): Linear(in_features=6144, out_features=3072, bias=False)
(q_layernorm): FusedLayerNorm()
(q_layernorm_xattn): FusedLayerNorm()
(k_layernorm): FusedLayerNorm()
(k_layernorm_xattn): FusedLayerNorm()
)
(self_attn_post_norm): FusedLayerNorm()
(mlp): CustomMLP(
(layer_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
(linear_fc1): Linear(in_features=3072, out_features=12288, bias=False)
(linear_fc2): Linear(in_features=12288, out_features=3072, bias=False)
)
(mlp_post_norm): FusedLayerNorm()
)
)
(final_layernorm): FusedLayerNorm()
)
(final_linear): FinalLinear(
(linear): Linear(in_features=3072, out_features=64, bias=False)
)
)
[2025-12-16 20:51:22,101 - INFO] (cp, pp) rank (0, 0): param count 4459898128, model size 8.34 GB
[2025-12-16 20:51:22,101 - INFO] Build DiTModel successfully
[2025-12-16 20:51:22,102 - INFO] After build_dit_model, memory allocated: 0.01 GB, memory reserved: 0.02 GB
[2025-12-16 20:51:22,102 - INFO] load inference_weight weight from /mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/magi/4.5B_base/inference_weight
Loading shards: 100%|██████████| 2/2 [00:00<00:00, 7.70it/s]
[2025-12-16 20:51:30,853 - INFO] Load Weight Missing Keys: []
[2025-12-16 20:51:30,854 - INFO] Load Weight Unexpected Keys: []
[2025-12-16 20:51:31,010 - INFO] After load_checkpoint, memory allocated: 8.36 GB, memory reserved: 8.37 GB
[2025-12-16 20:51:31,013 - INFO] After high_precision_promoter, memory allocated: 8.36 GB, memory reserved: 8.37 GB
[2025-12-16 20:51:31,164 - INFO] Load checkpoint successfully
[2025-12-16 20:51:31,164 - INFO] special_token = ['HQ_TOKEN', 'DURATION_TOKEN']
InferBatch 0: 0%| | 0/4 [00:00<?, ?it/s][2025-12-16 20:51:31,198 - INFO] transport_inputs len: 1
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/entry.py", line 54, in
[rank0]: main()
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/entry.py", line 40, in main
[rank0]: pipeline.run_text_to_video(prompt=args.prompt, output_path=args.output_path)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 35, in run_text_to_video
[rank0]: self._run(prompt, None, output_path)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 49, in _run
[rank0]: [
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 49, in
[rank0]: [
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 763, in generate_per_chunk
[rank0]: for _, _, chunk in sample_transport.walk():
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 725, in walk
[rank0]: velocity = self.forward_velocity(infer_idx, 0)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 657, in forward_velocity
[rank0]: velocity = forward_fn(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 503, in forward_dispatcher
[rank0]: (out_cond_pre_and_text, out_cond_pre, out_uncond, denoise_width) = self.forward_3cfg(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 414, in forward_3cfg
[rank0]: out_cond_pre_and_text = self.forward(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 385, in forward
[rank0]: x = self.videodit_blocks.forward(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1427, in forward
[rank0]: hidden_states = layer(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1308, in forward
[rank0]: core_attn_out, cross_attn_out = self.self_attention(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1195, in forward
[rank0]: core_attn_out, xattn_out = UlyssesScheduler.get_attn_and_xattn_with_fused_kv_comm(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/infra/parallelism/context_parallel.py", line 525, in get_attn_and_xattn_with_fused_kv_comm
[rank0]: return UlyssesScheduler.get_attn_and_xattn_base(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/infra/parallelism/context_parallel.py", line 585, in get_attn_and_xattn_base
[rank0]: core_attn_out_new = core_attn_func(query[i], key, value)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1032, in core_attention
[rank0]: core_attn_out, _ = flex_attention(
[rank0]: TypeError: flex_flash_attn_func() got an unexpected keyword argument 'max_seqlen_q'
InferBatch 0: 0%| | 0/4 [00:00<?, ?it/s]

My environment:
cuda=12.4
torch=2.4.0+cu124
python=3.10.12
magi_attention=0.0.0
flash_attn=2.7.0.post1
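For anyone reproducing this, a quick stdlib-only sketch to confirm which of the packages above are actually installed in the active environment (the package names are the ones listed above; this does not import them, so it works even if a package is broken):

```python
from importlib import metadata

def report_versions(pkgs):
    """Return the installed version of each package, or 'not installed'."""
    out = {}
    for name in pkgs:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = "not installed"
    return out

# Packages from the environment listed above.
print(report_versions(["torch", "flash_attn", "magi_attention"]))
```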

To get the build to finish, I reduced FFA_JOBS from 160 to 10 and then successfully installed magi_attention=0.0.0.

I found that if I don't install magi_attention=0.0.0, this error doesn't occur. Furthermore, at the beginning the log warns:

UserWarning: You are using magi_attention without installing it. This may cause some unexpected errors.

So I suspect this is related to magi_attention?
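For context on the failure mode: `TypeError: ... got an unexpected keyword argument` means the caller in `dit_module.py` passes `max_seqlen_q`, but whichever `flex_flash_attn_func` actually gets imported does not accept that keyword, i.e. the installed magi_attention build exposes a different signature than the one MAGI-1 expects. A minimal sketch of how to check for this mismatch (the stand-in function below is hypothetical; on a real install you would inspect magi_attention's own `flex_flash_attn_func`):

```python
import inspect

# Hypothetical stand-in for the installed flex_flash_attn_func:
# an older signature without the max_seqlen_q keyword.
def flex_flash_attn_func(q, k, v, softmax_scale=None):
    return q

# Listing the accepted parameters shows whether the keyword exists.
params = list(inspect.signature(flex_flash_attn_func).parameters)
print("max_seqlen_q" in params)  # False -> the call below must fail

# Passing the unsupported keyword reproduces the TypeError from the log.
try:
    flex_flash_attn_func(None, None, None, max_seqlen_q=128)
except TypeError as e:
    print(e)
```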
