Hi, I came here from the MAGI-1 repo. When I run a sample test, it fails:
(magi) kanghengrui@x86_64-conda-linux-gnu [/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1] git:(main) ✗ ➜ bash example/4.5B/test_sample.sh [20:50:34]
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/magi_attention/__init__.py:26: UserWarning: You are using magi_attention without installing it. This may cause some unexpected errors.
warnings.warn(
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
[W1216 20:50:57.975591094 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[2025-12-16 20:50:57,458 - INFO] Initialize torch distribution and model parallel successfully
[2025-12-16 20:50:57,458 - INFO] MagiConfig(model_config=ModelConfig(model_name='videodit_ardf', num_layers=34, hidden_size=3072, ffn_hidden_size=12288, num_attention_heads=24, num_query_groups=8, kv_channels=128, layernorm_epsilon=1e-06, apply_layernorm_1p=True, x_rescale_factor=1, half_channel_vae=False, params_dtype=torch.bfloat16, patch_size=2, t_patch_size=1, in_channels=16, out_channels=16, cond_hidden_ratio=0.25, caption_channels=4096, caption_max_length=800, xattn_cond_hidden_ratio=1.0, cond_gating_ratio=1.0, gated_linear_unit=False), runtime_config=RuntimeConfig(cfg_number=3, cfg_t_range=[0.0, 0.0217, 0.1, 0.3, 0.999], prev_chunk_scales=[1.5, 1.5, 1.5, 1.0, 1.0], text_scales=[7.5, 7.5, 7.5, 0.0, 0.0], noise2clean_kvrange=[5, 4, 3, 2], clean_chunk_kvrange=1, clean_t=0.9999, seed=1234, num_frames=96, video_size_h=720, video_size_w=720, num_steps=64, window_size=4, fps=24, chunk_width=6, t5_pretrained='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/t5', t5_device='cpu', vae_pretrained='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/vae', scale_factor=0.18215, temporal_downsample_factor=4, load='/mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/magi/4.5B_base'), engine_config=EngineConfig(distributed_backend='nccl', distributed_timeout_minutes=15, pp_size=1, cp_size=1, cp_strategy='cp_ulysses', ulysses_overlap_degree=1, fp8_quant=False, distill_nearly_clean_chunk_threshold=0.3, shortcut_mode='8,16,16', distill=False, kv_offload=True, enable_cuda_graph=False))
[2025-12-16 20:50:57,458 - INFO] Precompute validation prompt embeddings
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be
set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Loading checkpoint shards: 100%|██████████| 2/2 [00:15<00:00, 7.84s/it]
[2025-12-16 20:51:22,098 - INFO] VideoDiTModel(
(x_embedder): Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2), bias=False)
(t_embedder): TimestepEmbedder(
(mlp): Sequential(
(0): Linear(in_features=256, out_features=768, bias=True)
(1): SiLU()
(2): Linear(in_features=768, out_features=768, bias=True)
)
)
(y_embedder): CaptionEmbedder(
(y_proj_xattn): Sequential(
(0): Linear(in_features=4096, out_features=3072, bias=True)
(1): SiLU()
)
(y_proj_adaln): Sequential(
(0): Linear(in_features=4096, out_features=768, bias=True)
)
)
(rope): LearnableRotaryEmbeddingCat()
(videodit_blocks): TransformerBlock(
(layers): ModuleList(
(0-33): 34 x TransformerLayer(
(ada_modulate_layer): AdaModulateLayer(
(act): SiLU()
(proj): Sequential(
(0): Linear(in_features=768, out_features=6144, bias=True)
)
)
(self_attention): FullyParallelAttention(
(linear_qkv): CustomLayerNormLinear(
(layer_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
(q): Linear(in_features=3072, out_features=3072, bias=False)
(qx): Linear(in_features=3072, out_features=3072, bias=False)
(k): Linear(in_features=3072, out_features=1024, bias=False)
(v): Linear(in_features=3072, out_features=1024, bias=False)
)
(linear_kv_xattn): Linear(in_features=3072, out_features=2048, bias=False)
(linear_proj): Linear(in_features=6144, out_features=3072, bias=False)
(q_layernorm): FusedLayerNorm()
(q_layernorm_xattn): FusedLayerNorm()
(k_layernorm): FusedLayerNorm()
(k_layernorm_xattn): FusedLayerNorm()
)
(self_attn_post_norm): FusedLayerNorm()
(mlp): CustomMLP(
(layer_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
(linear_fc1): Linear(in_features=3072, out_features=12288, bias=False)
(linear_fc2): Linear(in_features=12288, out_features=3072, bias=False)
)
(mlp_post_norm): FusedLayerNorm()
)
)
(final_layernorm): FusedLayerNorm()
)
(final_linear): FinalLinear(
(linear): Linear(in_features=3072, out_features=64, bias=False)
)
)
[2025-12-16 20:51:22,101 - INFO] (cp, pp) rank (0, 0): param count 4459898128, model size 8.34 GB
[2025-12-16 20:51:22,101 - INFO] Build DiTModel successfully
[2025-12-16 20:51:22,102 - INFO] After build_dit_model, memory allocated: 0.01 GB, memory reserved: 0.02 GB
[2025-12-16 20:51:22,102 - INFO] load inference_weight weight from /mnt/shared-storage-gpfs2/gpfs2-shared-public/huggingface/zskj-hub/models--sand-ai--MAGI-1/ckpt/magi/4.5B_base/inference_weight
Loading shards: 100%|██████████| 2/2 [00:00<00:00, 7.70it/s]
[2025-12-16 20:51:30,853 - INFO] Load Weight Missing Keys: []
[2025-12-16 20:51:30,854 - INFO] Load Weight Unexpected Keys: []
[2025-12-16 20:51:31,010 - INFO] After load_checkpoint, memory allocated: 8.36 GB, memory reserved: 8.37 GB
[2025-12-16 20:51:31,013 - INFO] After high_precision_promoter, memory allocated: 8.36 GB, memory reserved: 8.37 GB
[2025-12-16 20:51:31,164 - INFO] Load checkpoint successfully
[2025-12-16 20:51:31,164 - INFO] special_token = ['HQ_TOKEN', 'DURATION_TOKEN']
InferBatch 0: 0%| | 0/4 [00:00<?, ?it/s][2025-12-16 20:51:31,198 - INFO] transport_inputs len: 1
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/entry.py", line 54, in <module>
[rank0]: main()
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/entry.py", line 40, in main
[rank0]: pipeline.run_text_to_video(prompt=args.prompt, output_path=args.output_path)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 35, in run_text_to_video
[rank0]: self._run(prompt, None, output_path)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 49, in _run
[rank0]: [
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/pipeline.py", line 49, in <listcomp>
[rank0]: [
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 763, in generate_per_chunk
[rank0]: for _, _, chunk in sample_transport.walk():
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 725, in walk
[rank0]: velocity = self.forward_velocity(infer_idx, 0)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/pipeline/video_generate.py", line 657, in forward_velocity
[rank0]: velocity = forward_fn(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 503, in forward_dispatcher
[rank0]: (out_cond_pre_and_text, out_cond_pre, out_uncond, denoise_width) = self.forward_3cfg(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 414, in forward_3cfg
[rank0]: out_cond_pre_and_text = self.forward(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_model.py", line 385, in forward
[rank0]: x = self.videodit_blocks.forward(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1427, in forward
[rank0]: hidden_states = layer(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1308, in forward
[rank0]: core_attn_out, cross_attn_out = self.self_attention(
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/kanghengrui/miniconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1195, in forward
[rank0]: core_attn_out, xattn_out = UlyssesScheduler.get_attn_and_xattn_with_fused_kv_comm(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/infra/parallelism/context_parallel.py", line 525, in get_attn_and_xattn_with_fused_kv_comm
[rank0]: return UlyssesScheduler.get_attn_and_xattn_base(
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/infra/parallelism/context_parallel.py", line 585, in get_attn_and_xattn_base
[rank0]: core_attn_out_new = core_attn_func(query[i], key, value)
[rank0]: File "/mnt/shared-storage-gpfs2/kanghengrui-gpfs02/manivid/vid_models/MAGI-1/inference/model/dit/dit_module.py", line 1032, in core_attention
[rank0]: core_attn_out, _ = flex_attention(
[rank0]: TypeError: flex_flash_attn_func() got an unexpected keyword argument 'max_seqlen_q'
InferBatch 0: 0%| | 0/4 [00:00<?, ?it/s]
My environment:
cuda=12.4
torch=2.4.0+cu124
python=3.10.12
magi_attention=0.0.0
flash_attn=2.7.0.post1
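For reference, the versions above can be re-checked from inside the env with the standard library alone (this is just a convenience snippet, not something from the repo; packages not present in the current env are reported as missing):

```python
# Print the interpreter and package versions relevant to this report,
# using only the standard library (importlib.metadata reads installed
# distribution metadata, so no heavy imports like torch are triggered).
import sys
import importlib.metadata as md

print("python", sys.version.split()[0])
for pkg in ("torch", "flash_attn", "magi_attention"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```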
I reduced FFA_JOBS from 160 to 10 and was then able to install magi_attention=0.0.0 successfully.
I found that if I do not install magi_attention=0.0.0, this error does not occur. Furthermore, at the very beginning the log warns:
UserWarning: You are using magi_attention without installing it. This may cause some unexpected errors.
So I suspect this is related to magi_attention?
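One quick way to confirm that kind of mismatch is to check whether the `flex_flash_attn_func` that Python actually resolves accepts the `max_seqlen_q` keyword. A minimal sketch with `inspect` follows; the `flex_flash_attn_func` below is a stand-in with a hypothetical signature for illustration only, not the real magi_attention API (in practice you would import the real function instead):

```python
import inspect

# Hypothetical stand-in for magi_attention's flex_flash_attn_func;
# the real signature may differ.
def flex_flash_attn_func(q, k, v, softmax_scale=None):
    ...

def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs parameter would also absorb the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in params.values())

print(accepts_kwarg(flex_flash_attn_func, "max_seqlen_q"))  # → False
```

If this prints False for the installed function, the caller in `dit_module.py` and the installed magi_attention build disagree on the API, which would produce exactly the `TypeError` above.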