Labels: bug (Something isn't working)
Description
Describe the bug
```
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)   File "/opt/Megatron-Bridge/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 609, in _forward_attention [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     attention_output_with_bias = self.self_attention( [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     ^^^^^^^^^^^^^^^^^^^^ [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     self.config.cache_mla_latents [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) AssertionError: currently to use dynamic backend for MLA cache mla latents must be true [repeated 3x across cluster]
```
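For illustration, here is a minimal, self-contained sketch of the failing check. `TransformerConfigStub` and `check_mla_dynamic_backend` are hypothetical stand-ins (only the `cache_mla_latents` field from the traceback is modeled); the real guard lives in Megatron-LM's attention path, and the actual fix may need to happen wherever the deployment builds the `TransformerConfig`.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfigStub:
    # Hypothetical stand-in for the Megatron-Core TransformerConfig;
    # only the field referenced in the traceback is modeled here.
    cache_mla_latents: bool = False


def check_mla_dynamic_backend(config: TransformerConfigStub) -> None:
    # Mirrors the assertion raised from transformer_layer.py when the
    # dynamic inference backend is used with MLA attention.
    assert config.cache_mla_latents, (
        "currently to use dynamic backend for MLA cache mla latents must be true"
    )


# Failing configuration (matches the reported deployment):
try:
    check_mla_dynamic_backend(TransformerConfigStub(cache_mla_latents=False))
except AssertionError as e:
    print(f"AssertionError: {e}")

# Possible workaround (untested assumption): enable latent caching in the
# config before the model is instantiated for serving.
check_mla_dynamic_backend(TransformerConfigStub(cache_mla_latents=True))
print("ok")
```

If the Ray deployable does not expose a way to set `cache_mla_latents` on the config it builds, that missing plumbing may itself be the bug.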
Steps/Code to reproduce bug
- Top-of-tree (ToT) Megatron-Bridge / Megatron-Core
- Moonlight 16B pretrain checkpoint
- TP1/PP1/CP1
Expected behavior
The model deploys successfully on the Ray cluster and the Serve replica starts without error.