
Ray Deployment of Moonlight 16B (MBridge) fails #616


Describe the bug

Deploying the Moonlight 16B checkpoint through the MBridge Ray path fails inside Megatron-Core's attention forward pass with the assertion below (the model uses Multi-head Latent Attention, MLA, per the error message):

(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)   File "/opt/Megatron-Bridge/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 609, in _forward_attention [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     attention_output_with_bias = self.self_attention( [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)                                  ^^^^^^^^^^^^^^^^^^^^ [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     self.config.cache_mla_latents [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) AssertionError: currently to use dynamic backend for MLA cache mla latents must be true [repeated 3x across cluster]

Steps/Code to reproduce bug

  1. Top-of-tree (ToT) MBridge/MCore
  2. Moonlight 16B pretrain checkpoint
  3. TP1/PP1/CP1 (a hedged deployment sketch follows this list)
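A minimal repro sketch in the spirit of the steps above. The deployment name "megatron_model" and the class name MegatronRayDeployable come from the replica logs; the constructor arguments, checkpoint path, and model-loading details are placeholders standing in for the real Megatron-Bridge deployment wiring, which is not shown here:

```python
import ray
from ray import serve

# Minimal repro sketch. Only the deployment and class names are taken from
# the logs above; everything else is a placeholder for the real MBridge path.

@serve.deployment(name="megatron_model", num_replicas=1)
class MegatronRayDeployable:
    def __init__(self, checkpoint_path: str):
        # In the real path this is where MBridge would restore the Moonlight
        # 16B pretrain checkpoint with TP1/PP1/CP1; the reported assertion
        # fires later, in the MLA attention forward pass.
        self.checkpoint_path = checkpoint_path

    async def __call__(self, request) -> dict:
        # Placeholder response; the real deployable runs model inference.
        return {"checkpoint": self.checkpoint_path}

if __name__ == "__main__":
    ray.init()
    serve.run(MegatronRayDeployable.bind("/ckpts/moonlight-16b"))  # placeholder path
```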

Expected behavior

The Moonlight 16B checkpoint deploys on the Ray cluster and the Serve replicas start without error.

Additional context

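The assertion is raised in _forward_attention in megatron/core/transformer/transformer_layer.py when the dynamic inference backend is used with MLA and config.cache_mla_latents is not true. A possible workaround, sketched below as an assumption only: flip that flag on the Megatron-Core transformer config before the model is built. The helper, and the idea that the MBridge Ray deployment exposes the config at such a point, are hypothetical:

```python
# Hypothetical workaround sketch: the field name cache_mla_latents is taken
# verbatim from the assertion in transformer_layer.py; whether the MBridge
# Ray deployment path exposes the transformer config at a point where this
# can be set is an assumption, not verified.

def enable_mla_latent_cache(transformer_config):
    """Set cache_mla_latents=True, which the dynamic backend asserts for MLA."""
    if not getattr(transformer_config, "cache_mla_latents", False):
        transformer_config.cache_mla_latents = True
    return transformer_config
```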
