Ray deployment of Moonlight 16B (MBridge) fails #616

@ko3n1g

Describe the bug

(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)   File "/opt/Megatron-Bridge/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 609, in _forward_attention [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     attention_output_with_bias = self.self_attention( [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)                                  ^^^^^^^^^^^^^^^^^^^^ [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96)     self.config.cache_mla_latents [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) AssertionError: currently to use dynamic backend for MLA cache mla latents must be true [repeated 3x across cluster]
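
The traceback suggests the attention forward pass reaches a guard on `self.config.cache_mla_latents` while that value is falsy. Below is a minimal sketch of what such a guard might look like, written as a standalone stand-in and not as the actual Megatron-LM code. Only the attribute name `cache_mla_latents` and the assertion message come from the log above; `SketchConfig`, `check_mla_cache`, `fails`, and the flags `multi_latent_attention` and `dynamic_backend` are hypothetical names used for illustration. Whether enabling `cache_mla_latents` is the right fix for this deployment is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the model config; not the Megatron-LM class."""

    multi_latent_attention: bool = True  # assumption: the model uses MLA
    dynamic_backend: bool = True         # assumption: dynamic inference backend is active
    cache_mla_latents: bool = False      # attribute name taken from the traceback


def check_mla_cache(config: SketchConfig) -> None:
    """Mimic the guard implied by the traceback (illustrative only)."""
    if config.multi_latent_attention and config.dynamic_backend:
        assert config.cache_mla_latents, (
            "currently to use dynamic backend for MLA cache mla latents must be true"
        )


def fails(config: SketchConfig) -> bool:
    """Return True if the guard rejects this config."""
    try:
        check_mla_cache(config)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    # With the flag left off, the sketch guard raises the same message as the log.
    print(fails(SketchConfig()))
    # With the flag on, the sketch guard passes.
    print(fails(SketchConfig(cache_mla_latents=True)))
```

In this sketch the failure depends only on the combination of the three flags, which is why the reproduction steps below (a single MLA checkpoint at TP1/PP1/CP1) would be enough to hit it.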

Steps/Code to reproduce bug

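  1. Use top-of-tree (ToT) MBridge and MCore.
  2. Load the Moonlight 16B pretrain checkpoint.
  3. Deploy with TP1/PP1/CP1 (tensor, pipeline, and context parallel sizes all set to 1).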

Expected behavior

The model deploys on a Ray cluster without errors.

Metadata

Labels

bug: Something isn't working
