Hi, thanks for releasing DM0!
I have a question about downstream fine-tuning from DM0-base to specific embodied tasks such as RobotChallenge. During task-specific embodied fine-tuning, is the LLM/VLM backbone frozen, fully fine-tuned, or adapted with something like LoRA/PEFT? I couldn't tell from the paper what the default setup for the language backbone is at this stage.
I also noticed that the paper reports task fine-tuning lengths of roughly 40k–150k steps, and around 200k steps for some mixed-task settings. This seems fairly long, so I wanted to ask whether this is the expected convergence range for DM0 in practice, or whether it typically converges faster on downstream tasks.
Thanks!