About location coordinates and eval details

We evaluated MiMo-Embodied with the FlagEvalMM framework. Since MiMo-Embodied uses Qwen2.5 as its pretrained base,
we assume it predicts 2D boxes or points in absolute coordinates. However, we get 48.98 accuracy on RoboAfford-Eval (the official result is 69.81).
Please help us with the correct prompt or other eval details.
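
To make the coordinate question concrete, here is a minimal Python sketch of the conversions that might matter. It is not our actual eval code and not FlagEvalMM's: the `(x, y)` text format, the function names, and the image sizes are all assumptions for illustration. If the model outputs absolute pixels in the resized input image while the benchmark scores in the original image (or in normalized coordinates), skipping one of these steps could explain part of the gap.

```python
import re

# Hypothetical helpers for illustration only; names and formats are assumptions.

def parse_points(text):
    """Extract (x, y) pairs written like "(120, 340)" from model output text."""
    pattern = r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
    return [(float(x), float(y)) for x, y in re.findall(pattern, text)]

def to_original_pixels(point, model_size, original_size):
    """Rescale a point from the model-input image to the original image.

    Both sizes are (width, height) tuples.
    """
    x, y = point
    model_w, model_h = model_size
    orig_w, orig_h = original_size
    return (x * orig_w / model_w, y * orig_h / model_h)

def to_normalized(point, size):
    """Convert an absolute pixel point to [0, 1] coordinates for a (width, height) image."""
    x, y = point
    w, h = size
    return (x / w, y / h)

# Example with made-up sizes: model saw a 200x100 image, original is 400x300.
points = parse_points("points: (100, 50), (20.5, 30)")
rescaled = [to_original_pixels(p, (200, 100), (400, 300)) for p in points]
```

Knowing which of these conventions the official evaluation uses would let us check our pipeline against it.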