Your work is really impressive. May I ask whether your evaluation on OSWorld-G was conducted on the OSWorld-G refined test set?

<img width="1010" height="836" alt="Image" src="https://github.com/user-attachments/assets/a26d8b2a-b8e3-4e52-9b99-04fcbbdb0199" />