
Questions and Discussion Regarding Your Paper #2

@Lee-xeo

Description


First and foremost, congratulations on your excellent work and significant contributions to the field of large multimodal models. After a careful reading, we have a few questions that we would be grateful for the opportunity to discuss with you.

On the Comparison with the Qwen-3-VL-Think Variant:
We noted that alongside Qwen-3-VL, a "Think" variant was also released. As we understand it, both variants share the same model architecture but differ in their training data and strategies. Given that the Qwen-3-VL-Think variant is also optimized via reinforcement learning and is capable of processing both image and video tasks, its technical setup appears highly relevant to your work. Could you please elaborate on the reason for not including a comparative analysis with the Qwen-3-VL-Think variant in your paper? We believe such a comparison could further highlight the unique advantages of your proposed method.

On Variable Control in the Data Modality Ablation Study:
In your ablation study on data modalities, you demonstrated the impact of each modality by removing its corresponding data, which provides valuable insights. However, we have a question regarding the experimental design: could this approach introduce a confounding variable? Specifically, the observed performance degradation might be attributable not only to the absence of a modality's capability but also to the reduction in the total volume of training data. We were wondering whether you had considered alternative experimental setups that more precisely isolate these two factors, for instance by replacing the removed data with an equal amount of data from the other modalities. This could strengthen the conclusions drawn from the ablation study.
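To make this suggestion concrete, here is a minimal sketch of the volume-matched ablation we have in mind: one modality's data is removed, and the deficit is backfilled by resampling from the remaining modalities so the total sample count stays constant. All names here (`matched_ablation`, the modality keys) are illustrative, not from the paper.

```python
import random

def matched_ablation(datasets, removed, seed=0):
    """Build an ablation split that drops one modality but keeps the
    total number of training samples constant by backfilling with
    samples drawn from the remaining modalities.

    `datasets` maps a modality name to its list of samples (hypothetical).
    """
    rng = random.Random(seed)
    kept = {m: list(s) for m, s in datasets.items() if m != removed}
    deficit = len(datasets[removed])          # samples that must be replaced
    pool = [x for s in kept.values() for x in s]
    # Sample with replacement so the pool need not exceed the deficit.
    backfill = [rng.choice(pool) for _ in range(deficit)]
    mixed = pool + backfill
    rng.shuffle(mixed)
    return mixed

data = {
    "image": list(range(100)),
    "video": list(range(100, 150)),
    "text": list(range(150, 250)),
}
split = matched_ablation(data, removed="video")
# Total volume is unchanged even though the video modality is gone.
assert len(split) == sum(len(v) for v in data.values())
```

Under this design, any remaining performance gap would be attributable to the missing modality rather than to a smaller training set.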

On Comparison with Other State-of-the-Art Reinforcement Learning Methods:
Your paper effectively demonstrates the efficacy of EMA-GRPO through comparisons with methods like GRPO. Concurrently, we have seen the emergence of several new and promising RL algorithms in the community, such as GAPO, GSPO, and SPAO (the latter used by Qwen-3-VL-Think). We are curious whether you have considered benchmarking your method against these more recent algorithms. We believe a broader comparative analysis would further solidify the significance and robustness of EMA-GRPO, and we would be very interested to see how it performs.
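For clarity about what we would be comparing, here is our (possibly incorrect) reading of the EMA-GRPO idea as a sketch: instead of normalizing each group of rewards by that group's own statistics, an exponential moving average of the reward mean and variance serves as a smoother baseline. This is our assumption about the method, not the paper's actual formulation; please correct us if we have misread it.

```python
class EMABaseline:
    """Normalize group rewards into advantages using an exponential
    moving average of the reward mean and variance.

    This is a guess at an EMA-GRPO-style update for discussion purposes,
    not a reproduction of the paper's algorithm.
    """

    def __init__(self, beta=0.9):
        self.beta = beta          # EMA decay factor (assumed hyperparameter)
        self.mean = 0.0
        self.var = 1.0
        self.initialized = False

    def advantages(self, rewards):
        group_mean = sum(rewards) / len(rewards)
        group_var = sum((r - group_mean) ** 2 for r in rewards) / len(rewards)
        if not self.initialized:
            # Seed the running statistics from the first group.
            self.mean, self.var = group_mean, group_var
            self.initialized = True
        else:
            self.mean = self.beta * self.mean + (1 - self.beta) * group_mean
            self.var = self.beta * self.var + (1 - self.beta) * group_var
        std = max(self.var ** 0.5, 1e-6)  # guard against zero variance
        return [(r - self.mean) / std for r in rewards]
```

If this reading is roughly right, the comparison with GSPO (which normalizes at the sequence level) and the other algorithms above would be especially informative.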

Thank you again for your time and for your outstanding contribution to the field. We look forward to your response.
