Hi, thank you for your great work.
I would like to ask for clarification on a few technical points:
1. OmniScore Evaluation Setup
- Did you calculate OmniScore by directly calling the official VBench API, or did you modify the VBench code/configuration? Specifically, for dimensions other than overall_consistency, was evaluation performed using the command `vbench evaluate .. --mode custom_input`?
- For overall_consistency, did you rely on the official code released by VBench?
2. Section 4.3. OmniScore-Based Data Re-Weighting
- (2-1) The paper states that higher weights are assigned to preference pairs with clearer distinctions. However, Eq. (3) seems to suggest that pairs with a lower product of frequency probabilities are given higher weights. How exactly does the frequency-probability product relate to “clearer distinctions” between chosen and rejected samples?
- (2-2) Regarding the parameter β in Eq. (3): the paper says “β is a constant set to the approximate probability of the most frequent sample.” Could you clarify whether this refers to the frequency probability of the most frequent sample or its OmniScore value? From your code, it seems β is set to 0.72, but in Fig. 3(b) the highest frequency probability does not appear to be 0.72. How was the exact β value determined?
I would be very grateful if you could clarify these points. Thank you for your time and for sharing this excellent work with the community.
Best regards,
Junghye Kim