Description
Line 39 in e524519:

    prb[i][j] = 1 / (1 + np.exp(score[j] - score[i]))
From my understanding of the code, the score list here is the output of blender.rank(*, return_scores=True), which gives, for each response, its average relative score of being preferred over the other responses in the candidate set. Please correct me if I'm wrong.
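For reference, here is a minimal sketch of how I understand these scores are produced, assuming the standard llm-blender PairRM ranker interface (the model name, prompt, and responses below are my own placeholders, not taken from this repo):

```python
# Minimal sketch, assuming the llm-blender PairRM ranker interface.
import numpy as np
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # assumed ranker checkpoint

x = "some prompt"                                       # hypothetical input
responses = ["response 1", "response 2", "response 3"]  # y1, y2, y3

# scores[k] is the average relative score of response k vs. the other responses.
scores = blender.rank([x], [responses], return_scores=True)[0]

# The line in question then turns score differences into pairwise terms:
prb = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        prb[i][j] = 1 / (1 + np.exp(scores[j] - scores[i]))
```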
For example, given three responses {y1, y2, y3}, the first element of the scores output by the blender model (s1, s2, s3) is s1 = P(y1 > y2) + P(y1 > y3), disregarding the constant coefficient, where P is a general preference score function, not a probability. [references: the blender code and their paper]
Thus, the difference of two scores, e.g. s1 - s2, also depends on the third response y3, which seems a bit different from what is described in the paper.
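Concretely, expanding the difference (still disregarding the constant coefficient):

    s1 - s2 = [P(y1 > y2) + P(y1 > y3)] - [P(y2 > y1) + P(y2 > y3)]
            = [P(y1 > y2) - P(y2 > y1)] + [P(y1 > y3) - P(y2 > y3)]

The second bracket does not vanish in general, so the pairwise term for (y1, y2) depends on how both y1 and y2 compare against y3.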
In summary, I feel it would be more appropriate to use the score output from the blender with just the two responses in question (although I don't think this would make a significant difference in performance), e.g.,
# score of yj when it is compared only against yi (first candidate of the pair)
score = blender.rank([x], [[yj, yi]], return_scores=True)[0, 0]
# pairwise term computed from the two-candidate comparison only
prb[i][j] = 1 / (1 + np.exp(score))

(sorry for the rough example)
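To spell the suggestion out, here is a rough sketch of how the full prb matrix could be built from purely pairwise calls, reusing the blender, x, and responses from the sketch above (the loop structure is my own assumption, not code from this repo):

```python
# Rough sketch of the purely pairwise variant; names and loop structure are
# my own assumptions, not the repo's actual code.
n = len(responses)
prb = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        # Rank only the pair (yj, yi), so no third response can influence the scores.
        pair_scores = blender.rank([x], [[responses[j], responses[i]]],
                                   return_scores=True)[0]
        # As in the snippet above, use the score of yj from the two-candidate
        # comparison; one could also use pair_scores[0] - pair_scores[1],
        # mirroring the original line.
        prb[i][j] = 1 / (1 + np.exp(pair_scores[0]))
```

This needs n*(n-1) separate ranker calls instead of one, so it is slower, but each entry of prb then depends only on the two responses it compares.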