Description
Line 39 in e524519:

    prb[i][j] = 1 / (1 + np.exp(score[j] - score[i]))
From my understanding of the code, the score list here is the output of blender.rank(*, return_scores=True), which gives, for each response, its average relative score of being preferred over the other responses in the candidate set. Please correct me if I'm wrong.
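For reference, here is a minimal sketch of how I understand these scores are produced, assuming the standard llm-blender PairRM ranker interface (the model name, prompt, and responses below are my own placeholders, not taken from this repo):

```python
# Minimal sketch, assuming the llm-blender PairRM ranker interface.
import numpy as np
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # assumed ranker checkpoint

x = "some prompt"                                       # hypothetical input
responses = ["response 1", "response 2", "response 3"]  # y1, y2, y3

# scores[k] is the average relative score of response k vs. the other responses.
scores = blender.rank([x], [responses], return_scores=True)[0]

# The line in question then turns score differences into pairwise terms:
prb = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        prb[i][j] = 1 / (1 + np.exp(scores[j] - scores[i]))
```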
For example, given three responses {y1, y2, y3}, the first element of the scores output by the blender model (s1, s2, s3) is s1 = P(y1 > y2) + P(y1 > y3), disregarding the constant coefficient, where P is a general preference score function, not a probability. [references: the blender code and their paper]
Thus, the difference of two scores, e.g. s1 - s2, also depends on the third response y3, which seems a bit different from what is described in the paper.
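Concretely, expanding the difference (still disregarding the constant coefficient):

    s1 - s2 = [P(y1 > y2) + P(y1 > y3)] - [P(y2 > y1) + P(y2 > y3)]
            = [P(y1 > y2) - P(y2 > y1)] + [P(y1 > y3) - P(y2 > y3)]

The second bracket does not vanish in general, so the pairwise term for (y1, y2) depends on how both y1 and y2 compare against y3.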
In summary, I feel it would be more appropriate to use the score output from the blender with just the two responses in question (although I don't think this would make a significant difference in performance), e.g.,
# score of yj when it is compared only against yi (first candidate of the pair)
score = blender.rank([x], [[yj, yi]], return_scores=True)[0, 0]
# pairwise term computed from the two-candidate comparison only
prb[i][j] = 1 / (1 + np.exp(score))

(sorry for the rough example)
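To spell the suggestion out, here is a rough sketch of how the full prb matrix could be built from purely pairwise calls, reusing the blender, x, and responses from the sketch above (the loop structure is my own assumption, not code from this repo):

```python
# Rough sketch of the purely pairwise variant; names and loop structure are
# my own assumptions, not the repo's actual code.
n = len(responses)
prb = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        # Rank only the pair (yj, yi), so no third response can influence the scores.
        pair_scores = blender.rank([x], [[responses[j], responses[i]]],
                                   return_scores=True)[0]
        # As in the snippet above, use the score of yj from the two-candidate
        # comparison; one could also use pair_scores[0] - pair_scores[1],
        # mirroring the original line.
        prb[i][j] = 1 / (1 + np.exp(pair_scores[0]))
```

This needs n*(n-1) separate ranker calls instead of one, so it is slower, but each entry of prb then depends only on the two responses it compares.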