Description
I am testing out LettuceDetect for possible use with RAG. I noticed that the probabilities (and hence class predictions) change considerably as more context is added. We have 10 chunks retrieved, with a total of about 2600 tokens. I am faking an answer like this:
answer = """The Flare check can be temporarily made into an optional check.
Approving artifacts in Flare is no longer needed.
Smurfs are blue. """
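For reference, this is roughly how I'm invoking the detector. It's a simplified sketch following the transformer example in the LettuceDetect README; `chunks` and `question` stand in for my actual retrieved chunks and user query, and the token-level `output_format` value is my guess at the exact flag name:

```python
from lettucedetect.models.inference import HallucinationDetector

# Transformer-based detector, model path as in the README example.
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",
)

# `chunks` is my list of 10 retrieved chunks (~2600 tokens total);
# everything gets lowercased before being passed to the model.
contexts = [c.lower() for c in chunks]
answer_lc = answer.lower()

predictions = detector.predict(
    context=contexts,
    question=question.lower(),
    answer=answer_lc,
    output_format="tokens",  # assumed name of the token-level output format
)
```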
The first sentence is copied verbatim from the first chunk. The second is an actual hallucinated answer, while the Smurfs thing is just for control. I'm changing everything to lowercase before sending to the model since the tokenization seems to be case sensitive. If I only include the first chunk, here's what I'm seeing for token predictions:
- the (prediction: 0, probability: 0.00)
- flare (prediction: 0, probability: 0.00)
- check (prediction: 0, probability: 0.00)
- can (prediction: 0, probability: 0.00)
- be (prediction: 0, probability: 0.00)
- temporarily (prediction: 0, probability: 0.00)
- made (prediction: 0, probability: 0.00)
- into (prediction: 0, probability: 0.00)
- an (prediction: 0, probability: 0.00)
- optional (prediction: 0, probability: 0.00)
- check (prediction: 0, probability: 0.00)
- . (prediction: 0, probability: 0.00)
- \n (prediction: 0, probability: 0.04)
- (prediction: 1, probability: 0.51)
- appro (prediction: 1, probability: 0.72)
- ving (prediction: 1, probability: 0.76)
- artifacts (prediction: 1, probability: 0.82)
- in (prediction: 1, probability: 0.79)
- flare (prediction: 1, probability: 0.80)
- is (prediction: 1, probability: 0.83)
- no (prediction: 1, probability: 0.90)
- longer (prediction: 1, probability: 0.83)
- needed (prediction: 1, probability: 0.87)
- . (prediction: 1, probability: 0.69)
- \n (prediction: 0, probability: 0.10)
- (prediction: 0, probability: 0.46)
- sm (prediction: 1, probability: 0.88)
- ur (prediction: 1, probability: 0.89)
- fs (prediction: 1, probability: 0.92)
- are (prediction: 1, probability: 0.91)
- blue (prediction: 1, probability: 0.92)
- . (prediction: 1, probability: 0.91)
- (prediction: 1, probability: 0.63)
- [SEP] (prediction: 0, probability: 0.00)
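These listings are just me printing the token-level output, roughly as below. The `token`/`pred`/`prob` field names are my assumption about the shape of the returned objects, not something I've verified against the API:

```python
# Print each token with its hallucination prediction and probability.
# NOTE: the "token", "pred" and "prob" keys are assumed field names.
for tok in predictions:
    print(f"- {tok['token']} (prediction: {tok['pred']}, probability: {tok['prob']:.2f})")
```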
For chunks 0:5, it looks like this:
- the (prediction: 0, probability: 0.00)
- flare (prediction: 0, probability: 0.01)
- check (prediction: 0, probability: 0.00)
- can (prediction: 0, probability: 0.00)
- be (prediction: 0, probability: 0.01)
- temporarily (prediction: 0, probability: 0.00)
- made (prediction: 0, probability: 0.00)
- into (prediction: 0, probability: 0.00)
- an (prediction: 0, probability: 0.00)
- optional (prediction: 0, probability: 0.01)
- check (prediction: 0, probability: 0.01)
- . (prediction: 0, probability: 0.01)
- \n (prediction: 0, probability: 0.03)
- (prediction: 0, probability: 0.05)
- appro (prediction: 0, probability: 0.09)
- ving (prediction: 0, probability: 0.08)
- artifacts (prediction: 0, probability: 0.09)
- in (prediction: 0, probability: 0.10)
- flare (prediction: 0, probability: 0.10)
- is (prediction: 0, probability: 0.12)
- no (prediction: 0, probability: 0.16)
- longer (prediction: 0, probability: 0.13)
- needed (prediction: 0, probability: 0.14)
- . (prediction: 0, probability: 0.12)
- \n (prediction: 0, probability: 0.04)
- (prediction: 0, probability: 0.34)
- sm (prediction: 1, probability: 0.79)
- ur (prediction: 1, probability: 0.83)
- fs (prediction: 1, probability: 0.86)
- are (prediction: 1, probability: 0.85)
- blue (prediction: 1, probability: 0.82)
- . (prediction: 1, probability: 0.84)
- (prediction: 1, probability: 0.73)
- [SEP] (prediction: 0, probability: 0.00)
And for all 10 chunks I get:
- the (prediction: 0, probability: 0.02)
- flare (prediction: 0, probability: 0.02)
- check (prediction: 0, probability: 0.02)
- can (prediction: 0, probability: 0.02)
- be (prediction: 0, probability: 0.03)
- temporarily (prediction: 0, probability: 0.03)
- made (prediction: 0, probability: 0.03)
- into (prediction: 0, probability: 0.03)
- an (prediction: 0, probability: 0.03)
- optional (prediction: 0, probability: 0.05)
- check (prediction: 0, probability: 0.05)
- . (prediction: 0, probability: 0.06)
- \n (prediction: 0, probability: 0.02)
- (prediction: 0, probability: 0.05)
- appro (prediction: 0, probability: 0.08)
- ving (prediction: 0, probability: 0.07)
- artifacts (prediction: 0, probability: 0.08)
- in (prediction: 0, probability: 0.07)
- flare (prediction: 0, probability: 0.08)
- is (prediction: 0, probability: 0.11)
- no (prediction: 0, probability: 0.14)
- longer (prediction: 0, probability: 0.11)
- needed (prediction: 0, probability: 0.12)
- . (prediction: 0, probability: 0.10)
- \n (prediction: 0, probability: 0.05)
- (prediction: 0, probability: 0.38)
- sm (prediction: 1, probability: 0.85)
- ur (prediction: 1, probability: 0.88)
- fs (prediction: 1, probability: 0.91)
- are (prediction: 1, probability: 0.89)
- blue (prediction: 1, probability: 0.88)
- . (prediction: 1, probability: 0.88)
- (prediction: 1, probability: 0.74)
- [SEP] (prediction: 0, probability: 0.00)
By the time we get to the full set of 10 chunks, the Smurfs sentence is still predicted as a hallucination, but the first and second sentences are converging in probability. Is this expected? Am I maybe just using the model incorrectly? The use case might be a bit more subtle, since the tokens in the second sentence definitely appear in the context (though they are in the first chunk as well). But I'm more surprised that the first sentence is increasing in probability despite appearing verbatim in the context in all cases.
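For completeness, the three listings above come from the same faked answer run against increasing amounts of context, roughly like this (reusing `detector`, `chunks`, `question`, and `answer_lc` from the sketch above):

```python
# Re-run the same answer against the first 1, first 5, and all 10 chunks
# to see how the token probabilities shift as more context is added.
for n in (1, 5, len(chunks)):
    preds = detector.predict(
        context=[c.lower() for c in chunks[:n]],
        question=question.lower(),
        answer=answer_lc,
        output_format="tokens",  # assumed token-level flag, as above
    )
    print(f"--- first {n} chunk(s) ---")
    for tok in preds:
        print(f"- {tok['token']} (prediction: {tok['pred']}, probability: {tok['prob']:.2f})")
```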