World knowledge versus RAG knowledge

The main purpose of RAG, as I understand it, is the following use case:

You have a knowledge base containing several documents.
These documents are seen as the ground truth. There is no other ground truth.
You simply cannot trust the world knowledge that was fed into an LLM during pretraining.
We want the LLM to respond in a trustworthy way. That cannot be accomplished by trusting the knowledge that already resides in the LLM.

Just one example: "Every" website states that cookies are text files.
This is completely wrong, and almost every LLM is fed this wrong knowledge during pretraining.
Cookies have never been text files. In earlier days, popular browsers happened to store them as text files, which was an implementation choice rather than a property of cookies. And who knows whether every browser did it that way? Nowadays, cookies are stored in databases, which are not, or at least need not be, text files.

HotpotQA, like many other benchmarks, is built on Wikipedia, i.e. on the same data that in most cases was used to pretrain the very LLM that is then evaluated on HotpotQA questions.

So trying to detect hallucinations does not make much sense if the knowledge in the RAG documents you feed into the LLM is the same knowledge that already resides in the LLM being asked.

A solution could be to rephrase key words in the questions that are fed into DRAGIN.
Just before the RAG step/index search (Elasticsearch or SGPT), reverse that rephrasing (i.e. use the original question). For the fetched RAG documents, apply the rephrasing again before analyzing/using them in the DRAGIN loop.
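A minimal Python sketch of the rephrase / reverse / re-rephrase loop, assuming simple whole-word term substitution. This is not DRAGIN's actual code: `TERM_MAP`, the surrogate words, `obfuscate`, `deobfuscate`, `retrieve_with_obfuscation` and the toy keyword retriever are all hypothetical. In a real setup, `search` would wrap the Elasticsearch (BM25) or SGPT retriever.

```python
import re
from typing import Callable, Dict, List

# Hypothetical mapping from real key terms to invented surrogate terms.
# Plural forms are listed separately because matching is whole-word only.
TERM_MAP: Dict[str, str] = {
    "cookie": "zarfel",
    "cookies": "zarfels",
    "browser": "quindor",
    "browsers": "quindors",
}


def _substitute(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each key with its value."""
    if not mapping:
        return text
    # Longest keys first so that e.g. "cookies" is preferred over "cookie".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    lookup = {k.lower(): v for k, v in mapping.items()}
    # Replacements are emitted in the mapping's (lowercase) spelling.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


def obfuscate(text: str, mapping: Dict[str, str] = TERM_MAP) -> str:
    """Real terms -> surrogate terms (what the LLM gets to see)."""
    return _substitute(text, mapping)


def deobfuscate(text: str, mapping: Dict[str, str] = TERM_MAP) -> str:
    """Surrogate terms -> real terms (what the search index gets to see)."""
    inverse = {v: k for k, v in mapping.items()}
    return _substitute(text, inverse)


def retrieve_with_obfuscation(
    llm_query: str,
    search: Callable[[str], List[str]],
    mapping: Dict[str, str] = TERM_MAP,
) -> List[str]:
    # 1. Undo the rephrasing so the index sees the original vocabulary.
    original_query = deobfuscate(llm_query, mapping)
    # 2. Run the retriever (stand-in for BM25/Elasticsearch or SGPT).
    passages = search(original_query)
    # 3. Re-apply the rephrasing before the passages go back into the LLM loop.
    return [obfuscate(p, mapping) for p in passages]


# Toy in-memory retriever for illustration only: plain keyword overlap.
CORPUS = [
    "Browsers store each cookie in a database.",
    "An unrelated passage about weather.",
]


def toy_search(query: str) -> List[str]:
    q_words = set(re.findall(r"\w+", query.lower()))
    return [d for d in CORPUS if q_words & set(re.findall(r"\w+", d.lower()))]


if __name__ == "__main__":
    print(retrieve_with_obfuscation("where does a quindor keep a zarfel?", toy_search))
```

Running it prints `['quindors store each zarfel in a database.']`: the index was queried with "browser" and "cookie", while the LLM only ever sees the surrogate terms. Plain word substitution does not handle inflections on its own (hence the explicit plural entries) and it lowercases every replaced word.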

So the solution would be to eliminate as much "knowledge" as possible from the LLM by using different terms when asking it. Ideally, these "different terms" do not appear in the documents the LLM was pretrained on.
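The pretraining corpus itself is usually not available, so the "not included in the pretraining documents" requirement can only be approximated. The sketch below assumes some reference word list (for example, one extracted from a Wikipedia dump) as a proxy and picks random pronounceable nonsense words that are absent from it. `build_term_map`, `pseudo_word` and the consonant-vowel syllable scheme are hypothetical choices. Absence from a word list does not guarantee the LLM has no associations with a surrogate, since the tokenizer may still split it into familiar subwords.

```python
import random
from typing import Dict, Iterable, Set

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def pseudo_word(rng: random.Random, syllables: int = 3) -> str:
    """Build a pronounceable nonsense word from consonant-vowel syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))


def build_term_map(
    terms: Iterable[str],
    reference_vocab: Set[str],
    seed: int = 0,
    max_tries: int = 1000,
) -> Dict[str, str]:
    """Map each term to a nonsense word that occurs neither in the reference
    vocabulary (a proxy for the pretraining data) nor among the terms themselves."""
    term_list = [t.lower() for t in terms]
    blocked = {w.lower() for w in reference_vocab} | set(term_list)
    rng = random.Random(seed)  # fixed seed -> reproducible mapping
    mapping: Dict[str, str] = {}
    for term in term_list:
        for _ in range(max_tries):
            candidate = pseudo_word(rng)
            if candidate not in blocked:
                break
        else:
            raise RuntimeError(f"no unused surrogate found for {term!r}")
        blocked.add(candidate)  # never reuse a surrogate for two terms
        mapping[term] = candidate
    return mapping


if __name__ == "__main__":
    vocab = {"the", "cookie", "browser", "database"}  # stand-in for a real word list
    print(build_term_map(["cookie", "browser"], vocab))
```

The resulting dictionary has the same shape as a hand-written term map, so it can be fed directly into a substitution step like the one described above.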
