Evaluates LLM outputs by extracting atomic facts, identifying key entities, and calculating the percentage of facts supported by the provided database.
Building on the original FActScore, this enhanced pipeline quantifies LLM truthfulness by:
- Extracting atomic facts from generations;
- Retrieving supporting entities (NER);
- Verifying facts against a provided knowledge source.
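The resulting score is the share of extracted atomic facts that the knowledge source supports. A minimal sketch of that final step, assuming verification has already produced one boolean per fact. The helper name `support_ratio` and the example verdicts are hypothetical and not part of this package; returning 0.0 for an empty list is also an assumption:

```python
from typing import Sequence


def support_ratio(verdicts: Sequence[bool]) -> float:
    """Return the fraction of atomic facts marked as supported.

    `verdicts` holds one boolean per extracted fact (True = supported).
    Hypothetical helper for illustration; not the package's own API.
    """
    if not verdicts:
        # No facts were extracted, so there is nothing to score.
        return 0.0
    return sum(verdicts) / len(verdicts)


# Three of four facts are supported by the knowledge source.
print(support_ratio([True, True, False, True]))  # 0.75
```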
This version also improves on the original in several ways:
- Boosted performance through asynchronous API queries;
- Boosted accuracy via Named Entity Recognition (NER) integration;
- More reliable document retrieval using a sharded FAISS vector index that matches titles by semantic similarity rather than character-level comparison;
- Automatic topic extraction.
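To illustrate why asynchronous API queries help, the sketch below issues several verification calls concurrently and caps the number in flight. It is not this package's implementation: `verify_fact`, `verify_all`, and `max_concurrency` are hypothetical names, and the API call is simulated with `asyncio.sleep`.

```python
import asyncio


async def verify_fact(fact: str) -> bool:
    """Stand-in for one API call; a real call would await an HTTP client."""
    await asyncio.sleep(0.1)  # simulated network latency
    return len(fact) > 0  # placeholder verdict, not real verification


async def verify_all(facts: list[str], max_concurrency: int = 8) -> list[bool]:
    """Verify facts concurrently while capping in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(fact: str) -> bool:
        async with semaphore:
            return await verify_fact(fact)

    # gather returns results in the same order as the input facts.
    return await asyncio.gather(*(bounded(f) for f in facts))


facts = ["Paris is the capital of France.", "The Seine flows through Paris."]
results = asyncio.run(verify_all(facts))
print(results)  # [True, True]
```

Because the calls overlap, total wall time stays close to the latency of a single request rather than growing with the number of facts, as long as the batch fits under the concurrency cap.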
Before running the pipeline, prepare the following:
- Knowledge source. A reference database in the specified format. Ensure the table has two columns: title, text. You can use the pre-built .db file of the Wikipedia 2023/04/01 dump, which can be downloaded directly from here.
- Embeddings. Vector representations of knowledge source titles (article titles). Pre-computed embeddings from the Wikipedia 2023/04/01 dump, generated using the sentence-transformers/all-mpnet-base-v2 model, are available here.
- Trained FAISS Index. A trained FAISS IVF index built from the embeddings above. The index must be trained on the same embeddings to ensure compatibility and optimal retrieval performance. If the trained index is too large (over 5 GB), it may not fit in RAM. See factscore/create_index.py for how to handle this.
- API Configuration. As this implementation uses model APIs, you must set base URLs and API keys in their corresponding environment variables before execution.
export EMBEDDINGS_API_KEY="key-for-embeddings"
export COMPLETIONS_API_KEY="key-for-completions"
export EMBEDDINGS_BASE_URL="https://embeddings-api.url"
export COMPLETIONS_BASE_URL="https://completions-api.url"
- Create a new Python 3.11+ environment (for example, with conda)
- Install the requirements
cd v-factscore
pip install -r requirements.txt
- Initialize the FactScorer instance
from factscore.factscorer import FactScorer
fs = FactScorer()
- Register the knowledge source database:
fs.register_knowledge_source(faiss_index="path/to/index",
                             data_db="path/to/database",
                             table_name="tablename")
- Score generations
res = fs.get_score(generations=[generation1, generation2], k=1)
See demo.ipynb for more details.
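If you build your own knowledge source instead of using the pre-built dump, the table needs the two columns named earlier, title and text. A minimal sketch, assuming the .db file is a SQLite database (this README does not state the engine). The function `build_knowledge_source`, the table name `documents`, and the toy row are hypothetical; the package may expect further constraints that are not shown here.

```python
import os
import sqlite3
import tempfile


def build_knowledge_source(db_path: str, table_name: str,
                           articles: list[tuple[str, str]]) -> None:
    """Create a two-column (title, text) table and insert the articles."""
    # Table names cannot be bound as SQL parameters, so validate the name
    # before interpolating it into the statements below.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table_name} (title TEXT, text TEXT)"
            )
            conn.executemany(
                f"INSERT INTO {table_name} (title, text) VALUES (?, ?)",
                articles,
            )
    finally:
        conn.close()


# Demo in a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.db")
    build_knowledge_source(path, "documents",
                           [("Toy title", "Toy article text.")])
    conn = sqlite3.connect(path)
    rows = conn.execute("SELECT title, text FROM documents").fetchall()
    conn.close()
    print(rows)  # [('Toy title', 'Toy article text.')]
```

Under the same assumption, the path and table name used here would be the values passed as `data_db` and `table_name` to `register_knowledge_source`.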
This project is licensed under the MIT License — see the LICENSE file for details.