A code template for building AI-based apps that fact-check statements against a given knowledge base.
EvidenceSeeker Boilerplate is a Python package that provides a fact-checking pipeline with the following steps:
- Statement Analysis: The preprocessor identifies different interpretations of an input statement and categorises them as descriptive, normative, or ascriptive.
- Evidence Retrieval: The retriever searches through your knowledge base for relevant supporting/contradicting evidence.
- Confirmation Analysis: The confirmation analyser assesses how well the retrieved evidence supports or refutes each claim and aggregates its results into a confirmation level for each interpretation.
- Multiple AI Backends: Support for different inference APIs and local models via LlamaIndex
- Vector Search: Semantic search through documents using state-of-the-art embeddings
- Flexible Configuration: YAML-based configuration for all pipeline components
- CLI Tool: Complete command-line interface (`evse`) for project initialization and pipeline execution
- Demo Web App: Ready-to-deploy Gradio app with multilingual support (German/English)
- Programmatic API: Import and use EvidenceSeeker directly in your Python projects
- Document Indexing: Build searchable vector indexes from your document collections
- Metadata Support: Rich metadata handling for document attribution and source tracking
- Hub Integration: Upload/download indexes to/from Hugging Face Hub
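To illustrate the idea behind the vector search mentioned above, here is a minimal sketch of embedding-based retrieval in plain Python. It is not the package's retriever: the three-dimensional vectors, the document IDs, and the helper names `cosine_similarity` and `top_k` are made up for illustration, and a real setup would obtain its embeddings from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": hypothetical document IDs mapped to made-up embeddings.
toy_index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Ranking by cosine similarity ignores vector length, so passages are compared by direction in embedding space rather than by magnitude.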
There are several ways to set up and run an EvidenceSeeker based on our Boilerplate. For details, see the official documentation.
    pip install evidence-seeker
    evse init --name my-fact-checker
    cd my-fact-checker

For configuration, see https://debatelab.github.io/evidence-seeker/configuration.html.

    # Add your documents to knowledge_base/data_files/
    evse build-index
    evse run -i "Your statement to fact-check"
    evse demo-app

- Core Library: Complete fact-checking pipeline with AI-powered analysis
- CLI Tool: Command-line interface for all operations
- Web Demo: Gradio-based web application with authentication and result persistence
- Configuration Templates: Pre-configured YAML files for immediate use
- Documentation: Comprehensive guides and API documentation
- Example Data: Sample knowledge base and configurations
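The `evse run -i "..."` command from the quick start can also be driven from a script. The following is a minimal sketch, assuming `evse` is on the `PATH` and that the project directory is the `my-fact-checker` folder created by `evse init`. The helper names `build_run_command` and `fact_check` are hypothetical, and because the output format of `evse run` is not described here, the sketch returns the raw text without parsing it.

```python
import shlex
import subprocess


def build_run_command(statement):
    """Build the argument list for `evse run -i <statement>`."""
    return ["evse", "run", "-i", statement]


def fact_check(statement, project_dir="my-fact-checker"):
    """Run the CLI inside the project directory and return its raw stdout.

    Assumes `evse` is installed and the index has already been built
    with `evse build-index`.
    """
    completed = subprocess.run(
        build_run_command(statement),
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


cmd = build_run_command("Your statement to fact-check")
printable = shlex.join(cmd)  # shell-safe rendering, useful for logging
print(printable)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the statement contains quotes or other special characters.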
- LlamaIndex: Workflow orchestration and document processing
- Gradio: Interactive web interface
- Pydantic: Data validation and configuration management
- Sentence Transformers: Document embeddings
- Hugging Face: Model hosting and deployment
- Academic Research: Fact-check claims against scientific literature
- Journalism: Verify statements against reliable source databases
- Policy Analysis: Check policy claims against government documents
- Corporate Compliance: Validate statements against internal documentation
- Educational Tools: Create fact-checking exercises with custom knowledge bases
The EvidenceSeeker Pipeline is based on Large Language Models (LLMs) and proceeds as follows when fact-checking a statement against a knowledge base:
- In a first step, the evidence seeker identifies different interpretations of an input statement and distinguishes between descriptive, ascriptive, and normative statements.
- For each of the found descriptive and ascriptive interpretations, the evidence seeker searches for relevant text passages in a given knowledge base and analyses the extent to which each text passage confirms or refutes the interpretation.
- These individual analyses are aggregated into one of the following confirmation levels for each interpretation:
  - ‘highly confirmed’,
  - ‘confirmed’,
  - ‘weakly confirmed’,
  - ‘neither confirmed nor refuted’,
  - ‘weakly refuted’,
  - ‘refuted’, and
  - ‘highly refuted’.
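The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: it assumes each text passage yields a score between -1 (refutes) and 1 (confirms), that scores are aggregated by their mean, and that the thresholds in `LEVELS` separate the seven levels. All names and numbers here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class StatementType(Enum):
    DESCRIPTIVE = "descriptive"
    ASCRIPTIVE = "ascriptive"
    NORMATIVE = "normative"


@dataclass
class Interpretation:
    text: str
    kind: StatementType


# Hypothetical lower bounds on the mean score, checked from top to bottom.
LEVELS = [
    (0.6, "highly confirmed"),
    (0.4, "confirmed"),
    (0.2, "weakly confirmed"),
    (-0.2, "neither confirmed nor refuted"),
    (-0.4, "weakly refuted"),
    (-0.6, "refuted"),
]


def checkable(interpretations):
    """Keep only descriptive and ascriptive interpretations."""
    return [i for i in interpretations if i.kind is not StatementType.NORMATIVE]


def confirmation_level(scores):
    """Map per-passage scores in [-1, 1] to one of the seven levels."""
    if not scores:
        return "neither confirmed nor refuted"  # no evidence found
    average = mean(scores)  # assumption: simple mean as the aggregate
    for lower_bound, label in LEVELS:
        if average > lower_bound:
            return label
    return "highly refuted"


interpretations = [
    Interpretation("The policy reduced emissions.", StatementType.DESCRIPTIVE),
    Interpretation("The policy ought to be repealed.", StatementType.NORMATIVE),
]
to_check = checkable(interpretations)
level = confirmation_level([0.9, 0.7])
print(to_check[0].text, "->", level)
```

Normative interpretations are filtered out before retrieval, which mirrors the description that only descriptive and ascriptive interpretations are checked against the knowledge base.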
More information about the pipeline is available in the official documentation.
- The current demo uses German political science texts as its knowledge base
- API timeouts may occur on resource-constrained deployments
- Large knowledge bases may require significant computational resources
- 📖 Documentation: https://debatelab.github.io/evidence-seeker
- 🤗 Demo App: https://huggingface.co/spaces/DebateLabKIT/evidence-seeker-demo
- 📊 Example Results: https://debatelab.github.io/evidence-seeker-results/
- 🔬 KIdeKu Project: https://compphil2mmae.github.io/research/kideku/
We presented the project at the Politechathon Workshop in December 2024 and received constructive feedback.
KIdeKu is funded by the Federal Ministry of Education, Family Affairs, Senior Citizens, Women and Youth (BMBFSFJ).
EvidenceSeeker Boilerplate is licensed under the MIT License.
