adwantg/rag-chain-of-logic-coverage

Chain-of-Logic Coverage Evaluator (CoL-CE)

Author: gadwant

License: MIT · Python: 3.12+

This repository contains the official implementation of the Chain-of-Logic Coverage Evaluator (CoL-CE), a framework for assessing reasoning validity in Retrieval-Augmented Generation (RAG) systems.

CoL-CE evaluates whether the logical connections (edges) in a generated answer's reasoning graph are explicitly supported by the retrieved context, as opposed to verifying only atomic facts.
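As a minimal illustration of the idea (not the repository's implementation — the function names are hypothetical, and a naive substring check stands in for the Mistral-based verifier), an edge-level coverage check might look like:

```python
# Hypothetical sketch: an answer's reasoning is modeled as a list of
# (premise, conclusion) edges, and an edge counts as "supported" only if
# both endpoints appear in the retrieved context. The real verifier uses
# an LLM judgment rather than substring matching.

def edge_supported(edge, context):
    """Return True if both endpoints of the edge occur in the context."""
    premise, conclusion = edge
    return premise in context and conclusion in context

def edge_verification_rate(edges, context):
    """Fraction of reasoning edges supported by the retrieved context."""
    if not edges:
        return 0.0
    return sum(edge_supported(e, context) for e in edges) / len(edges)

context = "Paris is the capital of France. France is in Europe."
edges = [
    ("Paris is the capital of France", "France is in Europe"),  # both grounded
    ("France is in Europe", "Europe uses the euro"),            # conclusion ungrounded
]
print(edge_verification_rate(edges, context))  # → 0.5
```

The point of scoring edges rather than isolated statements is that an answer can consist entirely of true facts while the inference connecting them has no support in the retrieved passages.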

📂 Repository Structure

code/
├── src/                    # Source code
│   ├── run_reverification.py   # Main evaluation script (N=30)
│   ├── run_experiment.py       # Legacy experiment runner
│   ├── data_loader.py          # HotpotQA loader
│   ├── extractor.py            # DeepSeek-R1 logic extractor
│   ├── verifier.py             # Mistral edge verifier
│   └── experiment.py           # Core experiment logic
├── results/                # Experimental Data
│   ├── reverified/         # The final N=30 Strict Results (Paper source).
│   │   ├── reverified_results.json
│   │   └── reverified_stats.json
│   ├── enhanced/           # The raw Llama-3 outputs.
│   └── legacy/             # The pilot study data.
├── PROMPTS.md              # Exact prompt templates used
├── requirements.txt        # Python dependencies
└── README.md               # This file

🛠️ Setup

  1. Prerequisites:

    • Python 3.12+
    • Ollama installed and running locally.
  2. Install Models: Pull the required quantized models via Ollama:

    ollama pull deepseek-r1:latest  # Logic Extractor
    ollama pull mistral:latest      # Edge Verifier
    ollama pull llama3:latest       # Answer Generator
  3. Install Dependencies:

    pip install -r requirements.txt

🚀 Usage

1. Run Re-Verification (Recommended)

To reproduce the paper's final N=30 "Re-Verification" results (Strict Protocol + Factual Coverage Baseline):

python3 src/run_reverification.py
  • Input: Loads pre-generated answers from results/enhanced/.
  • Output: Saves strict verification results to results/reverified/.

2. Run Full Experiment (Generation -> Extraction -> Verification)

To start from scratch (generate new answers):

python3 src/run_experiment.py

📊 Results Summary

Metric                         Result (N=30)   Interpretation
Edge Verification Rate (EVR)   60.0%           Only 60% of logic edges are supported.
Logic Coverage Score (LCS)     59.2%           ~40% of reasoning steps are hallucinations.
Factual Coverage (FC)          90.9%           Baseline: atomic facts are mostly correct.

See results/reverified/reverified_stats.json for detailed statistics.
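As a rough sketch of how such aggregate numbers could be derived (the actual JSON schema in results/reverified/ may differ; the record fields below are hypothetical, and the sample values are illustrative only), a micro-averaged summary might be computed like this:

```python
# Hypothetical sketch: micro-average edge support (EVR) and atomic-fact
# support (FC) over per-sample verification records. Field names are
# assumptions, not the repository's actual schema.

def aggregate(records):
    """Pool counts across samples, then take the supported/total ratios."""
    total_edges = sum(r["edges_total"] for r in records)
    supported_edges = sum(r["edges_supported"] for r in records)
    total_facts = sum(r["facts_total"] for r in records)
    supported_facts = sum(r["facts_supported"] for r in records)
    return {
        "EVR": supported_edges / total_edges,
        "FC": supported_facts / total_facts,
    }

# Illustrative per-sample counts (not real experimental data).
records = [
    {"edges_total": 5, "edges_supported": 3, "facts_total": 4, "facts_supported": 4},
    {"edges_total": 5, "edges_supported": 3, "facts_total": 7, "facts_supported": 6},
]
stats = aggregate(records)
print(stats["EVR"])  # → 0.6
```

Micro-averaging (pooling counts before dividing) weights samples by their number of edges; a macro-average over per-sample rates would weight each question equally and can give a different number.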

📄 Prompts

See PROMPTS.md for the exact prompt templates used for generation, logic extraction, and strict verification.

📜 License

MIT License. See LICENSE file for details.
