This repository accompanies the paper “Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization.” We evaluate the ability of modern LLMs to estimate conditional probability tables for Bayesian Networks and introduce Expert-Driven Priors (EDP): a pseudocount fusion that combines LLM-derived priors with data to improve parameter estimation, especially under data scarcity.
- Use Python version 3.13 or higher.
- Run the following command to install the required packages:
    pip install -r requirements.txt
- Set up API keys for the LLMs you want to use. You can do this by creating a .env file in the EstimationofPriorProbabilitiesbyLLMs repository or by setting environment variables directly. The following environment variables are required:
    - OPENAI_API_KEY
    - DEEPSEEK_API_KEY
    - GOOGLE_API_KEY
    - ANTHROPIC_API_KEY
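
Before running the scripts, it can help to confirm that the keys are visible to Python. The following is a minimal sketch using only the standard library. The helper name `missing_api_keys` is hypothetical and not part of this repository, and it inspects process environment variables only (it does not read a .env file).

```python
import os

# The four variable names listed above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "DEEPSEEK_API_KEY",
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
)


def missing_api_keys(environ=None):
    """Return the required key names that are unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to test. (Hypothetical helper, not repository code.)
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

If you only intend to query some of the providers, the tuple can be trimmed to the keys you actually need.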
- From the repository root:
    cd EstimationofPriorProbabilitiesbyLLMs
    python main.py
- This step produces prior distributions (conditional probability tables) estimated by the LLM. You will use these in the next step.
- To evaluate the quality of the priors, run the following commands:
    python test.py
    python analysis_BNs.py
- Refer to the README in the EstimationofPriorProbabilitiesbyLLMs folder for more details.
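
A conditional probability table assigns, for each configuration of a node's parents, a distribution over the node's states. The sketch below shows one way such a table could be represented and sanity-checked in Python. The dictionary layout, the variable names, and the numbers are illustrative assumptions, not the format that main.py writes.

```python
import math

# Hypothetical CPT for a binary node "cough" with one binary parent "flu".
# Keys are parent configurations; values map each state to a probability.
# All numbers are made up for illustration.
cpt = {
    ("flu=yes",): {"cough=yes": 0.8, "cough=no": 0.2},
    ("flu=no",): {"cough=yes": 0.1, "cough=no": 0.9},
}


def normalize_row(row):
    """Rescale one row so its probabilities sum to 1."""
    total = sum(row.values())
    if total <= 0:
        raise ValueError("row must have positive total mass")
    return {state: p / total for state, p in row.items()}


def is_valid_cpt(table, tol=1e-9):
    """Check that every row lies in [0, 1] and sums to 1 within `tol`."""
    return all(
        all(0.0 <= p <= 1.0 for p in row.values())
        and math.isclose(sum(row.values()), 1.0, abs_tol=tol)
        for row in table.values()
    )


if __name__ == "__main__":
    print(is_valid_cpt(cpt))
    print(normalize_row({"cough=yes": 2.0, "cough=no": 6.0}))
```

Normalizing rows is a simple safeguard in case elicited probabilities for one parent configuration do not sum exactly to 1.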
- Then:
    cd ../EDP
    python main.py
- This combines empirical counts with the LLM-derived priors to estimate Bayesian Network parameters under the EDP framework.
- To evaluate the quality of the priors, run the following commands:
    python test.py
    python analysis_BNandData.py
- Refer to the README in the EDP folder for more details.
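
One common way to combine a prior distribution with observed counts is Dirichlet-style smoothing: the prior is scaled by an equivalent sample size to give pseudocounts, which are added to the observed counts before normalizing. The sketch below illustrates that general idea only. The function name, the `strength` parameter, and the numbers are assumptions made for illustration; the exact EDP formulation is defined in the paper and in the EDP folder.

```python
def fuse_with_pseudocounts(counts, prior, strength=10.0):
    """Estimate one CPT row from counts plus prior pseudocounts.

    counts:   observed count per state, e.g. {"yes": 3, "no": 1}
    prior:    prior probability per state (assumed to sum to 1)
    strength: hypothetical equivalent sample size given to the prior

    Returns (counts[k] + strength * prior[k]) / (N + strength * sum(prior)).
    """
    if counts.keys() != prior.keys():
        raise ValueError("counts and prior must cover the same states")
    if strength < 0:
        raise ValueError("strength must be non-negative")
    pseudo = {k: strength * prior[k] for k in prior}
    total = sum(counts.values()) + sum(pseudo.values())
    if total <= 0:
        raise ValueError("no data and no prior mass to estimate from")
    return {k: (counts[k] + pseudo[k]) / total for k in counts}


if __name__ == "__main__":
    prior = {"yes": 0.5, "no": 0.5}
    # With no data, the estimate equals the prior.
    print(fuse_with_pseudocounts({"yes": 0, "no": 0}, prior, strength=4.0))
    # With a few observations, the estimate moves toward the data.
    print(fuse_with_pseudocounts({"yes": 3, "no": 1}, prior, strength=4.0))
```

In this sketch a larger `strength` keeps the estimate closer to the prior, which matters most when only a handful of samples are available.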
- Finally:
    cd ../classification
- Use the scripts in this folder to run classification experiments and evaluate EDP on downstream tasks.
- You need additional setup to run the classification experiments. Please refer to the README in the classification folder.
Please cite our paper if you find this code useful:
@misc{nafar2025extractingprobabilisticknowledgelarge,
title={Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization},
author={Aliakbar Nafar and Kristen Brent Venable and Zijun Cui and Parisa Kordjamshidi},
year={2025},
eprint={2505.15918},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.15918},
}
