Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization

This repository accompanies the paper “Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization”. We evaluate how well modern LLMs estimate the conditional probability tables (CPTs) of Bayesian Networks and introduce Expert-Driven Priors (EDP), a pseudocount fusion that combines LLM-derived priors with data to improve parameter estimation, especially when data is scarce.

Requirements

Use Python version 3.13 or higher.
Run the following commands to install the required packages:

pip install -r requirements.txt

Set up API keys for the LLMs you want to use, either by creating a .env file in the EstimationofPriorProbabilitiesbyLLMs folder or by setting environment variables directly. The following environment variables are required:
- OPENAI_API_KEY
- DEEPSEEK_API_KEY
- GOOGLE_API_KEY
- ANTHROPIC_API_KEY
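
Before running the scripts, it can help to confirm that the keys are visible to Python. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_api_keys` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names come from the list above; the helper itself is hypothetical.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "DEEPSEEK_API_KEY",
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
)


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

This only inspects the current process environment. Keys stored solely in a .env file will be reported as missing until that file has been loaded into the environment.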

Usage

Generate LLM‑derived priors (SepState and FullDist)

From the repository root:

cd EstimationofPriorProbabilitiesbyLLMs
python main.py

This step produces prior distributions (conditional probability tables) estimated by the LLM. You will use these in the next step.
To evaluate the quality of the priors, run the following commands:
```
python test.py
python analysis_BNs.py
```
Refer to the README in the EstimationofPriorProbabilitiesbyLLMs folder for more details.
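
As an illustration of what evaluating a prior can mean, the sketch below compares one LLM-estimated CPT row against a reference row using KL divergence. This is an assumption made for illustration: the metric actually computed by test.py and analysis_BNs.py is not stated here, and the function name `kl_divergence` and its `eps` parameter are hypothetical.

```python
import math


def kl_divergence(reference, estimate, eps=1e-9):
    """KL(reference || estimate) for two distributions over the same states.

    Illustrative metric only; not necessarily what the repository scripts use.
    """
    if len(reference) != len(estimate):
        raise ValueError("distributions must have the same number of states")
    # Clip zero entries in the estimate and renormalize so the result stays finite.
    clipped = [max(q, eps) for q in estimate]
    norm = sum(clipped)
    return sum(
        p * math.log(p / (q / norm))
        for p, q in zip(reference, clipped)
        if p > 0  # states with zero reference probability contribute nothing
    )


# Hypothetical numbers: a reference CPT row and an LLM-derived estimate.
reference_row = [0.9, 0.1]
llm_row = [0.5, 0.5]
print(kl_divergence(reference_row, llm_row))
```

A value of zero means the two rows agree exactly; larger values mean the estimate is further from the reference.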

Fuse priors with data using EDP

Then:
```
cd ../EDP
python main.py
```
This combines empirical counts with the LLM‑derived priors to estimate Bayesian Network parameters under the EDP framework.
To evaluate the quality of the priors, run the following commands:
```
python test.py
python analysis_BNandData.py
```
Refer to the README in the EDP folder for more details.
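
The idea of fusing a prior with data through pseudocounts can be sketched as follows. This is a minimal illustration using the standard Dirichlet posterior mean, under the assumption that the LLM-derived prior is scaled into pseudocounts and added to the observed counts; it is not taken from the EDP code, and the function name and the `strength` parameter are hypothetical.

```python
def fuse_prior_with_counts(prior, counts, strength=10.0):
    """Estimate one CPT row from a prior distribution and observed counts.

    prior:    LLM-derived probabilities for each state (should sum to 1).
    counts:   observed counts for the same states, in the same order.
    strength: hypothetical weight of the prior, expressed as a number of
              pseudo-observations.
    """
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    if abs(sum(prior) - 1.0) > 1e-6:
        raise ValueError("prior must sum to 1")
    if strength <= 0:
        raise ValueError("strength must be positive")
    # Turn the prior into pseudocounts, then add them to the empirical counts.
    pseudocounts = [strength * p for p in prior]
    total = sum(counts) + sum(pseudocounts)
    return [(c + a) / total for c, a in zip(counts, pseudocounts)]


# Hypothetical numbers: four observations and a prior worth four pseudo-observations.
print(fuse_prior_with_counts([0.7, 0.3], [1, 3], strength=4.0))
```

With no observations the estimate equals the prior, and as the counts grow it approaches the empirical frequencies, which is why this kind of fusion helps most when data is scarce.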

Run downstream classification experiments

Finally:
```
cd ../classification
```
Use the scripts in this folder to run classification experiments and evaluate EDP on downstream tasks.
These experiments require additional setup; refer to the README in the classification folder for details.

Citation

Please cite our paper if you find this code useful:

@misc{nafar2025extractingprobabilisticknowledgelarge,
      title={Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization}, 
      author={Aliakbar Nafar and Kristen Brent Venable and Zijun Cui and Parisa Kordjamshidi},
      year={2025},
      eprint={2505.15918},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.15918}, 
}

