DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis
Authors: Hristo Petkov, Calum MacLellan and Feng Dong
Paper: DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis (Applied Intelligence, 31 March, 2025)
DAGAF is a generative framework for simultaneous causal discovery and tabular data generation.
- Provides a unified framework for causal structure learning and realistic tabular data generation with preserved causality.
- Integrates ANM, LiNGAM, and PNL models through a multi-objective loss to enable robust causal structure learning under diverse data assumptions.
- DAGAF uses a two-step iterative approach that combines causal knowledge acquisition with high-quality data generation. The causal relationships identified in the first step are transferred and leveraged in the second step to facilitate causal-based tabular data generation.
- Validated on synthetic, benchmark, and real-world datasets, DAGAF significantly outperforms state-of-the-art methods in DAG learning while enabling high-quality, realistic data generation.
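The paper specifies the exact multi-objective loss. As a rough illustration of one ingredient that continuous-optimization DAG learners commonly use, the sketch below computes a differentiable acyclicity penalty on a weighted adjacency matrix A. It assumes the polynomial form h(A) = tr[(I + A∘A/d)^d] - d popularized by DAG-GNN. We have not confirmed that DAGAF uses exactly this form, and `acyclicity_penalty` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product (kept stdlib-only for illustration)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def acyclicity_penalty(adj: Matrix) -> float:
    """Hypothetical helper: h(A) = tr[(I + A*A/d)^d] - d (DAG-GNN-style assumption).

    adj[i][j] is the weight of edge i -> j. The result is 0.0 exactly when the
    weighted graph has no directed cycles and positive otherwise.
    """
    d = len(adj)
    # M = I + (A o A) / d, where o is the elementwise (Hadamard) product,
    # so every entry of M is non-negative.
    m = [[(1.0 if i == j else 0.0) + adj[i][j] ** 2 / d for j in range(d)] for i in range(d)]
    # Raise M to the d-th power by repeated multiplication.
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, m)
    # Only closed walks (cycles) add to the diagonal beyond the identity's d.
    return sum(power[i][i] for i in range(d)) - d
```

Because h(A) is zero for every DAG and grows as soon as a directed cycle appears, a structure learner can push it to zero (for example with an augmented Lagrangian) while it fits the data.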
The primary audience for hands-on use of DAGAF is researchers and advanced practitioners in causal structure learning, probabilistic machine learning and AI. We recommend treating the framework as a sequence of steps toward a more accurate approximation of the data-generating process. In other words, users should focus on using the framework as a basis for their own novel research, which may include: 1) exploring different generative models; 2) applying different structural causal models; 3) integrating different data modalities (e.g. time-series, mixed, image, video or sound data); and 4) experimenting with various architectures and hyper-parameters. We hope this framework will help bridge the gap between the current state of the causal structure learning field and future contributions.
The DAGAF framework has already been applied in a healthcare context, where it was used to emulate the LEAD-5 clinical trials in diabetes studies. Readers interested in the outcome of this research project can find it in the following publication: Calum Robert MacLellan, Hristo Petkov, Conor McKeag, Feng Dong, David John Lowe, Roma Maguire, Sotiris Moschoyiannis, Jo Armes, Simon Skene, Alastair Finlinson and Christopher Sainsbury: Emulating real-world GLP-1 efficacy in type 2 diabetes through causal learning and virtual patients
We present a novel framework that learns causal structures resembling the underlying causal mechanisms of the input data and uses them to synthesize diverse, high-fidelity data samples. DAGAF learns multivariate causal structures by applying various functional causal models and determines experimentally which one best describes the causality in a tabular dataset. Specifically, the framework supports the Post-Nonlinear (PNL) model along with its subsets, which include Linear Non-Gaussian Acyclic Models (LiNGAM) and Additive Noise Models (ANM). Unlike methods that assume the data are generated by a single causal model, DAGAF accommodates multiple semi-parametric assumptions.
Furthermore, supporting such a broad spectrum of identifiable models lets us compare our approach extensively against the state of the art in the field. We complete our study by assessing the quality of the discovered causality from a tabular data generation standpoint. We hypothesize that a precise approximation of the original causal mechanisms of a given probability distribution can be leveraged to produce realistic data samples. To test this hypothesis, DAGAF incorporates an adversarial tabular data synthesis step, based on transfer learning, into our causal discovery framework.
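To make the three functional causal models concrete, here is a minimal, stdlib-only sketch of how a child variable could be generated from its parents under each of them. The mechanisms (tanh for f, a cube for the invertible post-nonlinearity g, uniform noise for the non-Gaussian LiNGAM case) and the function name `sample_node` are illustrative assumptions, not the generators used in DAGAF's code.

```python
import math
import random
from typing import Sequence


def sample_node(parents: Sequence[float], weights: Sequence[float], model: str,
                rng: random.Random) -> float:
    """Draw one value of a child variable from its parents under an illustrative FCM.

    f, g and the noise distributions are hypothetical choices for illustration.
    """
    linear = sum(w * p for w, p in zip(weights, parents))
    if model == "lingam":
        # LiNGAM: linear mechanism + non-Gaussian (here uniform) noise
        return linear + rng.uniform(-1.0, 1.0)
    if model == "anm":
        # ANM: nonlinear f(parents) + additive noise
        return math.tanh(linear) + rng.gauss(0.0, 1.0)
    if model == "pnl":
        # PNL: invertible post-nonlinearity g(z) = z**3 applied to f(parents) + noise
        inner = math.tanh(linear) + rng.gauss(0.0, 1.0)
        return inner ** 3
    raise ValueError(f"unknown model: {model}")
```

Setting g to the identity turns the PNL branch into the ANM branch, and restricting f to be linear with non-Gaussian noise gives LiNGAM, which is why ANM and LiNGAM are treated as subsets of PNL.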
For a more detailed theoretical and technical analysis, please read our paper: H. Petkov, C. MacLellan and F. Dong. DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis. Applied Intelligence, Springer, 31 March 2025.
We provide visualizations (a TL;DR version of our paper) of the main features of our framework: 1) a diagram of the entire framework, including its different steps; and 2) the transition between its two forms (the basic form, which works only with ANM and LiNGAM, and the extended form, which works with PNL).
Our pipeline is divided into three separate steps. First, we perform causal discovery using a Structural Causal Model (SCM) to obtain a Directed Acyclic Graph (DAG) that captures the underlying structure of the data. Second, we transfer the graph into our Deep Generative Model (DGM). Third, we use the DGM, together with the causality obtained in Step 1, to simulate the generative mechanism of the input data, producing high-fidelity, realistic synthetic data.
Figure: A visual representation of DAGAF. (a) The optimization structure under ANM and LiNGAM, where input data is processed to reconstruct …
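Steps 2 and 3 rely on the learned DAG fixing the order in which variables can be generated. The sketch below illustrates that idea with plain ancestral sampling: nodes are visited in topological order (Kahn's algorithm), and each one is drawn from its already-generated parents. It is not DAGAF's generator, which is an adversarially trained deep generative model. The linear-Gaussian mechanism and the function names are hypothetical choices for illustration.

```python
import random
from collections import deque
from typing import List, Sequence


def topological_order(adj: Sequence[Sequence[float]]) -> List[int]:
    """Kahn's algorithm; adj[i][j] != 0 means an edge i -> j."""
    d = len(adj)
    indegree = [sum(1 for i in range(d) if adj[i][j] != 0) for j in range(d)]
    queue = deque(j for j in range(d) if indegree[j] == 0)
    order: List[int] = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in range(d):
            if adj[i][j] != 0:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    if len(order) != d:
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


def ancestral_sample(adj: Sequence[Sequence[float]], n_rows: int,
                     rng: random.Random) -> List[List[float]]:
    """Generate rows variable by variable, parents before children."""
    d = len(adj)
    order = topological_order(adj)
    rows: List[List[float]] = []
    for _ in range(n_rows):
        x = [0.0] * d
        for j in order:
            # Hypothetical mechanism: linear in the parents + Gaussian noise.
            # Non-parents have adj[i][j] == 0, so they contribute nothing.
            x[j] = sum(adj[i][j] * x[i] for i in range(d)) + rng.gauss(0.0, 1.0)
        rows.append(x)
    return rows
```

For example, `ancestral_sample([[0, 2.0], [0, 0]], 5, random.Random(0))` returns five rows in which the second column is generated from the first.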
The easiest way to get our code is to clone the GitHub repository:
git clone https://github.com/ItsyPetkov/DAGAF.git
cd DAGAF
To run our code, users must first create a conda environment using our environment_setup.yml file.
To achieve this just run the following:
conda env create -f environment_setup.yml
conda activate dagaf_env
After your environment is configured and activated you are good to go.
Here are some basic examples to get you started. To run DAGAF, execute:
python Main.py
python Main.py -h # Lists all of the model's arguments
The first command runs the framework in its default configuration, with all parameters set in the Main.py file.
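Main.py exposes its options as command-line flags, and `python Main.py -h` gives the authoritative list. To make the flag semantics documented in this README concrete, here is a hypothetical argparse sketch covering a subset of them, including the rule that PNL and LiNGAM cannot both be enabled. The functions `parse_args` and `scm_name`, the `--settings` default, and the spelling `nonlinear_2` for the GRAPH_LINEAR_TYPE default are our own assumptions, not code taken from Main.py.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical re-creation of part of DAGAF's CLI; Main.py is authoritative."""
    parser = argparse.ArgumentParser(description="DAGAF (illustrative CLI sketch)")
    parser.add_argument("--pnl", type=int, choices=[0, 1], default=0)
    parser.add_argument("--lingam", type=int, choices=[0, 1], default=0)
    # EP = entire pipeline, CSL = structure learning only, DG = data generation only.
    parser.add_argument("--settings", choices=["EP", "CSL", "DG"], default="EP")  # default assumed
    parser.add_argument("--synthesize", type=int, choices=[0, 1], default=1)
    parser.add_argument("--data_sample_size", type=int, default=5000)   # rows
    parser.add_argument("--data_variable_size", type=int, default=10)   # columns
    parser.add_argument("--graph_linear_type", default="nonlinear_2",
                        choices=["linear", "nonlinear_1", "nonlinear_2",
                                 "post_nonlinear_1", "post_nonlinear_2"])
    args = parser.parse_args(argv)
    if args.pnl and args.lingam:
        # Mirrors the README rule: the two SCM flags are mutually exclusive.
        parser.error("--pnl and --lingam cannot both be set to 1")
    return args


def scm_name(args: argparse.Namespace) -> str:
    """Map the SCM flags to a model name; ANM is the default."""
    if args.pnl:
        return "PNL"
    if args.lingam:
        return "LiNGAM"
    return "ANM"
```

In this sketch, passing both `--pnl 1` and `--lingam 1` makes argparse print an error and exit with status 2, which mirrors the restriction described for the real flags.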
Beyond the defaults, here are some other things you can do:
To run DAGAF with a different SCM, toggle the PNL or LINGAM flag from 0 to 1. You CANNOT set both flags to 1 at the same time.
python Main.py --pnl 1 # Changes the SCM from ANM (default) to PNL
python Main.py --lingam 1 # Changes the SCM from ANM (default) to LiNGAM
To run DAGAF with different pipeline configurations, change the option in the SETTINGS flag:
python Main.py --settings EP # Executes the entire pipeline, meaning all steps
python Main.py --settings CSL # Executes only the Causal Structure Learning part
python Main.py --settings DG --load_directory ./ # Executes only the Data Generation part.
# The last one might crash if you overwrite the state_dict of your model incorrectly.
To run DAGAF with the same data instead of generating new data every time (the default), set the SYNTHESIZE flag to 0:
python Main.py --synthesize 0 # Model will not generate new data and will run with the last data generated
python Main.py # Model will generate new data because synthesize is set to 1 by default
To run DAGAF with data of different dimensions, change the values of DATA_SAMPLE_SIZE (number of rows, default: 5000) and DATA_VARIABLE_SIZE (number of columns, default: 10):
python Main.py --data_sample_size 2500 --data_variable_size 50
python Main.py --data_sample_size 4000 --data_variable_size 20
To run DAGAF with different types of continuous data, change the value of GRAPH_LINEAR_TYPE (default: nonlinear_2). The supported values are:
python Main.py --graph_linear_type linear
python Main.py --graph_linear_type nonlinear_1
python Main.py --graph_linear_type nonlinear_2
python Main.py --graph_linear_type post_nonlinear_1
python Main.py --graph_linear_type post_nonlinear_2
To run DAGAF on discrete benchmark data instead of continuous data, set DATA_TYPE to benchmark:
python Main.py --data_type benchmark --path ./ --save_directory ./ --load_directory ./
# Benchmarks are provided in the data folder.
If you wish to use our framework, please cite the following paper:
@article{Petkov2025DAGAFAD,
title={DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis},
author={Hristo Petkov and Calum MacLellan and Feng Dong},
journal={Applied Intelligence (Dordrecht, Netherlands)},
year={2025},
volume={55},
url = {https://link.springer.com/article/10.1007/s10489-025-06410-8}
}
DAGAF is MIT licensed, as found in the LICENSE file.

