This is the code repository for MargNet. It contains all the code needed for the paper.
We construct a new neural network-based method for DP tabular data synthesis. The detailed code of MargNet is in the folder method/MargDL. This method applies an adaptive marginal selection framework and fits the selected marginals with neural networks.
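To make the notion of a marginal concrete, here is a minimal sketch of a noisy marginal: a count table over a subset of attributes with Gaussian noise added to each cell. It is an illustration only, not the code in method/MargDL. The function name `noisy_marginal`, its arguments, and the choice of noise scale `sigma` are assumptions made for this example; calibrating `sigma` to a privacy budget is not shown.

```python
import random
from collections import Counter
from itertools import product


def noisy_marginal(rows, cols, domains, sigma, seed=0):
    """Return {cell: noisy count} for the attributes indexed by `cols`.

    rows    -- list of records, each a sequence of categorical values
    cols    -- indices of the attributes that form the marginal
    domains -- domains[c] lists the possible values of attribute c
    sigma   -- standard deviation of the Gaussian noise (hypothetical
               parameter; its calibration to epsilon/delta is omitted)
    """
    rng = random.Random(seed)
    # Exact counts of each observed value combination.
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    # Enumerate every cell of the marginal, including empty ones.
    cells = product(*(domains[c] for c in cols))
    return {cell: counts[cell] + rng.gauss(0.0, sigma) for cell in cells}


rows = [(0, 1), (0, 1), (1, 0)]
domains = [[0, 1], [0, 1]]
marginal = noisy_marginal(rows, cols=(0, 1), domains=domains, sigma=1.0)
```

Empty cells are enumerated as well, so the noisy table has one entry per cell of the attribute domain rather than only the observed combinations.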
The code for running experiments is in main.py. The necessary hyper-parameters are described below.
- method: the synthesis method to run. marggan corresponds to our method, MargNet.
- dataset: the name of the dataset.
- device: the device on which the algorithms run.
- epsilon: DP parameter; it must be provided when running the code.
- --delta: DP parameter, set to $1e-5$ by default.
- --resample: whether the model uses a fixed input or a resampled input.
- --graph_sample: enables a hybrid method that uses a junction tree structure to generate data from the deep learning model.
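The parameters listed above can be read as a command-line interface with four positional arguments and three optional flags. The sketch below mirrors that description with argparse. It is not the parser in main.py. The help strings are illustrative, and treating --resample and --graph_sample as boolean switches is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameters described above."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py interface")
    # Positional arguments, in the order used by the example commands.
    parser.add_argument("method", help="synthesis method, e.g. marggan for MargNet")
    parser.add_argument("dataset", help="dataset name, e.g. adult")
    parser.add_argument("device", help="device string, e.g. cuda:0")
    parser.add_argument("epsilon", type=float, help="DP parameter (required)")
    # Optional arguments.
    parser.add_argument("--delta", type=float, default=1e-5, help="DP parameter")
    # Assumption: both flags are simple on/off switches.
    parser.add_argument("--resample", action="store_true")
    parser.add_argument("--graph_sample", action="store_true")
    return parser


args = build_parser().parse_args(["marggan", "adult", "cuda:0", "1.0"])
print(args.method, args.dataset, args.device, args.epsilon, args.delta)
```

Under this reading, epsilon is positional and therefore mandatory, while delta falls back to its default when omitted.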
The required packages are listed in requirement.txt. First, make sure the datasets are placed in the correct folder (we provide the Adult dataset in our repo for testing the code). In addition, remember to fit the evaluation model before synthesizing data:
python evaluator/tune_eval_model.py adult catboost cv cuda:0
After activating your environment, run the following command to perform an evaluation:
python main.py marggan adult cuda:0 1.0
The ablation experiments can be run with ablation.py. For example, to fit MargNet and AIM with the same marginals and compare their fitting ability, run the following commands:
python ablation.py marggan adult cuda:0 10.0 --marg_num 10
python ablation.py aim adult cuda:0 10.0 --marg_num 10
The code for evaluation is in evaluator/eval_seeds.py. By default, we generate data 5 times and run an evaluation on each generated dataset. The reported results are the average over all evaluations. All results are collected in JSON format and saved in the folder exp/{name of dataset}/{name of method}, where they can be used for further analysis.
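For further analysis, the saved JSON results can be aggregated with a few lines of Python. The sketch below averages every top-level numeric field across the JSON files in one result folder. The layout of the result files is not documented here, so the assumption that each file holds a flat JSON object is hypothetical, as is the function name `average_results`.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def average_results(result_dir):
    """Average each top-level numeric field over all *.json files in result_dir.

    Assumption: every file contains a flat JSON object (a dict).
    Non-numeric fields and booleans are ignored.
    """
    values = defaultdict(list)
    for path in sorted(Path(result_dir).glob("*.json")):
        with open(path) as f:
            record = json.load(f)
        for key, value in record.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values[key].append(value)
    return {key: mean(vals) for key, vals in values.items()}
```

For example, `average_results("exp/adult/marggan")` would summarize the runs for the Adult dataset, assuming the folder is named after the method string passed on the command line. If the folder does not exist or holds no JSON files, the function returns an empty dict.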
We use many baselines for evaluation in our paper. Part of this code is from AIM, DP-MERF, GEM, PrivSyn, RAP++, TabDDPM, and CTGAN. We sincerely thank their authors for their contributions to the community.