If you use this code in your research, please cite the following publication: https://arxiv.org/abs/2108.12510
@article{gowda2021pulling,
  title={Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing},
  author={Sindhu C.M. Gowda and Shalmali Joshi and Haoran Zhang and Marzyeh Ghassemi},
  journal={arXiv preprint arXiv:2108.12510},
  year={2021}
}
Run the following commands to clone this repo and create the Conda environment:
git clone git@github.com:MLforHealth/CausalDA.git
cd CausalDA/
conda env create -f environment.yml
conda activate causalda
See DataSources.md for detailed instructions on setting up the WILDS and CXR datasets. This is not necessary for the synthetic experiments.
To train a single model, run, for example:
python train_synthetic.py \
--type par_back_front \
--corr-coff 0.75 \
--test-corr 0.75 \
--output_dir /path/to/output
or
python train.py \
--type back \
--data camelyon \
--data_type Conf \
--domains 2 3 \
--corr-coff 0.95 \
--seed 0 \
--output_dir /path/to/output
To reproduce the experiments in the paper, which train grids of models, call sweep.py with one of the experiment names (class names) defined in experiments.py, e.g.:
python sweep.py launch \
--experiment CXR \
--output_dir /my/sweep/output/path \
--command_launcher "local"
This command can also be run easily using launch_scripts/launch_exp.sh. You will likely need to update the launcher to fit your compute environment.
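If you do write a custom launcher, the sketch below shows the general shape of one. It assumes sweep.py follows the DomainBed convention of launcher functions that receive a list of shell command strings; the function names and Slurm options here are illustrative and not part of this repo.

```python
# Minimal sketch of custom command launchers, assuming sweep.py passes each
# launcher a list of shell command strings (DomainBed-style convention).
import subprocess

def local_sequential_launcher(commands):
    """Run each training command one after another on the local machine."""
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)

def slurm_launcher(commands):
    """Submit each training command as its own Slurm job (hypothetical)."""
    for cmd in commands:
        # Adjust the time limit, partition, and resources to your cluster.
        subprocess.run(["sbatch", "--time=12:00:00", f"--wrap={cmd}"], check=True)
```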
We provide sample code for aggregating the results of an experiment in AggResults.ipynb.
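If you prefer to aggregate results outside the notebook, a rough sketch of the idea is shown below. It assumes each run in a sweep writes its final metrics to a JSON file inside its own output directory; the file name results.json and the metric/column names are assumptions, so refer to AggResults.ipynb for the actual format and logic.

```python
# Rough sketch: collect per-run metrics from a sweep output directory into a
# single table. "results.json" and the column names are assumptions; see
# AggResults.ipynb for the aggregation actually used in the paper.
import json
from pathlib import Path

import pandas as pd

def collect_results(sweep_dir):
    """Gather per-run metrics from a sweep output directory into a DataFrame."""
    records = []
    for result_file in Path(sweep_dir).glob("*/results.json"):
        metrics = json.loads(result_file.read_text())
        metrics["run_dir"] = result_file.parent.name
        records.append(metrics)
    return pd.DataFrame(records)

df = collect_results("/my/sweep/output/path")
# e.g. average test performance across seeds for each augmentation type
# (the grouping key and metric column here are illustrative)
print(df.groupby("type")["test_auroc"].mean())
```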
We make use of code from the WILDS benchmark as well as from the DomainBed framework.
This source code is released under the MIT license, included in this repository.