If you use this code in your research, please cite the following publication: https://arxiv.org/abs/2108.12510
@article{gowda2021pulling,
  title={Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing},
  author={Sindhu C.M. Gowda and Shalmali Joshi and Haoran Zhang and Marzyeh Ghassemi},
  journal={arXiv preprint arXiv:2108.12510},
  year={2021}
}
Run the following commands to clone this repo and create the Conda environment:
git clone git@github.com:MLforHealth/CausalDA.git
cd CausalDA/
conda env create -f environment.yml
conda activate causalda
See DataSources.md for detailed instructions on setting up the WILDS and CXR datasets. This is not necessary for the synthetic experiments.
To train a single model, run, for example:
python train_synthetic.py \
--type par_back_front \
--corr-coff 0.75 \
--test-corr 0.75 \
--output_dir /path/to/output
or
python train.py \
--type back \
--data camelyon \
--data_type Conf \
--domains 2 3 \
--corr-coff 0.95 \
--seed 0 \
--output_dir /path/to/output
To reproduce the experiments in the paper, which train grids of models, call sweep.py with one of the experiment names (class names) defined in experiments.py, e.g.:
python sweep.py launch \
--experiment CXR \
--output_dir /my/sweep/output/path \
--command_launcher "local"
This command can also be run easily using launch_scripts/launch_exp.sh. You will likely need to update the launcher to fit your compute environment.
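If you do write a custom launcher, the sketch below shows the general shape of one. It assumes sweep.py follows the DomainBed convention of launcher functions that receive a list of shell command strings; the function names and Slurm options here are illustrative and not part of this repo.

```python
# Minimal sketch of custom command launchers, assuming sweep.py passes each
# launcher a list of shell command strings (DomainBed-style convention).
import subprocess

def local_sequential_launcher(commands):
    """Run each training command one after another on the local machine."""
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)

def slurm_launcher(commands):
    """Submit each training command as its own Slurm job (hypothetical)."""
    for cmd in commands:
        # Adjust the time limit, partition, and resources to your cluster.
        subprocess.run(["sbatch", "--time=12:00:00", f"--wrap={cmd}"], check=True)
```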
We provide sample code for aggregating the results of an experiment in AggResults.ipynb.
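If you prefer to aggregate results outside the notebook, a rough sketch of the idea is shown below. It assumes each run in a sweep writes its final metrics to a JSON file inside its own output directory; the file name results.json and the metric/column names are assumptions, so refer to AggResults.ipynb for the actual format and logic.

```python
# Rough sketch: collect per-run metrics from a sweep output directory into a
# single table. "results.json" and the column names are assumptions; see
# AggResults.ipynb for the aggregation actually used in the paper.
import json
from pathlib import Path

import pandas as pd

def collect_results(sweep_dir):
    """Gather per-run metrics from a sweep output directory into a DataFrame."""
    records = []
    for result_file in Path(sweep_dir).glob("*/results.json"):
        metrics = json.loads(result_file.read_text())
        metrics["run_dir"] = result_file.parent.name
        records.append(metrics)
    return pd.DataFrame(records)

df = collect_results("/my/sweep/output/path")
# e.g. average test performance across seeds for each augmentation type
# (the grouping key and metric column here are illustrative)
print(df.groupby("type")["test_auroc"].mean())
```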
We make use of code from the WILDS benchmark as well as from the DomainBed framework.
This source code is released under the MIT license, included in this repository.