Snakemake workflow: `popgen-npe`

A Snakemake workflow for neural posterior estimation in population genetics. It orchestrates end-to-end simulation, feature extraction, neural posterior estimation (“training”), and windowed inference on VCFs (“prediction”).

Documentation

The documentation covers simulator/processor APIs, configuration options, and usage walkthroughs: https://popgen-npe.readthedocs.io/en/latest/. Building it locally requires Sphinx (see docs/).

Workflows at a glance

workflow/training_workflow.smk: runs the full neural posterior estimation pipeline (simulate → process → train embedding + normalizing flow).
workflow/prediction_workflow.smk: infers trees from a VCF, processes them, and applies the trained posterior estimator window-by-window.

Environment setup

Create and activate a Conda environment that includes Snakemake and the workflow dependencies:

conda env create -f environment.yaml

conda env create -f environment.yaml
conda activate popgen-npe

Training workflow

Copy or adapt one of the templates in workflow/config/ (e.g. AraTha_2epoch_cnn.yaml). This YAML encapsulates:
- Simulation settings (simulator, n_train, n_val, n_test, n_chunk)
- Feature extraction (processor)
- Embedding + posterior estimator hyperparameters (embedding_network, optimizer, etc.)
Launch the workflow:

snakemake --cores 8 \
          --configfile workflow/config/AraTha_2epoch_cnn.yaml \
          --snakefile workflow/training_workflow.smk

The run directory defaults to <project_dir>/<sim>-<processor>-<embedding>-<seed>-<n_train>-e2e unless you enable separate embedding pretraining. Refer to the Usage docs for a field-by-field description of the config file.

Example data for prediction

You can generate example inputs (VCF + ancillary files) compatible with the AraTha config using:

python resources/util/simulate-vcf.py \
  --outpath example_data/AraTha_2epoch \
  --window-size 1000000 \
  --configfile workflow/config/AraTha_2epoch_cnn.yaml

This script simulates data, writes the VCF, ancestral FASTA, population map, and a BED file of windows. These paths correspond to the defaults under the prediction block of workflow/config/AraTha_2epoch_cnn.yaml.

Prediction workflow

To apply a trained model to a VCF, ensure your config file contains a prediction block with:

vcf: gzipped VCF path
ancestral_fasta: ancestral sequence for the same contigs (optional; reference allele is assumed if omitted)
population_map: YAML mapping VCF sample IDs to simulator population names (sample counts must match the simulator defaults)
windows: BED file describing genomic windows
min_snps_per_window: minimum segregating variants per window
n_chunk: number of scatter/gather jobs for prediction

Then run:

snakemake --cores 8 \
          --configfile workflow/config/AraTha_2epoch_cnn.yaml \
          --snakefile workflow/prediction_workflow.smk

Predictions, inferred trees, and QC plots are written to <project_dir>/<vcf_basename>/.

Cluster usage quickstart

We ship an example Snakemake profile in example_profile/ targeting a SLURM cluster (tested on UO’s kerngpu partition). To use it:

snakemake --executor slurm \
          --workflow-profile ~/.config/snakemake/yourprofile \
          --configfile workflow/config/AraTha_2epoch_cnn.yaml \
          --snakefile workflow/training_workflow.smk

Notes:

The profile assumes Snakemake 8.x’s executor interface. Adjust example_profile/config.yaml if you are pinned to an older version.
The workflow YAML cpu_resources/gpu_resources blocks drive per-rule SLURM resource requests (runtime/memory/GPU count, partitions, constraints, etc.).

Name		Name	Last commit message	Last commit date
Latest commit History 426 Commits
depr		depr
docs		docs
example_profile		example_profile
experiments		experiments
resources/util		resources/util
tests		tests
workflow		workflow
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE		LICENSE
README.md		README.md
environment.yaml		environment.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Snakemake workflow: `popgen-npe`

Documentation

Workflows at a glance

Environment setup

Training workflow

Example data for prediction

Prediction workflow

Cluster usage quickstart

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

kr-colab/popgen-npe

Folders and files

Latest commit

History

Repository files navigation

Snakemake workflow: popgen-npe

Documentation

Workflows at a glance

Environment setup

Training workflow

Example data for prediction

Prediction workflow

Cluster usage quickstart

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Snakemake workflow: `popgen-npe`

Packages