MutationScan

MutationScan is a Snakemake-orchestrated AMR analytics pipeline that transforms local bacterial genome assemblies into:

Mutation call reports
Biochemical epistasis network rankings
Optional structure-guided docking deltas (WT vs mutant)

The repository is structured for production use with deterministic workflow steps, job-scoped output directories, and strict separation of source code vs runtime state.

What MutationScan Does

MutationScan executes a staged workflow:

Sequence extraction and variant calling from local .fna genomes
Biochemical scoring and co-occurrence epistasis network generation
Optional biophysics docking against a provided protein structure
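
The co-occurrence part of stage 2 can be illustrated with a minimal sketch. It only counts how many genomes carry each pair of mutations; the biochemical scoring and the actual ranking used by MutationScan are not described here, so this is not the pipeline's implementation. The function name and the mutation labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(mutations_by_genome):
    """Count how many genomes carry each unordered pair of mutations."""
    pair_counts = Counter()
    for mutations in mutations_by_genome.values():
        # Sort and de-duplicate so each pair is counted once per genome
        # and always appears in the same order.
        for pair in combinations(sorted(set(mutations)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical mutation calls for three genomes (labels are made up).
calls = {
    "genome_1": ["geneA:A10V", "geneB:R45C"],
    "genome_2": ["geneA:A10V", "geneB:R45C", "geneC:S3N"],
    "genome_3": ["geneA:A10V"],
}

# Pairs ordered from most to least frequently co-occurring.
ranked = cooccurrence_counts(calls).most_common()
```

Pairs that co-occur in many genomes are the natural candidates for edges in a network; how edges are weighted beyond raw counts is left open in this sketch.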

Current design principles:

Local genomes are the input source (no built-in metadata download stage in the production DAG).
Every run is namespaced by job_name and writes to data/output/{job_name}/.
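
As a rough illustration of job-scoped output directories, the helper below builds the run directory from job_name. The function name and the validation rules are assumptions made for this sketch, not code from the repository.

```python
from pathlib import Path


def job_output_dir(job_name, root="data/output"):
    """Return the output directory for a run, e.g. data/output/trial_001."""
    # Assumed rule: reject names that would escape or collapse the namespace.
    if not job_name or job_name in {".", ".."} or "/" in job_name or "\\" in job_name:
        raise ValueError(f"invalid job_name: {job_name!r}")
    return Path(root) / job_name
```

Because every run writes below its own directory, two runs with different job_name values do not overwrite each other's reports.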

Recent Updates (March 2026)

The current main branch includes several pipeline correctness and quality upgrades:

Variant-calling identity filter to suppress weak-homology mutation inflation.
MVBM docking refinements with fixed-pocket targeting and flexible-residue mutant docking.
Fast steric quality control with explicit FAILED_QC status for non-physical mutant models.
Confidence and interpretation annotations in biophysics outputs for easier triage.
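
To make the steric quality-control idea concrete, here is a minimal sketch of a clash check on atom coordinates. It is not the pipeline's QC code: the function name, the 1.0 angstrom cutoff, the heavy-atoms-only assumption, and the PASSED_QC label are all hypothetical. Only the FAILED_QC status name comes from the text above.

```python
import math


def steric_status(coords, min_distance=1.0):
    """Return "FAILED_QC" if any two atoms are closer than min_distance.

    coords is a list of (x, y, z) tuples in angstroms. The sketch assumes
    heavy atoms only, so no legitimate bonded pair should fall below the cutoff.
    """
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_distance:
                return "FAILED_QC"
    # Hypothetical label for models that pass the check.
    return "PASSED_QC"
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a handful of residues around a mutation site; a production check would more likely use a spatial index.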

These updates are now the documented baseline behavior for new runs.

Production Workflow (Snakemake)

The active workflow in Snakefile calls exactly these scripts:

No legacy acquisition script is used in the current production DAG.

Inputs and Outputs

Required inputs:

Local genomes directory (default data/local_genomes)
Target gene list (default config/acr_targets.txt)
Optional reference PDB for Phase 3 biophysics (default data/5o66.pdb in config)
Optional ligand path from config (ligand)

Primary outputs for a run:

data/output/{job_name}/1_genomics_report.csv
data/output/{job_name}/2_epistasis_networks.csv
data/output/{job_name}/ControlScan_Networks/
data/output/{job_name}/3_biophysics_docking.csv
data/output/{job_name}/Mutated_Structures/
data/output/{job_name}/README_Biophysics.txt
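
After a run, the expected outputs can be checked with a short script such as the one below. The file and directory names are taken from the list above; the helper name and the split into core and biophysics outputs are assumptions (the split is inferred from the file names and the optional Phase 3 stage).

```python
from pathlib import Path

CORE_OUTPUTS = [
    "1_genomics_report.csv",
    "2_epistasis_networks.csv",
    "ControlScan_Networks",
]
# Assumption: these are produced only when the optional biophysics stage runs.
BIOPHYSICS_OUTPUTS = [
    "3_biophysics_docking.csv",
    "Mutated_Structures",
    "README_Biophysics.txt",
]


def missing_outputs(job_name, root="data/output", include_biophysics=True):
    """Return the expected output names that do not exist for this run."""
    run_dir = Path(root) / job_name
    expected = CORE_OUTPUTS + (BIOPHYSICS_OUTPUTS if include_biophysics else [])
    return [name for name in expected if not (run_dir / name).exists()]
```

An empty list means every expected file and directory is present; it says nothing about whether their contents are valid.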

Configuration

Edit config/config.yaml to control run behavior.

Minimum important keys:

job_name: output namespace for this run
local_genomes: folder containing .fna files
targets_file: target genes list
variant_min_identity_percent: minimum alignment identity threshold for variant emission (default 80)
default_pdb: structure file for biophysics stage
ligand: optional ligand file path for docking
pocket_center_x/pocket_center_y/pocket_center_z: optional override for docking pocket center (default AcrB center)
exhaustiveness: docking search exhaustiveness (default 16)

Example:

job_name: "trial_001"
local_genomes: "data/local_genomes"
targets_file: "config/acr_targets.txt"
variant_min_identity_percent: 80
default_pdb: "data/5o66.pdb"
ligand: "data/ligands/ligand.sdf"
exhaustiveness: 16
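
The defaults listed above can be summarized in a small validation sketch. It operates on an already-parsed config dictionary and is not the pipeline's own config handling; the function name, the rule that job_name must be set, and the range checks are assumptions.

```python
# Defaults as documented in this README.
DEFAULTS = {
    "local_genomes": "data/local_genomes",
    "targets_file": "config/acr_targets.txt",
    "variant_min_identity_percent": 80,
    "default_pdb": "data/5o66.pdb",
    "exhaustiveness": 16,
}


def resolve_config(user_config):
    """Merge user values over the documented defaults and sanity-check them."""
    config = {**DEFAULTS, **user_config}
    # Assumption: a run must always be given an explicit job_name.
    if not config.get("job_name"):
        raise ValueError("job_name must be set")
    identity = config["variant_min_identity_percent"]
    if not 0 <= identity <= 100:
        raise ValueError("variant_min_identity_percent must be between 0 and 100")
    if int(config["exhaustiveness"]) < 1:
        raise ValueError("exhaustiveness must be a positive integer")
    return config
```

For example, resolve_config({"job_name": "trial_001"}) returns the defaults with the job name filled in, while an identity threshold of 150 is rejected.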

Identity filtering note:

Alignments below variant_min_identity_percent are skipped before mutation emission.
For a broader but noisier search, lower the threshold (for example, to 75); for stricter calls, keep 80 or raise it.
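
The filter logic can be sketched as follows. How MutationScan computes identity from its alignments is not specified in this README, so the definition used here (matching positions divided by alignment length) and the function names are assumptions.

```python
def percent_identity(matches, alignment_length):
    """Percent of aligned positions that match (assumed definition)."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return 100.0 * matches / alignment_length


def keep_alignment(matches, alignment_length, min_identity_percent=80):
    """True if the alignment meets the threshold and should be kept."""
    # Alignments strictly below the threshold are skipped, so a value
    # exactly at the threshold is kept.
    return percent_identity(matches, alignment_length) >= min_identity_percent
```

With the default threshold, an alignment with 79 matches over 100 positions is skipped, while the same alignment passes if the threshold is lowered to 75.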

Quick Start

Option A: Local environment

Use the project Conda environment definition:

conda env create -f environment.yml
conda activate mutationscan
pip install -e .

Dry-run the DAG:

python -m snakemake -n --cores 1 --config job_name="smoke_test"

Run the workflow:

python -m snakemake --cores 4 --config job_name="run_2026_03_18"

Option B: Docker

docker compose build
docker compose run --rm mutationscan python -m snakemake -n --cores 1 --config job_name="docker_smoke"
docker compose run --rm mutationscan python -m snakemake --cores 4 --config job_name="docker_run"

CI/CD Notes

Repository CI validates:

Unit tests
Snakemake DAG buildability

Runtime data and state folders are intentionally excluded via ignore rules, and .snakemake/ is not tracked.

Scientific and Operational Disclaimers

This pipeline is intended for research and engineering triage workflows.

Not a clinical diagnostic device.
Mutation-to-phenotype inference is model- and rule-dependent, not ground truth.
Docking outputs are best-effort relative estimates, not absolute binding free energies.
Fast local docking does not fully model large conformational changes, explicit solvent, long-timescale dynamics, or complete thermodynamic integration.
For high-confidence mechanistic conclusions, use full molecular dynamics and dedicated free-energy methods.
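
To show what "relative" means for the docking outputs, the sketch below computes a WT-versus-mutant delta from two docking scores. The function name and the sign convention are assumptions: it presumes a score where more negative means more favorable predicted binding, as in many docking programs, and the numbers in the comment are illustrative only.

```python
def docking_delta(wt_score, mutant_score):
    """Mutant minus wild-type docking score.

    Under the assumed convention (more negative = more favorable), a
    positive delta suggests weaker predicted binding to the mutant.
    """
    return mutant_score - wt_score


# Illustrative values only: WT scores -8.0, mutant scores -6.5.
delta = docking_delta(-8.0, -6.5)
```

Only the sign and rough size of the delta between matched WT and mutant runs are meaningful here; the individual scores should not be read as measured binding energies.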

Repository Hygiene Policy

Tracked assets should remain source/config/documentation only.

The following are neither tracked nor shipped as production code:

.snakemake/ runtime state
Generated output under data/output/*
Downloaded genome payloads under data/local_genomes/*
Ad hoc local experiment files

Keep placeholders only (.gitkeep) in runtime data folders.

Troubleshooting

Common causes of failed runs:

Missing .fna files in local_genomes
Missing/incorrect target genes file
Missing PDB when biophysics stage is enabled
Missing external binaries in local environment (tblastn, docking dependencies)

Recommended first check:

python -m snakemake -n --cores 1 --config job_name="debug_run"
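
The dry run checks the DAG, but the common causes listed above can also be checked directly. The following is a minimal pre-flight sketch, not part of the repository; the function name and messages are hypothetical, and it checks only tblastn by default because the other docking dependencies are not named here.

```python
import shutil
from pathlib import Path


def preflight(local_genomes="data/local_genomes",
              targets_file="config/acr_targets.txt",
              default_pdb=None,
              required_binaries=("tblastn",)):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    genome_dir = Path(local_genomes)
    if not genome_dir.is_dir() or not any(genome_dir.glob("*.fna")):
        problems.append(f"no .fna files found in {local_genomes}")
    if not Path(targets_file).is_file():
        problems.append(f"target genes file not found: {targets_file}")
    # Only check the structure when the biophysics stage is going to run.
    if default_pdb is not None and not Path(default_pdb).is_file():
        problems.append(f"PDB file not found: {default_pdb}")
    for binary in required_binaries:
        if shutil.which(binary) is None:
            problems.append(f"binary not on PATH: {binary}")
    return problems
```

Calling preflight() from the repository root before a run and printing the returned list gives a quick summary of what is missing.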

License

See LICENSE.
