MOKA implements a Snakemake pipeline that automates multi-omics bridged, kernel-based association tests.
The pipeline supports flexible GWAS analysis and visualization with different multi-omics, variant-specific weights.
- GWAS genotype files in PLINK format (.bed, .bim, .fam) [required]
- Variant-specific weights in CSV format (SNP_ID,Chromosome,Position,Weight) [required for MOKA]
- Gene regions file (GRCh38/hg38), generated from Ensembl GFF3 annotations (e.g., Homo_sapiens.GRCh38.115.gff3.gz, Ensembl Release 115).
- DisGeNET gene-disease database reference file (only if external disease validation is needed)
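Before running the pipeline, it can help to check that a weights file has the four columns listed above. The following is a minimal sketch, not part of MOKA: the function name `check_weights_csv` and the sample rows are made up for illustration, and it assumes the file has a header row with exactly those column names.

```python
import csv
import io

# Column names taken from the input description above.
REQUIRED_COLUMNS = ["SNP_ID", "Chromosome", "Position", "Weight"]


def check_weights_csv(handle):
    """Return a list of problems found in a weights CSV (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            int(row["Position"])
            float(row["Weight"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric Position or Weight")
    return problems


# Made-up rows, for illustration only.
sample = io.StringIO(
    "SNP_ID,Chromosome,Position,Weight\n"
    "rs0000001,1,12345,0.8\n"
    "rs0000002,1,67890,not_a_number\n"
)
problems = check_weights_csv(sample)
print(problems)
```

To check a real file, pass an open file handle instead of the in-memory sample.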
Follow the Snakemake Installation Guide:
conda create -c conda-forge -c bioconda -c nodefaults -n snakemake snakemake
conda activate snakemake
snakemake --help
git clone https://github.com/davidenoma/moka
cd moka
This configuration controls paths to inputs and analysis settings:
- genotype_prefix: Prefix for the genotype data files (without file extensions).
- weights_type: Text descriptor indicating the source or type of biologically informed functional weights used in the analysis.
- genotype_file_path: Directory path to the genotype data files.
- weight_file: File path to the functional weight file applied in the association tests.
- disgenet_reference_file: Path to the DisGeNET reference file containing disease-specific gene-disease associations (https://disgenet.org). Needed only for gene-disease association validation.
- spectral decomposition: Boolean flag to enable spectral (eigenvalue) decomposition and transformation of genotype and phenotype matrices. Default: TRUE.
- is_binary: Boolean flag specifying the phenotype type (TRUE for binary traits; FALSE for quantitative traits). Default: TRUE.
- Plink: Path to the PLINK executable, e.g. "~/software/plink".
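To illustrate how genotype_file_path and genotype_prefix relate, here is a small sketch that builds the three PLINK file names from the two settings. It assumes the directory and prefix are simply joined and the extension appended. The helper `plink_paths` is hypothetical and is not how the pipeline itself necessarily resolves paths; the example values mirror the demo data layout.

```python
from pathlib import PurePosixPath


def plink_paths(genotype_file_path, genotype_prefix):
    """Build the expected .bed/.bim/.fam paths from directory and prefix."""
    base = PurePosixPath(genotype_file_path)
    # Append the extension as text so prefixes containing dots stay intact.
    return [str(base / f"{genotype_prefix}{ext}") for ext in (".bed", ".bim", ".fam")]


# Example values mirroring the demo data layout (assumed, not read from config.yaml).
config = {"genotype_file_path": "./genotype_data", "genotype_prefix": "test_geno"}
paths = plink_paths(config["genotype_file_path"], config["genotype_prefix"])
print(paths)
```

The prefix carries no extension, which is why the setting asks for it "without file extensions".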
We provide a demo example with configuration located at ./config/config.yaml. The pipeline executes the following steps:
Step 1. Kernel-based association testing
- Integrates GWAS genotype data with the provided weights.
- Supports SNP-level weights derived from diverse data sources.
- Performs SNP-set kernel-based association tests to model the joint effect of multiple variants.
- Runs one chromosome at a time.
- Optionally applies decorrelation to account for population structure or relatedness.
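To make the idea of a weighted SNP-set kernel test concrete, the sketch below computes a generic SKAT-style score statistic with a weighted linear kernel under an intercept-only null model: Q = sum_j w_j * (sum_i g_ij * r_i)^2, where r is the mean-centred phenotype. This is a textbook form shown for illustration only. MOKA's actual statistic, its handling of weights, covariates and decorrelation, and its p-value computation may differ, and the function name and toy data are made up.

```python
def weighted_kernel_statistic(genotypes, phenotype, weights):
    """Generic weighted linear-kernel score statistic for one SNP set.

    genotypes: list of per-sample rows, one allele count per SNP.
    phenotype: one value per sample.
    weights:   one non-negative weight per SNP.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    # Residuals under an intercept-only null model (no covariates).
    residuals = [y - mean_y for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        # Per-SNP score: genotype column j dotted with the residuals.
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += w * score_j ** 2
    return q


# Toy data: 4 samples, 2 SNPs (illustrative values only).
genotypes = [[0, 1], [1, 0], [2, 1], [0, 2]]
phenotype = [0, 0, 1, 1]
weights = [1.0, 0.5]
q = weighted_kernel_statistic(genotypes, phenotype, weights)
print(q)
```

A larger weight lets a variant contribute more to the set-level statistic, which is the role the variant-specific weights play in this kind of test.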
Note: The initial run takes some time because Snakemake installs the core dependencies required by workflow/envs/moka.yaml.
Input: Preprocessed genotype data (./genotype_data/test_geno.fam, test_geno.bim, test_geno.bed) and weight file (./weights/test_geno_weights.csv).
Output: Results of association tests under ./result_folder/
snakemake --cores <num_cores> --use-conda
<num_cores> is the number of cores to use, e.g. 8.
Step 2. Merge results from all chromosomes to a single file
Input: Individual association test results.
Output: Merged association test results under ./result_folder/
snakemake --cores 1 merge_moka_results --use-conda
Step 3. Functional enrichment analysis
- Significant genes from association testing are assessed for:
- KEGG pathway enrichment
- Gene Ontology (GO) enrichment
- Helps identify biological processes and pathways underlying GWAS signals.
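Pathway and GO enrichment tools commonly rely on an over-representation test based on the hypergeometric distribution. The sketch below shows that generic calculation for a single term. It is not taken from MOKA's enrichment rules, which may use different tools, backgrounds, or multiple-testing corrections, and the function name and numbers are illustrative.

```python
from math import comb


def enrichment_p_value(background, in_term, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of genes in the background set.
    in_term:    background genes annotated to the term or pathway.
    selected:   number of significant genes tested.
    overlap:    significant genes annotated to the term.
    """
    total = comb(background, selected)
    upper = min(in_term, selected)
    tail = sum(
        comb(in_term, k) * comb(background - in_term, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the term, 3 selected, 2 overlapping.
p = enrichment_p_value(10, 4, 3, 2)
print(round(p, 4))
```

In practice this p-value would be computed per term and then corrected for multiple testing.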
Input: Merged association test results.
Output: GO analysis results under ./output_plots/
snakemake --cores 1 go_analysis --use-conda
Output: KEGG pathway analysis results under ./output_plots/
snakemake --cores 1 kegg_pathway_analysis --use-conda
Step 4. Visualization
- Generates publication-ready visual summaries: Manhattan plots
Input: Merged association test results.
Output: Manhattan plots with visual representations of association test results under ./output_plots/
snakemake --cores 1 manhattan_plots --use-conda
Step 5. External validation
- Cross-references significant genes against the DisGeNET database.
- Reports a validation ratio of overlapping associated genes, strengthening interpretation of GWAS findings.
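One plausible reading of the validation ratio is the fraction of significant genes that also appear in the DisGeNET reference for the disease. The sketch below works under that assumption. The exact definition used by the pipeline is not stated here, and the function name and gene labels are placeholders.

```python
def validation_ratio(significant_genes, disgenet_genes):
    """Return (ratio, overlap) for significant genes found in the reference."""
    significant = set(significant_genes)
    if not significant:
        # No significant genes: define the ratio as 0 to avoid dividing by zero.
        return 0.0, set()
    overlap = significant & set(disgenet_genes)
    return len(overlap) / len(significant), overlap


# Placeholder gene labels, for illustration only.
significant = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
reference = ["GENE_B", "GENE_D", "GENE_X"]
ratio, overlap = validation_ratio(significant, reference)
print(ratio, sorted(overlap))
```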
Input: Merged association test results.
Output: Association test results annotated with the DisGeNET database under ./output_plots/
snakemake --cores 1 disgenet_annotation_005 --use-conda
Additional function: generate_gene_regions
- Generates gene region files with a specified flanking size from the GFF3 annotation for gene-based association testing. Uses config.flank_size from the config file.
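The flanking step can be pictured as widening each gene interval by the flank size on both sides. This is a minimal sketch assuming 1-based inclusive GFF3 coordinates and symmetric flanks. The helper `add_flank` is hypothetical, and the real rule may handle strand or chromosome ends differently.

```python
def add_flank(start, end, flank_size):
    """Extend a 1-based inclusive interval by flank_size on each side."""
    # Clamp at 1 so the region never starts before the chromosome.
    # The chromosome length is not known here, so the end is not clamped.
    return max(1, start - flank_size), end + flank_size


# Illustrative coordinates and flank size (not taken from the config file).
region = add_flank(1500, 3000, 2000)
print(region)
```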
How to execute:
snakemake --cores 1 generate_gene_regions --use-conda
All required Python and R packages, as well as other software dependencies, are specified in the environment YAML files in the workflow/envs/ directory. Snakemake creates and manages these environments automatically when you use the --use-conda flag, so you do not need to install packages manually.
For more details, see the environment YAML files in workflow/envs/.
You must lift your data over to GRCh38 before running the pipeline. See Liftover GWAS: https://github.com/davidenoma/LiftOver
For more information on the MOKA pipeline and its usage, refer to the documentation provided in the repository or contact the project maintainers at david.enoma@ucalgary.ca.
MOKA: A pipeline for multi-omics bridged SNP-set kernel association test https://www.medrxiv.org/content/10.1101/2025.07.06.25330974v1
You can run the MOKA pipeline using the official Docker image:
docker pull davidenoma/moka-gwas
For usage instructions and examples, see https://hub.docker.com/r/davidenoma/moka-gwas.
See docs/README_Snakemake_ARC.md for detailed setup
instructions and ready-to-use scripts (run_smk.sh, run_smk.sbatch)
to run Snakemake safely on clusters without admin privileges.

