Nextflow Pipeline for Running and Evaluating Multiple moDiNA Configurations

This pipeline enables running multiple moDiNA configurations in an reproducible and scalable environment and evaluating configurations using different schemes.

If you want to use the Python package, please refer to this repository.

A detailed documentation of the individual steps is provided at our readthedocs website: https://dyhealthnet.github.io/moDiNA/index.html.

Installation

If you are new to Nextflow, please refer to this page: https://nf-co.re/docs/usage/installation

Then clone the moDiNA Nextflow pipeline repository:

git clone https://github.com/DyHealthNet/moDiNA_nf_pipeline.git

TODO: create conda environments if this is not working automatically...

Parameters

The Nextflow pipeline is designed to run multiple configurations in parallel, allowing users to systematically evaluate different settings and their impact on the results. To specify the configurations, you can either change the parameters in the nextflow.config file or create a configuration file (e.g., params.yml).

General Parameters

Parameter	Type	Default	Description
name_context_1	string	context1	Label assigned to the first context (e.g. Female or Control).
name_context_2	string	context2	Label assigned to the second context (e.g. Male or Disease).
run_type	string	file	Determines how moDiNA configurations are specified. One of single (one fixed configuration), all (all valid combinations), or file (read from a CSV samplesheet).
data_type	string	simulation	Source of input data. Either simulation (generate synthetic data via copula) or real (provide real-world data files).
out_dir	path	''	Absolute path to the directory where all pipeline outputs are written.

Environment Parameters

Parameter	Type	Description
conda_modina_env	path	Absolute path to the conda environment used for all core moDiNA processes (network inference, ranking, …).
conda_eval_env	path	Absolute path to the conda environment used for R-based evaluation process.

Simulation Parameters (simulation.*)

Parameter	Type	Default	Description
`simulation.n_simulations`	`integer`	`3`	Number of independent simulation replicates to run.
`simulation.n_bi`	`integer`	`50`	Number of binary nodes to simulate.
`simulation.n_cont`	`integer`	`50`	Number of continuous nodes to simulate.
`simulation.n_cat`	`integer`	`50`	Number of categorical nodes to simulate.
`simulation.n_samples`	`integer`	`500`	Number of samples (observations) per context.
`simulation.n_shift_cont`	`integer`	`4`	Number of continuous nodes with an artificially introduced mean shift between contexts.
`simulation.n_shift_bi`	`integer`	`4`	Number of binary nodes with an artificially introduced mean shift between contexts.
`simulation.n_shift_cat`	`integer`	`4`	Number of categorical nodes with an artificially introduced mean shift between contexts.
`simulation.n_corr_cont_cont`	`integer`	`2`	Number of continuous–continuous node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_corr_bi_bi`	`integer`	`2`	Number of binary–binary node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_corr_cat_cat`	`integer`	`2`	Number of categorical–categorical node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_corr_bi_cont`	`integer`	`2`	Number of binary–continuous node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_corr_bi_cat`	`integer`	`2`	Number of binary–categorical node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_corr_cont_cat`	`integer`	`2`	Number of continuous–categorical node pairs with an artificially introduced correlation difference between contexts.
`simulation.n_both_cont_cont`	`integer`	`2`	Number of continuous–continuous node pairs with both a mean shift and a correlation difference.
`simulation.n_both_bi_bi`	`integer`	`2`	Number of binary–binary node pairs with both a mean shift and a correlation difference.
`simulation.n_both_cat_cat`	`integer`	`2`	Number of categorical–categorical node pairs with both a mean shift and a correlation difference.
`simulation.n_both_bi_cont`	`integer`	`2`	Number of binary–continuous node pairs with both a mean shift and a correlation difference.
`simulation.n_both_bi_cat`	`integer`	`2`	Number of binary–categorical node pairs with both a mean shift and a correlation difference.
`simulation.n_both_cont_cat`	`integer`	`2`	Number of continuous–categorical node pairs with both a mean shift and a correlation difference.
`simulation.shift`	`float`	`0.5`	Magnitude of the mean shift introduced in shifted nodes. Must be ≥ 0.
`simulation.corr`	`float`	`0.7`	Target correlation difference introduced in correlated node pairs. Must be in [0, 1].

Real-World Data Parameters (`real_world_data.*`)

Only required when data_type = 'real'.

Parameter	Type	Default	Description
`real_world_data.path_context_1`	`path`	`''`	Absolute path to the CSV data file for context 1 (rows: samples, columns: variables).
`real_world_data.path_context_2`	`path`	`''`	Absolute path to the CSV data file for context 2 (rows: samples, columns: variables).
`real_world_data.path_meta`	`path`	`''`	Absolute path to the metadata CSV file containing variable labels and data types.

Differential Network Analysis Parameters (`diff_net_analysis.*`)

Parameter	Type	Default	Description
`diff_net_analysis.path_config`	`path`	`'input_samplesheet.csv'`	Path to the CSV samplesheet listing moDiNA configurations to run. Required when `run_type = 'file'`. Columns: `node_metric`, `edge_metric`, `ranking_algorithm`.
`diff_net_analysis.node_metric`	`string`	`'STC'`	Node-level differential metric. Used when `run_type = 'single'`. One of: `STC`, `DC-P`, `DC-E`, `WDC-P`, `WDC-E`, `PRC-P`, `None`.
`diff_net_analysis.edge_metric`	`string`	`'pre-LS'`	Edge-level differential metric. Used when `run_type = 'single'`. One of: `pre-P`, `pre-E`, `post-E`, `post-P`, `int-IS`, `pre-LS`, `post-LS`, `pre-PE`, `post-PE`, `None`.
`diff_net_analysis.ranking_algorithm`	`string`	`'PageRank+'`	Algorithm used to rank nodes and edges in the differential network. Used when `run_type = 'single'`. One of: `PageRank+`, `PageRank`, `absDimontRank`, `DimontRank`, `direct_node`, `direct_edge`.
`diff_net_analysis.filter_method`	`string`	`''`	Optional method to filter context-specific networks before differential analysis. One of: `degree`, `density`, `quantile`. Leave empty to skip filtering.
`diff_net_analysis.filter_param`	`integer / float`	`2`	Parameter for the chosen filter method. Integer for `degree`; float in (0, 1) for `density` and `quantile`.
`diff_net_analysis.filter_metric`	`string`	`'pre-P'`	Edge score used as the basis for filtering. One of: `pre-P`, `pre-E`. Required when `filter_method` is set.
`diff_net_analysis.filter_rule`	`string`	`'zero'`	Rule applied when filtering edges. One of: `zero` (set filtered edges to zero), `union` (keep edges present in at least one context).
`diff_net_analysis.max_path_length`	`integer`	`2`	Maximum path length used for the `int-IS` edge metric. Must be an integer in [0, 4]. Only relevant when `int-IS` is selected as an edge metric.
`diff_net_analysis.test_type`	`string`	`'nonparametric'`	Type of statistical test used during context network inference. One of: `parametric`, `nonparametric`.
`diff_net_analysis.nan_value`	`integer`	`-89`	Sentinel value in input data that represents missing / NA values.
`diff_net_analysis.multiple_testing`	`string`	`'bh'`	Multiple testing correction method applied to association p-values. One of: `bh` (Benjamini–Hochberg), `by` (Benjamini–Yekutieli).

Usage

You can choose between the following profiles: conda, docker, singularity, slurm. The respective profile will be used to execute the pipeline with the appropriate containerization or job scheduling system. You can specify the profile using the -profile flag when running the Nextflow pipeline. For example, to run the pipeline using conda for package management and SLURM for distributed execution, you would use the following command:

Now, you can run the Nextflow pipeline using the following command:

nextflow run main.nf -profile conda,slurm

If you have stored the parameters in a params.yml file instead of the nextflow.config file, you can specify the path to this file using the -params-file flag:

nextflow run main.nf -profile conda,slurm -params-file path/to/your/params.yml

Pipeline Output

The results are stored in the specified output directory, which is organized by differential network analysis steps and evaluation steps.

The main output of the pipeline includes: - Simulated datasets for each simulation replicate (if data_type = 'simulation'). - Inferred context-specific networks for each context and simulation. - Node and edge metrics files for each configuration and simulation. - Ranking results for each configuration and simulation. - Evaluation results

Citation

TODO

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
bin		bin
envs		envs
modules		modules
params_files		params_files
subworkflows		subworkflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
input_samplesheet.csv		input_samplesheet.csv
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nextflow Pipeline for Running and Evaluating Multiple moDiNA Configurations

Installation

Parameters

General Parameters

Environment Parameters

Simulation Parameters (simulation.*)

Real-World Data Parameters (`real_world_data.*`)

Differential Network Analysis Parameters (`diff_net_analysis.*`)

Usage

Pipeline Output

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Nextflow Pipeline for Running and Evaluating Multiple moDiNA Configurations

Installation

Parameters

General Parameters

Environment Parameters

Simulation Parameters (simulation.*)

Real-World Data Parameters (real_world_data.*)

Differential Network Analysis Parameters (diff_net_analysis.*)

Usage

Pipeline Output

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Real-World Data Parameters (`real_world_data.*`)

Differential Network Analysis Parameters (`diff_net_analysis.*`)

Packages