Merged
30 commits
581b8e9
plan for COMBINE
ntalluri Aug 12, 2025
01d9a45
delete the docs line
ntalluri Aug 13, 2025
87c001a
start of organizing tutorial
ntalluri Aug 18, 2025
aaa9098
new updates on intro
ntalluri Aug 21, 2025
ce8d8f8
clean up
ntalluri Aug 21, 2025
24040e9
added in a couple of steps for basic
ntalluri Aug 22, 2025
83d175d
more updates
ntalluri Aug 25, 2025
0cdf615
more ideas
ntalluri Aug 26, 2025
144fc68
more additions for the configs
ntalluri Sep 2, 2025
f55a101
restructure and working on beginner
ntalluri Sep 2, 2025
85e91d5
moving things around
ntalluri Sep 2, 2025
9acc8f2
wrote up step 1 and step 2
ntalluri Sep 3, 2025
54a1be5
update config and clean up headers
ntalluri Sep 3, 2025
c20f634
plan for step 3
ntalluri Sep 3, 2025
cf434b6
updated introduction and added more to beginner
ntalluri Sep 4, 2025
5111e4a
remove config/beginner.yaml
ntalluri Sep 8, 2025
25bc2cf
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri Sep 8, 2025
2cbdbb4
finished beginner
ntalluri Sep 9, 2025
0cb8e9b
refactor
ntalluri Sep 9, 2025
95943b9
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri Oct 2, 2025
6289a56
updated the intermediate tutorial
ntalluri Oct 8, 2025
7c5fdd1
reordered beginner and updated files
ntalluri Oct 9, 2025
e144399
updated more
ntalluri Oct 9, 2025
35bb450
updated intermediate tutorial more and the beginner a little
ntalluri Oct 10, 2025
c325d36
update wording in beginner'
ntalluri Oct 10, 2025
9f431ce
updating intermediate
ntalluri Oct 13, 2025
a333ea8
Update docs/_static/config/intermediate.yaml to be include:false for ml
ntalluri Oct 13, 2025
5219a62
Update docs/tutorial/introduction.rst
ntalluri Oct 13, 2025
3b376f2
Apply suggestions from code review
ntalluri Oct 13, 2025
220954d
Update docs/tutorial/introduction.rst
ntalluri Oct 13, 2025
56 changes: 56 additions & 0 deletions docs/_static/config/beginner.yaml
@@ -0,0 +1,56 @@
hash_length: 7
container_framework: docker
unpack_singularity: false
container_registry:
  base_url: docker.io
  owner: reedcompbio

# Each algorithm has an 'include' parameter. By toggling 'include' to true/false the user can change
# which algorithms are run in a given experiment.
#
# algorithm-specific parameters are embedded in lists so that users can specify multiple. If multiple
# parameters are specified then the algorithm will be run as many times as needed to cover all parameter
# combinations. For instance if we have the following:
# - name: "myAlg"
#   params:
#     include: true
#     a: [1,2]
#     b: [0.5,0.75]
#
# then myAlg will be run on (a=1,b=0.5),(a=1,b=0.75),(a=2,b=0.5), and (a=2,b=0.75). Pretty neat, but be
# careful: too many parameters might make your runs take a long time.

algorithms:
  - name: "pathlinker"
    params:
      include: true
      run1:
        k: 1
      # run2: # uncomment for step 3.2
      #   k: [10, 100] # uncomment for step 3.2

Collaborator Author
Indented one in too far

Collaborator
Is this still indented too much? It looks unaligned in GitHub.

# Here we specify which pathways to run and other file location information.
# Assume that if a dataset label does not change, the lists of associated input files do not change
datasets:
  - # Labels can only contain letters, numbers, or underscores
    label: egfr
    node_files: ["tps-egfr-prizes.txt"] # the input nodes
    edge_files: ["phosphosite-irefindex13.0-uniprot.txt"] # the interactome
    # Placeholder
    other_files: []
    # Relative path from the spras repository root directory where these files live
    data_dir: "input"

reconstruction_settings:

  # Set where everything is saved
  locations:
    reconstruction_dir: "output/basic"

analysis:
  # Create one summary per pathway file and a single summary table for all pathways for each dataset
  summary:
    include: false # set to true for step 3.3
  # Create Cytoscape session file with all pathway graphs for each dataset
  cytoscape:
    include: false # set to true for step 3.3
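
The comment at the top of this config describes how list-valued parameters expand into one run per combination. A minimal sketch of that expansion follows. It is an illustration only, not SPRAS's own code: the helper name `expand_run` is hypothetical, and treating a scalar such as `k: 1` as a one-element list is an assumption.

```python
from itertools import product


def expand_run(run_params):
    """Return one dict per combination of the listed parameter values.

    Hypothetical helper for illustration; not part of SPRAS.
    """
    # Assumption: a scalar such as `k: 1` behaves like the one-element list [1]
    as_lists = {
        name: value if isinstance(value, list) else [value]
        for name, value in run_params.items()
    }
    names = list(as_lists)
    # Cartesian product over the value lists, in the order the keys were given
    return [
        dict(zip(names, combo))
        for combo in product(*(as_lists[name] for name in names))
    ]


# The myAlg example from the config comment: 2 x 2 = 4 combinations
my_alg = expand_run({"a": [1, 2], "b": [0.5, 0.75]})
for combo in my_alg:
    print(combo)

# run2 of pathlinker once it is uncommented for step 3.2: 2 combinations
pathlinker_run2 = expand_run({"k": [10, 100]})
print(len(pathlinker_run2))
```

The number of runs is the product of the list lengths, which is why adding values to several parameters at once makes a run block grow quickly.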
132 changes: 132 additions & 0 deletions docs/_static/config/intermediate.yaml
@@ -0,0 +1,132 @@
hash_length: 7
container_framework: docker
unpack_singularity: false
container_registry:
  base_url: docker.io
  owner: reedcompbio

# Each algorithm has an 'include' parameter. By toggling 'include' to true/false the user can change
# which algorithms are run in a given experiment.
#
# algorithm-specific parameters are embedded in lists so that users can specify multiple. If multiple
# parameters are specified then the algorithm will be run as many times as needed to cover all parameter
# combinations. For instance if we have the following:
# - name: "myAlg"
#   params:
#     include: true
#     a: [1,2]
#     b: [0.5,0.75]
#
# then myAlg will be run on (a=1,b=0.5),(a=1,b=0.75),(a=2,b=0.5), and (a=2,b=0.75). Pretty neat, but be
# careful: too many parameters might make your runs take a long time.

algorithms:
  - name: "pathlinker"
    params:
      include: true
      run1:
        k: 1
      run2:
        k: [10, 100]
  - name: omicsintegrator1
    params:
      include: true
      run1:
        b: [0.55, 2, 10]
        d: 10
        g: 1e-3
        r: 0.01
        w: 0.1
        mu: 0.008
  - name: omicsintegrator2
    params:
      include: true
      run1:
        b: 4
        g: 0
      run2:
        b: 2
        g: 3
  - name: meo
    params:
      include: true
      run1:
        local_search: ["Yes", "No"]
        max_path_length: [2, 3]
        rand_restarts: 10
  - name: allpairs
    params:
      include: true
  - name: domino
    params:
      include: true
      run1:
        slice_threshold: 0.3
        module_threshold: 0.05
  - name: mincostflow
    params:
      include: true
      run1:
        capacity: 15
        flow: 80
      run2:
        capacity: 1
        flow: 6
      run3:
        capacity: 5
        flow: 60
  - name: "strwr"
    params:
      include: true
      run1:
        alpha: [0.85]
        threshold: [100, 200]
  - name: "rwr"
    params:
      include: true
      run1:
        alpha: [0.85]
        threshold: [100, 200]

# Here we specify which pathways to run and other file location information.
# Assume that if a dataset label does not change, the lists of associated input files do not change
datasets: # TODO update this based on the dataset that I set up
  - # Labels can only contain letters, numbers, or underscores
    label: egfr
    node_files: ["tps-egfr-prizes.txt"] # the input nodes
    edge_files: ["phosphosite-irefindex13.0-uniprot.txt"] # the interactome
    # Placeholder
    other_files: []
    # Relative path from the spras directory where these files live
    data_dir: "input"

reconstruction_settings:

  # Set where everything is saved
  locations:
    reconstruction_dir: "output/intermediate"

analysis:
  # Machine learning analysis (e.g. clustering) of the pathway output files for each dataset
  ml:
    # ml analysis per dataset
    include: false # set to true for step 3
    # adds ml analysis per algorithm output
    # only runs for algorithms with multiple parameter combinations chosen
    aggregate_per_algorithm: false
    # specify how many principal components to calculate
    components: 2
    # boolean to show the labels on the pca graph
    labels: true
    # 'ward', 'complete', 'average', 'single'
    # if linkage: ward, must use metric: euclidean
    linkage: 'ward'
    # 'euclidean', 'manhattan', 'cosine'
    metric: 'euclidean'
    # controls whether kernel density estimation (KDE) is computed and visualized on top of PCA plots.
    # the coordinates of the KDE maximum (kde_peak) are also saved to the PCA coordinates output file.
    # KDE needs to be run in order to select a parameter combination with PCA because the maximum kernel density is used
    # to pick the 'best' parameter combination.
    kde: false
    # removes empty pathways from consideration in ml analysis (pca only)
    remove_empty_pathways: false
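
The `ml` comments above list the accepted `linkage` and `metric` values and note that `ward` linkage must be paired with the `euclidean` metric. A rough sketch of that rule as a check follows. The function name `check_ml_settings` and the wording of the messages are hypothetical, and this is not how SPRAS itself validates the config.

```python
# Values copied from the comments in the ml section of the config
ALLOWED_LINKAGES = {"ward", "complete", "average", "single"}
ALLOWED_METRICS = {"euclidean", "manhattan", "cosine"}


def check_ml_settings(linkage, metric):
    """Return a list of problems with a linkage/metric pair (empty if none).

    Hypothetical helper for illustration; not part of SPRAS.
    """
    problems = []
    if linkage not in ALLOWED_LINKAGES:
        problems.append(f"unknown linkage: {linkage!r}")
    if metric not in ALLOWED_METRICS:
        problems.append(f"unknown metric: {metric!r}")
    # Rule stated in the config comment: ward only works with euclidean
    if linkage == "ward" and metric != "euclidean":
        problems.append("linkage 'ward' requires metric 'euclidean'")
    return problems


# The values used in intermediate.yaml pass the check
print(check_ml_settings("ward", "euclidean"))
# Switching only the metric reports the ward/euclidean rule
print(check_ml_settings("ward", "cosine"))
```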
Binary file added docs/_static/images/100_pathway.png
Binary file added docs/_static/images/10_pathway.png
Binary file added docs/_static/images/1_pathway.png
Binary file added docs/_static/images/cytoscape-open-cys-file.png
Binary file added docs/_static/images/cytoscape-opened.png
Binary file added docs/_static/images/cytoscape_upload_network.png
Binary file added docs/_static/images/hac-horizontal.png
Binary file added docs/_static/images/hac-vertical.png
Binary file added docs/_static/images/jaccard-heatmap.png
Binary file added docs/_static/images/pca.png
Binary file added docs/_static/images/summary-stats.png
9 changes: 9 additions & 0 deletions docs/index.rst
@@ -56,6 +56,15 @@ methods (PRMs) to omics data.
   contributing/index
   contributing/maintain

.. toctree::
   :maxdepth: 1
   :caption: Tutorials

   tutorial/introduction
   tutorial/beginner
   tutorial/intermediate
   tutorial/advanced

Indices and tables
==================

31 changes: 31 additions & 0 deletions docs/tutorial/advanced.rst
@@ -0,0 +1,31 @@
Advanced Capabilities and Features
======================================

This page is more of a list of the other things SPRAS can do that the tutorial will not show.

Collaborator
Yes, this can be a list more or less of things SPRAS can do. The beginner and intermediate steps will already take plenty of time.


- mention parameter tuning
- say that parameters are not preset and need to be tuned for each dataset

CHTC integration

Collaborator
CHTC is local to our university. The way to say it may be Snakemake integration with cloud and high-throughput computing resources, which we've prototyped in our local cluster. If we start testing in OSG that would be different because many people are eligible for accounts.


Anything not included in the config file

1. Global Workflow Control

Sets options that apply to the entire workflow.

- Examples: the container framework (docker, singularity, dsub) and where to pull container images from

Running SPRAS with multiple parameter combinations and multiple algorithms on multiple datasets

- for the tutorial we are only using one dataset

4. Gold Standards

Defines the input files SPRAS will use to evaluate output subnetworks.

A gold standard dataset consists of:

- a label: defines the name of the gold standard dataset
- node_file or edge_file: a list of either node files or edge files. Only one or the other can exist in a single dataset. At the moment, a dataset can contain only one edge file or one node file
- data_dir: the path to where the input gold standard files live
- dataset_labels: a list of dataset labels that links each gold standard to one or more datasets
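
The gold standard fields listed above can be sketched as a small check on one entry. This is a hedged illustration, not SPRAS code: `check_gold_standard` is a hypothetical helper, the key names `node_files` and `edge_files` are assumed by analogy with the dataset entries in the tutorial configs (the list above writes node_file or edge_file), and the label and file name in the example are made up.

```python
def check_gold_standard(entry):
    """Return a list of problems found in one gold standard entry (empty if none).

    Hypothetical helper for illustration; not part of SPRAS.
    """
    problems = []
    for key in ("label", "data_dir", "dataset_labels"):
        if key not in entry:
            problems.append(f"missing key: {key}")

    # Assumed key names, mirroring the dataset entries in the tutorial configs
    has_nodes = "node_files" in entry
    has_edges = "edge_files" in entry
    if has_nodes == has_edges:
        # Either both were given or neither was
        problems.append("provide exactly one of node_files or edge_files")
    else:
        files = entry["node_files"] if has_nodes else entry["edge_files"]
        if len(files) != 1:
            problems.append("only one node file or one edge file is supported")
    return problems


example = {
    "label": "gs_egfr",              # hypothetical label
    "node_files": ["gs-nodes.txt"],  # hypothetical file name
    "data_dir": "input",
    "dataset_labels": ["egfr"],      # dataset label used in the tutorial configs
}
print(check_gold_standard(example))
```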