docs: tutorials for COMBINE25 #369
agitter
left a comment
This is a good outline. I added some detailed thoughts.
|
We will also need to write an abstract |
docs/tutorial/introduction.rst
Outdated
| Pathway reconstruction is a computational approach used in biology to rebuild biological pathways (such as signaling pathways) from high-throughput experimental data. | ||
|
|
||
| Curated pathway databases provide references to pathways, but they are often generalized and may not capture the context-specific details relevant to a particular disease or experimental condition. | ||
| To address this, pathway reconstruction algorithms (PRAs) help map molecules of interest (such as proteins, genes, or metabolites identified in omics experiments or that are known as points of reference) onto large-scale interaction networks, called interactomes (maps of molecular interactions in a cell). |
I prefer to avoid the PRA acronym because it is not generally used
docs/tutorial/basic.rst
Outdated
| - Bow Tie Builder | ||
| - ResponseNet | ||
|
|
||
| - Each algorithm has an include flag (true/false) to turn it on or off. |
Link to our docs about these here?
docs/tutorial/basic.rst
Outdated
| - data_dir: the path to where the input dataset files live | ||
| - other_files: a placeholder for potential future development needs | ||
|
|
||
| 4. Gold Standards |
Skip gold standards for the basic intro and introduce in medium?
docs/tutorial/basic.rst
Outdated
| - Defines the filepath where reconstructed networks are saved (output directory by default) | ||
| - Basic housekeeping for how SPRAS organizes and stores results. | ||
|
|
||
| 6. Analysis |
How much of this do we cover here versus skip until medium? We may not need to explain everything that goes in the config file all at once.
docs/tutorial/basic.rst
Outdated
| - egfr | ||
| - one algorithm | ||
| - three different preset combos | ||
| - have them make the configuration file? |
I don't think that is necessary. The basic tutorial can have them start with a premade config, maybe modify it trivially, and make sure they understand what it did. A powerful example would be to run it, add one extra parameter, and run it again to see how much is cached.
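To make the caching point concrete, here is a minimal sketch of the idea behind "run it, add one parameter, run it again". It is not SPRAS or Snakemake code: `run_combinations`, the folder naming, and the placeholder output are all hypothetical, and the only assumption is that a step is skipped when its output file already exists.

```python
import tempfile
from pathlib import Path


def run_combinations(out_dir, ks, ran):
    """Run a placeholder 'algorithm' for each k whose output file is missing.

    Hypothetical stand-in for a workflow step; appends each k that was
    actually executed to the list `ran`.
    """
    for k in ks:
        target = Path(out_dir) / f"pathlinker-k{k}" / "pathway.txt"
        if target.exists():
            continue  # output already present, so this step is treated as cached
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"placeholder pathway for k={k}\n")
        ran.append(k)


with tempfile.TemporaryDirectory() as tmp:
    first_run, second_run = [], []
    run_combinations(tmp, [10], first_run)        # initial config: one parameter value
    run_combinations(tmp, [10, 100], second_run)  # config with one extra value added
    print("first run executed:", first_run)
    print("second run executed:", second_run)
```

In the second call only the newly added value is executed, which is the behavior the tutorial could ask participants to observe in the real workflow.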
agitter
left a comment
I really like this style of explaining what each command does and showing the file tree produced.
Once beginner is done, try practicing it live. This first tutorial may end up being mostly beginner content, which is okay.
| - mention parameter tuning | ||
| - say that parameters are not preset and need to be tuned for each dataset | ||
|
|
||
| CHTC integration |
CHTC is local to our university. The way to say it may be Snakemake integration with cloud and high-throughput computing resources, which we've prototyped in our local cluster. If we start testing in OSG that would be different because many people are eligible for accounts.
docs/tutorial/beginner.rst
Outdated
|
|
||
| Stores all results generated by SPRAS. Subfolders are created automatically for each run, and their structure can be controlled through the configuration file. | ||
|
|
||
| By default, the directories are set to config/, input/, and output/. The config/, input/, and output/ folders can be placed anywhere within the SPRAS repository. Their input/ and output/ locations can be updated in the configuration file, and the configuration file itself can be found by providing its path when running SPRAS. |
Can we place these directories anywhere? They don't have to be subdirectories anymore, do they? Do absolute paths work?
I have never tried an absolute path to put them anywhere. I thought it was forced to be within spras, but I'm not sure anymore. I'll test it out.
docs/tutorial/beginner.rst
Outdated
|
|
||
| 4. Organizing results with parameter hashes | ||
|
|
||
| Each dataset–algorithm–parameter combination is placed in its own folder named like egfr-pathlinker-params-D4TUKMX/. D4TUKMX is a hash that uniquely identifies the specific parameter combination (k = 10 here). A matching log file in logs/parameters-pathlinker-params-D4TUKMX.yaml records the exact parameter values. |
| Each dataset–algorithm–parameter combination is placed in its own folder named like egfr-pathlinker-params-D4TUKMX/. D4TUKMX is a hash that uniquely identifies the specific parameter combination (k = 10 here). A matching log file in logs/parameters-pathlinker-params-D4TUKMX.yaml records the exact parameter values. | |
| Each dataset–algorithm–parameter combination is placed in its own folder named like egfr-pathlinker-params-D4TUKMX/. D4TUKMX is a hash that uniquely identifies the specific parameter combination (k = 10 here). A matching log file in basic/logs/parameters-pathlinker-params-D4TUKMX.yaml records the exact parameter values. |
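For readers unfamiliar with parameter hashes, a short illustration might help here. The sketch below is an assumption-laden example, not the hashing scheme SPRAS actually uses: `params_hash` and `output_folder` are hypothetical helpers that hash a canonical JSON form of the parameters with SHA-1 and keep the first seven base32 characters, so the resulting strings will not match the hashes shown in the tutorial.

```python
import base64
import hashlib
import json


def params_hash(params, length=7):
    """Return a short, deterministic identifier for a parameter dictionary.

    Hypothetical scheme: SHA-1 over sorted-key JSON, base32 encoded, truncated.
    """
    canonical = json.dumps(params, sort_keys=True)  # key order must not matter
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]


def output_folder(dataset, algorithm, params):
    """Build a folder name in the dataset-algorithm-params-HASH pattern."""
    return f"{dataset}-{algorithm}-params-{params_hash(params)}"


print(output_folder("egfr", "pathlinker", {"k": 10}))
print(output_folder("egfr", "pathlinker", {"k": 100}))
```

The same parameters always map to the same folder name, and different parameters map to different names, which is what lets a log file with the matching hash record the exact values used.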
docs/tutorial/beginner.rst
Outdated
|
|
||
| 2. Organizing outputs per parameter combination | ||
|
|
||
| Each new dataset–algorithm–parameter combination gets its own folder (e.g., egfr-pathlinker-params-7S4SLU6/ and egfr-pathlinker-params-VQL7BDZ/) |
add the parameters that ran
docs/tutorial/beginner.rst
Outdated
|
|
||
| 2. Running the summary analysis | ||
|
|
||
| SPRAS aggregates the pathway.txt files from all selected parameter combinations into a single summary table. This table reports key graph-based statistics for each pathway, including: |
| SPRAS aggregates the pathway.txt files from all selected parameter combinations into a single summary table. This table reports key graph-based statistics for each pathway, including: | |
| SPRAS aggregates the pathway.txt files from all selected parameter combinations per dataset into a single summary table. This table reports key graph topological statistics for each pathway, including: |
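As an illustration of what "graph topological statistics" can mean, the sketch below computes a few of them for a toy edge list. It is only an example under stated assumptions: `summarize_edges` is a hypothetical helper, the pathway is treated as an undirected list of node pairs, and the statistics chosen here are not necessarily the ones SPRAS reports.

```python
from collections import defaultdict


def summarize_edges(edges):
    """Compute simple topological statistics for an undirected edge list."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacency[node] - seen)

    # Treat (u, v) and (v, u) as the same undirected edge.
    unique_edges = {frozenset(edge) for edge in edges}
    return {
        "nodes": len(adjacency),
        "edges": len(unique_edges),
        "connected_components": components,
        "max_degree": max((len(nbrs) for nbrs in adjacency.values()), default=0),
    }


# Toy pathway with made-up edges, used only to exercise the function.
toy = [("EGF", "EGFR"), ("EGFR", "GRB2"), ("GRB2", "SOS1"), ("A", "B")]
summary = summarize_edges(toy)
print(summary)
```

Running one such summary per pathway and stacking the results gives one table row per parameter combination, which is the shape of the summary table described above.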
docs/tutorial/beginner.rst
Outdated
|
|
||
| 3. Running the Cytoscape analysis | ||
|
|
||
| All pathway.txt files from the chosen parameter combinations are collected and passed into the Cytoscape Docker image. A Cytoscape session file is then generated, containing visualizations for each pathway. This file is saved as egfr-cytoscape.cys and can be opened in Cytoscape for interactive exploration. |
| All pathway.txt files from the chosen parameter combinations are collected and passed into the Cytoscape Docker image. A Cytoscape session file is then generated, containing visualizations for each pathway. This file is saved as egfr-cytoscape.cys and can be opened in Cytoscape for interactive exploration. | |
| All pathway.txt files from the given parameter combinations for a specific dataset are collected and passed into the Cytoscape Docker image. A Cytoscape session file is then generated, containing visualizations for each pathway. This file is saved as egfr-cytoscape.cys and can be opened in Cytoscape for interactive exploration. |
docs/tutorial/intermediate.rst
Outdated
| - Domino | ||
| - Source-Targets Random Walk with Restarts | ||
| - Random Walk with Restarts | ||
| - BowTieBuilder |
BowTieBuilder on the EGFR data takes forever. I just ran it, and after 2 hours it still is not done.
agitter
left a comment
Our goal is to have a version of this merged very soon so that Neha can test it on Monday. If there isn't time to address all of my comments, they can be addressed in a follow up pull request.
| k: 1 | ||
| # run2: # uncomment for step 3.2 | ||
| # k: [10, 100] # uncomment for step 3.2 | ||
|
|
Is this still indented too much? It looks unaligned in GitHub.
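Separately from the indentation question, it may help tutorial readers to see how the run1/run2 blocks in the snippet turn into individual parameter combinations once run2 is uncommented. The sketch below assumes each run block expands to the Cartesian product of its parameter values; `expand_runs` is a hypothetical helper written for illustration, not the SPRAS config parser.

```python
from itertools import product


def expand_runs(runs):
    """Expand run blocks into a flat list of parameter combinations.

    Scalars are treated as single-value lists; list-valued parameters
    within one run block are combined with a Cartesian product.
    """
    combos = []
    for run in runs.values():
        names = list(run)
        value_lists = [v if isinstance(v, list) else [v] for v in run.values()]
        for values in product(*value_lists):
            combos.append(dict(zip(names, values)))
    return combos


# Mirrors the YAML in the diff with run2 uncommented.
runs = {"run1": {"k": 1}, "run2": {"k": [10, 100]}}
combos = expand_runs(runs)
print(combos)
```

Under that assumption, the config yields three combinations in total, each of which would get its own output folder.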
Do we need to move these planning notes before merging? Or keep them here temporarily and delete them in the pre-COMBINE follow up pull request?
|
|
||
| Required knowledge: | ||
|
|
||
| - Basic Python skills |
Will participants be coding in Python? Or is it the ability to edit the yaml files and run command line tools?
Ability to edit the yaml files and run command line tools
| What Happens When You Run This Command | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| SPRAS will run more slowly than the beginner.yaml configuration. |
Will take longer to run?
Yes
| Advanced Capabilities and Features | ||
| ====================================== | ||
|
|
||
| This section is more of a list of everything SPRAS can do that the tutorial will not demonstrate |
Yes, this can be a list more or less of things SPRAS can do. The beginner and intermediate steps will already take plenty of time.
Co-authored-by: Anthony Gitter <agitter@users.noreply.github.com>
| =============================== | ||
| Pathway reconstruction algorithms allow researchers to systematically find context-specific subnetworks without performing exhaustive experiments. Different algorithms use distinct computational strategies and parameters, which lets researchers highlight different aspects of the underlying biology, identify subnetworks specific to their experimental conditions, and generate new, testable hypotheses. | ||
|
|
||
| What is SPRAS? |
Can we explain this better? Do we need to state somewhere what SPRAS does at a high level? It takes input node information and a network, runs one or more algorithms with one or more parameter combinations, etc. A newcomer may not yet know what we mean by algorithms and datasets in this sentence.
docs/tutorial/beginner.rst
Outdated
|
|
||
| 5. Running the algorithm | ||
|
|
||
| SPRAS launches the PathLinker Docker image, sending it the prepared files and parameter settings. |
Link to the docker image on docker hub
Co-authored-by: Anthony Gitter <agitter@users.noreply.github.com>
agitter
left a comment
Approving the temporary version for the demo
This tutorial and registration for the conference are due September 28, 2025
Event: https://co.mbine.org/events/
Example tutorials: https://co.mbine.org/author/combine-2023/