AddedDocNoteForLookupTables #2
Open
zimingz wants to merge 112 commits into zimingz:master from Townsend-Lab-Yale:master
Merge preprocessing branch w/ master
Modified clustersubseq to perform over 4 different flag seqs, which are keys to scaled syn/rep
Updates: MASS-PRF is written in standard C++. The software package is accompanied by a manual, example data, source code, and compiled executables for Windows/Linux/Mac. The source code is released under GPLv3 and can be downloaded from https://github.com/Townsend-Lab-Yale/MASSPRF_10July2016/.
-SCL 0 for no scaling; -SCL 1 for automatic scaling; -SCL <num> for user-defined scaling
Gamma thresholds for positive and negative selection.
Updated compiling command without 'make'; put the C++ version and thread details in.
Added information about the MASSPRF pipeline folder and the simulation folder.
Update PRFCluster.cpp
Update PRFCluster.h
Change 1800 to 2700 to adjust to a min_scale_length of 900
…-values of the 2x2 contingency tables.
Added under the section 'How to run MASS-PRF?': "Note: The program requires the four LookupTables in the working directory to run successfully. For more details about the LookupTables, please refer to the section ‘Gamma calculation using four Lookup Tables’ below."
Silent clustering -s 1 is the default option.
-s 1 is the default; it clusters silent sites.
Revised README.md to include detailed installation and setup instructions
This update improves the README file by integrating comprehensive installation steps, usage guidance, and symlink setup for the MASS-PRF pipeline. It consolidates information from existing documentation to ensure users have a streamlined and clear setup process.
1. Update Time Output Format to Include Leading Zeros
Original Code:
int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
if (h) cout << h << ":" << m << ":" << s << endl;
else cout << m << ":" << s << endl;
Modified Code:
int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
if (h)
    cout << setw(2) << setfill('0') << h << ":"
         << setw(2) << setfill('0') << m << ":"
         << setw(2) << setfill('0') << s << endl;
else
    cout << setw(2) << setfill('0') << m << ":"
         << setw(2) << setfill('0') << s << endl;
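The zero-padding change can also be written as a small helper that returns a string, which makes the formatting easy to check in isolation. This is only a sketch: `formatElapsed` is a hypothetical name, not a function in MASS-PRF, and it assumes the elapsed time is a non-negative number of seconds. Note that `setw` and `setfill` need the `<iomanip>` header.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format an elapsed time in seconds as HH:MM:SS, or MM:SS when under one hour.
// formatElapsed is a hypothetical helper name, not part of MASS-PRF.
std::string formatElapsed(long t) {
    const long h = t / 3600, m = (t % 3600) / 60, s = t % 60;
    std::ostringstream out;
    out << std::setfill('0');                 // the fill character persists on the stream
    if (h) out << std::setw(2) << h << ':';   // setw applies to the next field only
    out << std::setw(2) << m << ':' << std::setw(2) << s;
    return out.str();
}
```

Because the fill character stays set on the stream, it only has to be set once; the field width has to be repeated before every number.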
These scripts were originally developed by Prof. Nic Fisk, Associate Professor at URI and a friend of the lab, to map selection pressures onto 3D protein structures. They were added to MASSPRF, with his permission, to facilitate 3D visualization of selection mapping.
Replace
What's New
- Added advanced gap handling parameters:
  - gap_policy (0 = skip codons with gaps [default], 1 = treat as missing, 2 = majority rule replacement).
  - gap_threshold (float, default = 0.5), active when gap_policy=2.
- Support for runtime overrides via environment variables GAP_POLICY and GAP_THRESHOLD.
- Changed default of -n option: old default = 1 (replace ambiguous nucleotides); new default = 0 (treat ambiguous as gap).
- Added automatic CPU core detection for faster parallel computation.
- Updated help message and README to reflect these changes.

This update is based on critical observations and suggestions by Yen-Wen (Denny) Wang. We sincerely thank him for his effort in identifying and clarifying the gap handling behavior.
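One way the environment-variable overrides for the gap parameters could be resolved is sketched below. The struct, the function names, and the validation rules (policy limited to 0..2, threshold limited to [0, 1], invalid text ignored) are assumptions made for illustration; only the variable names GAP_POLICY and GAP_THRESHOLD and the defaults 0 and 0.5 come from the notes above.

```cpp
#include <cstdlib>

// Hypothetical container for the two gap-handling parameters.
struct GapSettings {
    int policy = 0;          // 0 = skip codons with gaps, 1 = treat as missing, 2 = majority rule
    double threshold = 0.5;  // consulted only when policy == 2
};

// Apply optional overrides given as text (nullptr means "not set").
// Text that fails to parse or is out of range leaves the default in place;
// this validation is an assumption, the real program may behave differently.
GapSettings applyGapOverrides(const char* policyText, const char* thresholdText) {
    GapSettings g;
    if (policyText) {
        char* end = nullptr;
        const long p = std::strtol(policyText, &end, 10);
        if (end != policyText && *end == '\0' && p >= 0 && p <= 2)
            g.policy = static_cast<int>(p);
    }
    if (thresholdText) {
        char* end = nullptr;
        const double t = std::strtod(thresholdText, &end);
        if (end != thresholdText && *end == '\0' && t >= 0.0 && t <= 1.0)
            g.threshold = t;
    }
    return g;
}

// Read the overrides from the environment variables named in the notes.
GapSettings gapSettingsFromEnvironment() {
    return applyGapOverrides(std::getenv("GAP_POLICY"), std::getenv("GAP_THRESHOLD"));
}
```

Keeping the parsing separate from the `std::getenv` calls means the override logic can be exercised without touching the process environment.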
Special thanks to **Denny** for the update and to **Prarthana** for testing and refining the script. The new version now runs smoothly for both AA and NT formats and includes improved handling and visualization.
### 3D Output Script Update

This update improves compatibility and usability for both **AA (amino acid)** and **NT (nucleotide)** outputs in the 3D Chimera visualization script.

---

### Summary of Changes

- Replaced the old **scaling factor** column with a new **formatList** column (`AA` or `NT`) in the design file for clearer data structure and fewer errors.
- Simplified the logic for handling nucleotide outputs (3× rows per codon) to ensure correct aggregation and color mapping.
- Unified γ-value color scaling across all processed genes for consistent visualization.
- Improved alignment, significance logic, and color assignment behavior.
- Updated example defaults:
  - `onlySig = FALSE`
  - `ehColor = c(180, 180, 180)`
  - `midColor = c(240, 240, 240)`
  - `bins = 510` for smoother gradients.

---

### Why the Update

The previous version required a **scaling factor** to distinguish between nucleotide and amino acid outputs, which caused confusion when different MASS-PRF modes were used. This new version replaces the scaling factor with an explicit `formatList` field (`AA` or `NT`), making the workflow simpler, more transparent, and compatible with both output formats.

---

### Acknowledgments

Special thanks to **Yen-Wen (Denny) Wang** for the code refactoring and **Prarthana Sanjeeva Reddy** for testing and improving the AA and NT workflows. Their contributions greatly improved the stability and reproducibility of the 3D visualization pipeline.

---

### Original Author

**Nic Fisk**
Assistant Professor
Department of Cell and Molecular Biology
College of the Environment and Life Sciences
University of Rhode Island
…34-F1-model_v4.pdb
…1-model_v4.pdb.png
This is the old version used by Professor Fisk. I will upload the new one and back up the old one.
Thanks to Prarthana Sanjeeva Reddy for testing
- Improved parsing robustness: the script now tolerates minor format changes in MASS-PRF outputs.
- Standardized per-nucleotide expansion for consistent 3D mapping.
- Cleaner failure handling and logging (Failed_genes.txt).
- Normalized output columns and generic labels for both NT and AA contexts.
- Safer filename handling when gene names contain underscores.
What changed
Added fixed gamma range with two new params:
minGamma = -4
maxGamma = 50
Colors are generated with a split palette (negative → blue, positive → red) centered at 0.
γ values outside [minGamma, maxGamma] are clamped to the edge bins.
When logT is numeric (e.g., 2), signed-log mapping respects the same fixed range.
Conditional installs for BiocManager, msa, and bio3d to avoid redundant installs.
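The clamping and split-palette behaviour described in this list can be illustrated with a small bin-index function. This is a sketch under stated assumptions, not the visualization script's code: `gammaToBin` is a hypothetical name, the palette is assumed to be split evenly (lower half of the bins for negative γ, upper half for positive γ), the default of 510 bins is borrowed from the example defaults of the 3D script, and `minGamma < 0 < maxGamma` is required. The signed-log option is left out.

```cpp
#include <algorithm>

// Map a gamma value to a colour-bin index in [0, bins - 1].
// Bins below bins/2 stand for the negative (blue) half and bins from bins/2
// upward for the positive (red) half, so gamma = 0 sits on the boundary.
// The even split is an illustrative assumption; requires minGamma < 0 < maxGamma.
int gammaToBin(double gamma, double minGamma = -4.0, double maxGamma = 50.0, int bins = 510) {
    const int half = bins / 2;
    // Values outside [minGamma, maxGamma] are clamped to the edge bins.
    const double g = std::min(std::max(gamma, minGamma), maxGamma);
    if (g < 0.0) {
        const double frac = (g - minGamma) / (0.0 - minGamma);  // 0 at minGamma, 1 at zero
        return std::min(half - 1, static_cast<int>(frac * half));
    }
    const double frac = g / maxGamma;  // 0 at zero, 1 at maxGamma
    return std::min(bins - 1, half + static_cast<int>(frac * (bins - half)));
}
```

Because the range is fixed rather than taken from the data, the same γ value lands in the same bin for every gene and every run, which is what makes the colour scales comparable.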
Why
Make cross-gene / cross-run color scales comparable and stable.
Prevent extreme γ values from skewing palettes.
Backwards compatibility
If all γ already fall within the new range, visual results remain qualitatively consistent.
Notes
Ensure Biostrings is installed: BiocManager::install("Biostrings").
This release adds a simple non-genic mode and fixes a rare stability issue. Core results for coding runs are unchanged.

Non-genic mode (`-ng 1`)
- New per-nucleotide mode for non-coding regions (intergenic / intronic / UTR).
- Codon/synonymous logic is turned off; every variable site is treated as replacement (R).
- PS/DS, MKT, and NI are skipped (reported as `NA`); PR/DR and Gamma are still reported.
- Use `-t <div_time>` to reuse divergence time estimated from coding runs.
- Recommended for non-genic runs: `-ng 1 -o 1 -t <dt>`.

Stability fix (no change to results)
- Fixed a rare crash of the form: `blew fast model num ... 0.999193`.
- Internal model sampling now uses a normalized CDF in `[0,1)` (standard inverse-CDF draw).
- This does **not** change model probabilities or estimates; it only removes the crash.
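For readers unfamiliar with the technique, a standard inverse-CDF draw over a normalized cumulative distribution can be sketched as follows. `sampleIndex` is a hypothetical name and this is not the MASS-PRF implementation; it assumes a non-empty vector of non-negative weights with a positive total and a uniform draw `u` in `[0, 1)` supplied by the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Pick an index from non-negative weights using a uniform draw u in [0, 1).
// The running sums are divided by their total so the last entry equals 1,
// and an index past the end (possible only through rounding) is clamped
// to the last model. Assumes a non-empty vector with a positive total.
std::size_t sampleIndex(const std::vector<double>& weights, double u) {
    std::vector<double> cdf(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cdf.begin());
    const double total = cdf.back();
    for (double& c : cdf) c /= total;  // normalise so the CDF ends at 1
    // First cumulative value strictly greater than u.
    const auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    const std::size_t idx = static_cast<std::size_t>(it - cdf.begin());
    return std::min(idx, weights.size() - 1);
}
```

Normalizing changes only the scale of the cumulative sums, not the relative probabilities, so a draw close to 1 still resolves to a valid index even when the raw weights sum to slightly less than 1.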
Update .h
…ng 1` non-genic mode).
- Added `--non_genic` for MASS-PRF runs with `-ng 1`.
  → 2D skips PS/DS-based panels and keeps only Gamma + CI plots and site lists.
- Added `--scale <k>` to set the scaling factor when it is missing from MASS-PRF output.
  → If no scaling is found, the script now safely defaults to scale = 1 instead of stopping.
- More robust parsing of the results table and “Mission accomplished” marker.
  → Small format changes in the output are handled better; fewer false failures.
- Standardized position expansion.
  → When scaling is used, positions are expanded to a clean 1..N grid for consistent 3D coloring.
- Backward compatible.
  → If you do not use the new flags, behavior and outputs remain the same as before.
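The position expansion mentioned above can be pictured as repeating each scaled row `k` times so that every position from 1 to N has its own value. The sketch below is an assumption about the general idea, not the script's code: `expandToGrid` is a hypothetical name, each scaled row is assumed to cover exactly `scale` consecutive positions, and a scale below 1 is treated as 1.

```cpp
#include <cstddef>
#include <vector>

// Expand one value per scaled row into one value per position.
// Output element i corresponds to position i + 1. A scale below 1 is
// treated as 1 here (an assumption made for this sketch).
std::vector<double> expandToGrid(const std::vector<double>& scaledValues, int scale) {
    const std::size_t k = scale < 1 ? 1 : static_cast<std::size_t>(scale);
    std::vector<double> grid;
    grid.reserve(scaledValues.size() * k);
    for (double v : scaledValues)
        grid.insert(grid.end(), k, v);  // repeat each value k times
    return grid;
}
```

With a scale of 1 the output is identical to the input, which matches the backward-compatible behaviour when no scaling is present.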