AddedDocNoteForLookupTables #2

zimingz · 2021-08-15T01:33:16Z

Added under the section 'How to run MASS-PRF?':

'Note: The program requires the four LookupTables in the working directory to run successfully. For more details about the LookupTables, please refer to the section ‘Gamma calculation using four Lookup Tables’ below.'

Updates: MASS-PRF is written in standard C++. The software package includes a manual, example data, source code, and compiled executables for Windows/Linux/Mac. The source code is released under GPLv3 and can be downloaded from https://github.com/Townsend-Lab-Yale/MASSPRF_10July2016/.

Revised README.md to include detailed installation and setup instructions

This update improves the README file by integrating comprehensive installation steps, usage guidance, and symlink setup for the MASS-PRF pipeline. It consolidates information from existing documentation to ensure users have a streamlined and clear setup process.

1. Update Time Output Format to Include Leading Zeros

Original Code:

    int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
    if (h) cout << h << ":" << m << ":" << s << endl;
    else cout << m << ":" << s << endl;

Modified Code:

    int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
    if (h) cout << setw(2) << setfill('0') << h << ":" << setw(2) << setfill('0') << m << ":" << setw(2) << setfill('0') << s << endl;
    else cout << setw(2) << setfill('0') << m << ":" << setw(2) << setfill('0') << s << endl;
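The same formatting can be sketched as a small standalone function, which makes the leading-zero behavior easy to check in isolation. The helper name `formatElapsed` is hypothetical and is not taken from the MASS-PRF source; the sketch only assumes that `t` is a non-negative number of seconds.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not a function from the MASS-PRF source): formats a
// non-negative duration in seconds as HH:MM:SS, or MM:SS when under one hour.
std::string formatElapsed(long t) {
    assert(t >= 0);
    const long h = t / 3600, m = (t % 3600) / 60, s = t % 60;
    std::ostringstream out;
    out << std::setfill('0');  // the fill character persists on the stream
    if (h) out << std::setw(2) << h << ":";
    // setw applies to the next field only, so it is repeated for each value.
    out << std::setw(2) << m << ":" << std::setw(2) << s;
    return out.str();
}
```

Because `setfill` is sticky while `setw` is not, setting the fill character once is enough; the repeated `setfill('0')` calls in the modified code are harmless but redundant.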

These scripts were originally developed by Prof. Nic Fisk, Associate Professor at URI and a friend of the lab, to map selection pressures onto 3D protein structures. They were added to MASSPRF, with his permission, to facilitate 3D visualization of selection mapping.

What's New

- Added advanced gap handling parameters:
  - gap_policy (0 = skip codons with gaps [default], 1 = treat as missing, 2 = majority rule replacement).
  - gap_threshold (float, default = 0.5), active when gap_policy=2.
- Support for runtime overrides via environment variables GAP_POLICY and GAP_THRESHOLD.
- Changed default of -n option:
  - Old default = 1 (replace ambiguous nucleotides).
  - New default = 0 (treat ambiguous as gap).
- Added automatic CPU core detection for faster parallel computation.
- Updated help message and README to reflect these changes.

This update is based on critical observations and suggestions by Yen-Wen (Denny) Wang. We sincerely thank him for his effort in identifying and clarifying the gap handling behavior.
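As an illustration of the environment-variable overrides, the following is a minimal sketch of how such values could be read and validated. The function names, the accepted ranges (0 to 2 for the policy, 0 to 1 for the threshold), and the fall-back-on-invalid-input behavior are assumptions for this example, not the actual MASS-PRF implementation; only the variable names GAP_POLICY and GAP_THRESHOLD and the defaults come from the notes above.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical parser: returns the policy encoded in `text` if it is a whole
// number in 0..2, otherwise `fallback`. A null pointer means "variable unset".
int parseGapPolicy(const char* text, int fallback) {
    assert(fallback >= 0 && fallback <= 2);
    if (!text) return fallback;
    char* end = nullptr;
    const long v = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || v < 0 || v > 2) return fallback;
    return static_cast<int>(v);
}

// Hypothetical parser: accepts a threshold in [0, 1] (assumed range).
double parseGapThreshold(const char* text, double fallback) {
    if (!text) return fallback;
    char* end = nullptr;
    const double v = std::strtod(text, &end);
    if (end == text || *end != '\0' || !(v >= 0.0 && v <= 1.0)) return fallback;
    return v;
}

struct GapSettings {
    int policy;        // 0 = skip codons with gaps, 1 = treat as missing, 2 = majority rule
    double threshold;  // only consulted when policy == 2
};

// Reads the overrides from the environment, starting from the documented defaults.
GapSettings gapSettingsFromEnv() {
    return GapSettings{parseGapPolicy(std::getenv("GAP_POLICY"), 0),
                       parseGapThreshold(std::getenv("GAP_THRESHOLD"), 0.5)};
}
```

Keeping the parsing separate from `std::getenv` lets the validation be exercised without touching the process environment.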

### 3D Output Script Update

This update improves compatibility and usability for both **AA (amino acid)** and **NT (nucleotide)** outputs in the 3D Chimera visualization script.

---

### Summary of Changes

- Replaced the old **scaling factor** column with a new **formatList** column (`AA` or `NT`) in the design file for clearer data structure and fewer errors.
- Simplified the logic for handling nucleotide outputs (3× rows per codon) to ensure correct aggregation and color mapping.
- Unified γ-value color scaling across all processed genes for consistent visualization.
- Improved alignment, significance logic, and color assignment behavior.
- Updated example defaults:
  - `onlySig = FALSE`
  - `ehColor = c(180, 180, 180)`
  - `midColor = c(240, 240, 240)`
  - `bins = 510` for smoother gradients.

---

### Why the Update

The previous version required a **scaling factor** to distinguish between nucleotide and amino acid outputs, which caused confusion when different MASS-PRF modes were used. This new version replaces the scaling factor with an explicit `formatList` field (`AA` or `NT`), making the workflow simpler, more transparent, and compatible with both output formats.

---

### Acknowledgments

Special thanks to **Yen-Wen (Denny) Wang** for the code refactoring and **Prarthana Sanjeeva Reddy** for testing and improving the AA and NT workflows. Their contributions greatly improved the stability and reproducibility of the 3D visualization pipeline.

---

### Original Author

**Nic Fisk**
Assistant Professor
Department of Cell and Molecular Biology
College of the Environment and Life Sciences
University of Rhode Island

- Improved parsing robustness: the script now tolerates minor format changes in MASS-PRF outputs.
- Standardized per-nucleotide expansion for consistent 3D mapping.
- Cleaner failure handling and logging (Failed_genes.txt).
- Normalized output columns and generic labels for both NT and AA contexts.
- Safer filename handling when gene names contain underscores.

What changed

- Added fixed gamma range with two new params: minGamma = -4, maxGamma = 50.
- Colors are generated with a split palette (negative → blue, positive → red) centered at 0.
- γ values outside [minGamma, maxGamma] are clamped to the edge bins.
- When logT is numeric (e.g., 2), signed-log mapping respects the same fixed range.
- Conditional installs for BiocManager, msa, and bio3d to avoid redundant installs.

Why

- Make cross-gene / cross-run color scales comparable and stable.
- Prevent extreme γ values from skewing palettes.

Backwards compatibility

- If all γ already fall within the new range, visual results remain qualitatively consistent.

Notes

- Ensure Biostrings is installed: BiocManager::install("Biostrings").
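The clamping and split-palette idea can be sketched as a mapping from a γ value to a color-bin index. This is an illustrative C++ sketch, not the visualization script's own code: the function name `gammaToBin`, the choice to give each sign half of the bins, and the linear spacing within each half are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical mapping from gamma to a bin index in [0, bins - 1].
// Bins [0, bins/2) cover negative gamma (blue side); the remaining bins
// cover gamma >= 0 (red side), so the palette is split at 0.
int gammaToBin(double gamma, double minGamma, double maxGamma, int bins) {
    assert(minGamma < 0.0 && maxGamma > 0.0 && bins >= 2);
    // Clamp out-of-range values so they land in the edge bins.
    const double g = std::min(std::max(gamma, minGamma), maxGamma);
    const int half = bins / 2;
    if (g < 0.0) {
        const double frac = (g - minGamma) / (0.0 - minGamma);  // in [0, 1)
        return std::min(static_cast<int>(std::floor(frac * half)), half - 1);
    }
    const double frac = g / maxGamma;  // in [0, 1]
    const int idx = half + static_cast<int>(std::floor(frac * (bins - half)));
    return std::min(idx, bins - 1);  // gamma == maxGamma maps to the last bin
}
```

With a fixed range, the same γ always maps to the same bin regardless of which gene or run it comes from, which is what makes the color scales comparable.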

This release adds a simple non-genic mode and fixes a rare stability issue. Core results for coding runs are unchanged.

Non-genic mode (`-ng 1`)

- New per-nucleotide mode for non-coding regions (intergenic / intronic / UTR).
- Codon/synonymous logic is turned off; every variable site is treated as replacement (R).
- PS/DS, MKT, and NI are skipped (reported as `NA`); PR/DR and Gamma are still reported.
- Use `-t <div_time>` to reuse divergence time estimated from coding runs.
- Recommended for non-genic runs: `-ng 1 -o 1 -t <dt>`.

Stability fix (no change to results)

- Fixed a rare crash of the form: `blew fast model num ... 0.999193`.
- Internal model sampling now uses a normalized CDF in `[0,1)` (standard inverse-CDF draw).
- This does **not** change model probabilities or estimates; it only removes the crash.
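A standard inverse-CDF draw over a normalized cumulative distribution can be sketched as follows. This is a generic illustration of the technique named in the stability fix, not the code in PRFCluster.cpp; the function name `pickModel`, the weight vector, and the fallback for rounding error are assumptions. One plausible reading of the crash is that an unnormalized cumulative sum ended slightly below a draw close to 1, which normalization avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sampler: given non-negative model weights (not necessarily
// summing to 1) and a uniform draw u in [0, 1), returns the index of the
// first model whose normalized cumulative weight exceeds u.
std::size_t pickModel(const std::vector<double>& weights, double u) {
    assert(!weights.empty());
    assert(u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (double w : weights) {
        assert(w >= 0.0);
        total += w;
    }
    assert(total > 0.0);

    double cumulative = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        cumulative += weights[i] / total;  // normalized CDF value at model i
        if (u < cumulative) return i;
    }
    // Rounding can leave the final cumulative value a hair below 1.
    // Fall back to the last model with non-zero weight instead of failing.
    for (std::size_t i = weights.size(); i-- > 0;) {
        if (weights[i] > 0.0) return i;
    }
    return weights.size() - 1;  // unreachable when total > 0
}
```

Normalizing changes only the scale of the cumulative sums, not the relative probabilities, which is consistent with the note that estimates are unaffected.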

…ng 1` non-genic mode).

- Added `--non_genic` for MASS-PRF runs with `-ng 1`.
  → 2D skips PS/DS-based panels and keeps only Gamma + CI plots and site lists.
- Added `--scale <k>` to set the scaling factor when it is missing from MASS-PRF output.
  → If no scaling is found, the script now safely defaults to scale = 1 instead of stopping.
- More robust parsing of the results table and “Mission accomplished” marker.
  → Small format changes in the output are handled better; fewer false failures.
- Standardized position expansion.
  → When scaling is used, positions are expanded to a clean 1..N grid for consistent 3D coloring.
- Backward compatible.
  → If you do not use the new flags, behavior and outputs remain the same as before.
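The scale fallback and position expansion described above can be sketched in a few lines. This is an illustrative C++ sketch rather than the script's own code: the names `resolveScale`, `expandPositions`, and `SiteValue` are hypothetical, and two details are assumptions made for the example, namely that a scale found in the MASS-PRF output takes precedence over the `--scale` value, and that each scaled row is simply repeated `scale` times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SiteValue {
    int position;  // 1-based position on the expanded grid
    double gamma;  // value inherited from the scaled row covering this position
};

// Hypothetical resolution order: scale parsed from the output, then the
// user-supplied value, then 1. A value <= 0 stands for "not available".
int resolveScale(int parsedScale, int userScale) {
    if (parsedScale > 0) return parsedScale;
    if (userScale > 0) return userScale;
    return 1;
}

// Expands one value per scaled row into one value per position, numbering
// positions 1..N with N = rowGammas.size() * scale.
std::vector<SiteValue> expandPositions(const std::vector<double>& rowGammas, int scale) {
    assert(scale >= 1);
    std::vector<SiteValue> out;
    out.reserve(rowGammas.size() * static_cast<std::size_t>(scale));
    int position = 1;
    for (double g : rowGammas) {
        for (int k = 0; k < scale; ++k) {
            out.push_back(SiteValue{position++, g});
        }
    }
    return out;
}
```

With scale = 1 the expansion is the identity, which matches the backward-compatible default.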

stanleyjs and others added 30 commits February 24, 2017 17:01

Merge pull request #1 from zimingz/master

b51ddb2

Merge preprocessing branch w/ master

Added scaleFactor & scaleSeq

f4b1199

Added output for scaling

5c6954e

Modified clustersubseq to perform over 4 different flag seqs, which are keys to scaled syn/rep

Merge pull request #2 from stanleyjs/master

24cfa82

scaling

Update PRFCluster.cpp

ab4f953

Fixed scaling interface

dccda87

-SCL 0 for no scaling -SCL 1 for automatic scaling -SCL <num> for user defined scaling

UpdatedGitHubLink

0ecdfe8

AddedGammaThresholdsSimulations

b6fb022

Gamma thresholds for positive and negative selection.

GammaThresholdsUpdates

adca520

CompiledCommandLine_C++Version_thread_issue

0858968

Updated compiling command without 'make'; put the C++ version and thread details in.

UpdatedReference_GithubLink

b865cf2

PipelineForMASSPRF

890c010

InforForPipeline&SimulationFolders

8c2dbb8

Added the information for MASSPRF pipeline folder, and simulation folder.

Update PRFCluster.cpp

29e4604

Update PRFCluster.h

37c4c41

Merge pull request #3 from hrithikguy/patch-1

f2f4057

Update PRFCluster.cpp

Merge pull request #4 from hrithikguy/patch-2

8cdbdfb

Update PRFCluster.h

Update PRFCluster.cpp

2085974

Change 1800 to 2700 to adjust to a min_scale_length of 900

Add files via upload

0d92090

Add McDonald Kreitman test option "-mkt 1", which also computes the p…

0bcdf21

…-values of the 2x2 contingency tables.

PRFcluster_updated_silentClustering_default_option

1955ec9

Silent clustering -s 1 is the default option.

createdReadMe

61b1c50

SlientClusteringDefaultUpdated

33dc17d

-s 1 is the default; by default, silent sites are clustered.

Create S-design.tsv

9830bc4

yide0202 added 30 commits September 19, 2025 10:44

Delete PRFCluster.h

70d1358

Replace

Latest 3D Outputs

77c1961

Special thanks to **Denny** for the update and to **Prarthana** for testing and refining the script. The new version now runs smoothly for both AA and NT formats and includes improved handling and visualization.

Update README.md

5900a2e

Update README.md

6c7d931

Rename image_SCH4_AF-P53334-F1-model_v4.pdb.png to image_SCH4_AF-P533…

7a6be76

…34-F1-model_v4.pdb

Rename image_SCH4_AF-P53334-F1-model_v4.pdb to image_SCH4_AF-P53334-F…

6182f0d

…1-model_v4.pdb.png

Update README.md

45e0b8d

Update README.md

efddc69

Update README.md

ba8f52d

Update README.md

2fb9765

Delete 3D_Mapping_Scripts/example_inputs directory

7d7b517

This is the old version used by Professor Fisk. I will upload the new one and back up the old one.

New version of example input

fd5aaa1

Thanks to Prarthana Sanjeeva Reddy for testing

Update README.md

b182564

Update README.md

7833185

Update README.md

8fefb97

Update README.md

86b13ab

Update README.md

5a01ac1

Update README.md

967692b

Update README.md

126e15d

Update README.md

59c72c0

Update README.md

a8bbe2e

MASS-PRF v2.0 release

692f254

update .h

Update README.md

658100f

Update README.md

3f1fd86


6 participants
