
@zimingz (Owner) commented Aug 15, 2021

Added under the section 'How to run MASS-PRF?':

'Note: The program requires the four LookupTables in the working directory to run successfully. For more details about the LookupTables, please refer to the section ‘Gamma calculation using four Lookup Tables’ below.'

stanleyjs and others added 30 commits February 24, 2017 17:01
Merge preprocessing branch w/ master
Modified clustersubseq to operate on 4 different flag sequences, which are the keys to the scaled syn/rep data.
Updates: MASS-PRF was written in standard C++. The software package is accompanied by a manual, example data, source code, and compiled executables for Windows/Linux/Mac. The source code is released under GPLv3 and can be downloaded from https://github.com/Townsend-Lab-Yale/MASSPRF_10July2016/.
-SCL 0 for no scaling
-SCL 1 for automatic scaling
-SCL <num> for user defined scaling
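A minimal sketch of how the three -SCL forms could be interpreted, assuming the argument is an integer. `ScalingMode`, `ScalingOption`, and `parse_scl` are hypothetical names used for illustration, not identifiers from the MASS-PRF source.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical representation of the -SCL choice (illustrative names only).
enum class ScalingMode { None, Automatic, UserDefined };

struct ScalingOption {
    ScalingMode mode;
    int factor;  // 1 for None, 0 = "decided later" for Automatic, user value otherwise
};

// Interpret the argument of -SCL:
//   0      -> no scaling
//   1      -> automatic scaling
//   <num>  -> user-defined scaling factor (assumed to be an integer > 1)
ScalingOption parse_scl(const std::string& arg) {
    int value = std::stoi(arg);  // throws std::invalid_argument on non-numeric input
    if (value < 0) throw std::invalid_argument("-SCL must be >= 0");
    if (value == 0) return {ScalingMode::None, 1};
    if (value == 1) return {ScalingMode::Automatic, 0};
    return {ScalingMode::UserDefined, value};
}
```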
Gamma thresholds for positive and negative selection.
Updated the compile command so it no longer requires 'make'; added the C++ version and threading details.
Added information about the MASSPRF pipeline folder and the simulation folder.
Change 1800 to 2700 to adjust to a min_scale_length of 900
Silent clustering (-s 1), which clusters silent sites, is the default option.
Revised README.md to include detailed installation and setup instructions

This update improves the README file by integrating comprehensive installation steps, usage guidance, and symlink setup for the MASS-PRF pipeline. It consolidates information from existing documentation to ensure users have a streamlined and clear setup process.
1. Update Time Output Format to Include Leading Zeros
Original Code:

```cpp
int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
if (h) cout << h << ":" << m << ":" << s << endl;
else cout << m << ":" << s << endl;
```
Modified Code:

```cpp
// std::setw and std::setfill are declared in <iomanip>
int h = t / 3600, m = (t % 3600) / 60, s = t % 60;
if (h)
    cout << setw(2) << setfill('0') << h << ":"
         << setw(2) << setfill('0') << m << ":"
         << setw(2) << setfill('0') << s << endl;
else
    cout << setw(2) << setfill('0') << m << ":"
         << setw(2) << setfill('0') << s << endl;
```
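A self-contained version of the modified snippet, wrapped in a helper that returns a string so the padding can be checked. `format_elapsed` is a hypothetical name, not a function from the MASS-PRF source. It also shows that `setfill` only needs to be set once (it persists on the stream), while `setw` must be repeated before every field because it resets after each output.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format an elapsed time in whole seconds as MM:SS, or as HH:MM:SS once at
// least one hour has passed, zero-padding every field to two digits.
std::string format_elapsed(long t) {
    long h = t / 3600, m = (t % 3600) / 60, s = t % 60;
    std::ostringstream out;
    out << std::setfill('0');             // sticky: applies to all later fields
    if (h) out << std::setw(2) << h << ':';
    out << std::setw(2) << m << ':'       // setw resets after each field,
        << std::setw(2) << s;             // so it is repeated per field
    return out.str();
}
```

For example, `format_elapsed(3725)` yields `01:02:05` and `format_elapsed(65)` yields `01:05`.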
These scripts were originally developed by Prof. Nic Fisk, Associate Professor at URI and a friend of the lab, to map selection pressures onto 3D protein structures. They were added to MASSPRF, with his permission, to facilitate 3D visualization of selection mapping.
What's New

Added advanced gap handling parameters:

gap_policy (0 = skip codons with gaps [default], 1 = treat as missing, 2 = majority rule replacement).

gap_threshold (float, default = 0.5), active when gap_policy=2.

Support for runtime overrides via environment variables GAP_POLICY and GAP_THRESHOLD.
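A minimal sketch of the runtime overrides, assuming the environment variables are applied on top of whatever defaults are already set and that unparsable or out-of-range values are ignored. `GapConfig` and `apply_env_overrides` are hypothetical names, and restricting `GAP_THRESHOLD` to [0, 1] is also an assumption.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Gap-handling settings as described above (hypothetical struct).
struct GapConfig {
    int policy = 0;          // 0 = skip codons with gaps, 1 = missing, 2 = majority rule
    double threshold = 0.5;  // only consulted when policy == 2
};

// Apply GAP_POLICY / GAP_THRESHOLD from the environment on top of `cfg`.
// Invalid values leave the existing setting unchanged (assumed behavior).
GapConfig apply_env_overrides(GapConfig cfg) {
    if (const char* p = std::getenv("GAP_POLICY")) {
        try {
            int v = std::stoi(p);
            if (v >= 0 && v <= 2) cfg.policy = v;
        } catch (const std::exception&) {}
    }
    if (const char* t = std::getenv("GAP_THRESHOLD")) {
        try {
            double v = std::stod(t);
            if (v >= 0.0 && v <= 1.0) cfg.threshold = v;
        } catch (const std::exception&) {}
    }
    return cfg;
}
```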

Changed default of -n option:

Old default = 1 (replace ambiguous nucleotides).

New default = 0 (treat ambiguous as gap).
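A sketch of the new default (-n 0) only, assuming nucleotides are upper-cased and '-' is the gap symbol. `ambiguous_to_gap` is a hypothetical helper; the replacement rule used by the old -n 1 default is not shown because it is not described here.

```cpp
#include <cctype>
#include <string>

// -n 0 sketch: any character that is not A, C, G, or T (case-insensitive)
// is rewritten as a gap '-'. Existing gaps stay gaps.
std::string ambiguous_to_gap(const std::string& seq) {
    std::string out;
    out.reserve(seq.size());
    for (char c : seq) {
        char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        bool unambiguous = (u == 'A' || u == 'C' || u == 'G' || u == 'T');
        out.push_back(unambiguous ? u : '-');
    }
    return out;
}
```

For example, `ambiguous_to_gap("acNgt-R")` returns `AC-GT--`.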

Added automatic CPU core detection for faster parallel computation.
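One portable way to detect the core count in standard C++ is `std::thread::hardware_concurrency()`; whether MASS-PRF uses exactly this call is an assumption. The standard allows it to return 0 when the count cannot be determined, so the sketch falls back to a single thread. `detect_worker_threads` is a hypothetical name.

```cpp
#include <thread>

// Number of worker threads to start: the detected hardware thread count,
// or 1 if the implementation cannot report it (hardware_concurrency() == 0).
unsigned detect_worker_threads() {
    unsigned cores = std::thread::hardware_concurrency();
    return cores == 0 ? 1u : cores;
}
```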

Updated help message and README to reflect these changes.

This update is based on critical observations and suggestions by Yen-Wen (Denny) Wang.
We sincerely thank him for his effort in identifying and clarifying the gap handling behavior.
Special thanks to **Denny** for the update and to **Prarthana** for testing and refining the script.  
The new version now runs smoothly for both AA and NT formats and includes improved handling and visualization.
### 3D Output Script Update

This update improves compatibility and usability for both **AA (amino acid)** and **NT (nucleotide)** outputs in the 3D Chimera visualization script.

---

### Summary of Changes
- Replaced the old **scaling factor** column with a new **formatList** column (`AA` or `NT`) in the design file for clearer data structure and fewer errors.  
- Simplified the logic for handling nucleotide outputs (3× rows per codon) to ensure correct aggregation and color mapping.  
- Unified γ-value color scaling across all processed genes for consistent visualization.  
- Improved alignment, significance logic, and color assignment behavior.  
- Updated example defaults:
  - `onlySig = FALSE`
  - `ehColor = c(180, 180, 180)`
  - `midColor = c(240, 240, 240)`
  - `bins = 510` for smoother gradients.
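The 3D script itself is written in R; this C++ sketch only illustrates the aggregation step for NT outputs, assuming each codon occupies three consecutive rows and that the three values are averaged (if the three rows carry the same γ, averaging simply returns it). `collapse_to_codons` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Collapse per-nucleotide values (three consecutive rows per codon) into one
// value per codon by averaging each triplet (averaging is an assumption).
std::vector<double> collapse_to_codons(const std::vector<double>& nt) {
    if (nt.size() % 3 != 0)
        throw std::invalid_argument("NT row count is not a multiple of 3");
    std::vector<double> codons;
    codons.reserve(nt.size() / 3);
    for (std::size_t i = 0; i < nt.size(); i += 3)
        codons.push_back((nt[i] + nt[i + 1] + nt[i + 2]) / 3.0);
    return codons;
}
```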

---

### Why the Update
The previous version required a **scaling factor** to distinguish between nucleotide and amino acid outputs, which caused confusion when different MASS-PRF modes were used.  
This new version replaces the scaling factor with an explicit `formatList` field (`AA` or `NT`), making the workflow simpler, more transparent, and compatible with both output formats.

---

### Acknowledgments
Special thanks to **Yen-Wen (Denny) Wang** for the code refactoring and  
**Prarthana Sanjeeva Reddy** for testing and improving the AA and NT workflows.  

Their contributions greatly improved the stability and reproducibility of the 3D visualization pipeline.

---

### Original Author
**Nic Fisk**  
Assistant Professor  
Department of Cell and Molecular Biology  
College of the Environment and Life Sciences  
University of Rhode Island
This is the old version used by Professor Fisk. I will upload the new one and back up the old one.
Thanks to Prarthana Sanjeeva Reddy for testing
Improved parsing robustness — the script now tolerates minor format changes in MASS-PRF outputs.

Standardized per-nucleotide expansion for consistent 3D mapping.

Cleaner failure handling and logging (Failed_genes.txt).

Normalized output columns and generic labels for both NT and AA contexts.

Safer filename handling when gene names contain underscores.
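A sketch of per-nucleotide expansion, assuming each codon value is simply repeated for its three nucleotide positions so that positions run 1..3N in order. `expand_per_position` is a hypothetical helper, written in C++ for illustration although the actual script is in R.

```cpp
#include <cstddef>
#include <vector>

// Expand per-codon values so each codon contributes `k` consecutive entries
// (k = 3 for nucleotides); entry i of the result is position i + 1.
std::vector<double> expand_per_position(const std::vector<double>& codon_values,
                                        std::size_t k = 3) {
    std::vector<double> out;
    out.reserve(codon_values.size() * k);
    for (double v : codon_values)
        out.insert(out.end(), k, v);  // append k copies of this codon's value
    return out;
}
```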
What changed

Added fixed gamma range with two new params:

minGamma = -4
maxGamma = 50


Colors are generated with a split palette (negative → blue, positive → red) centered at 0.

γ values outside [minGamma, maxGamma] are clamped to the edge bins.

When logT is numeric (e.g., 2), signed-log mapping respects the same fixed range.
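A sketch of the fixed-range, split-palette binning, assuming the `bins = 510` default is split evenly (255 bins for [minGamma, 0) on the blue side, 255 for [0, maxGamma] on the red side) and that bins are linear in γ. `gamma_to_bin` is hypothetical, and the real script's bin boundaries may differ; the signed-log mapping is not shown.

```cpp
#include <algorithm>
#include <cmath>

// Map gamma to a colour-bin index using a palette split at 0.
// Requires minGamma < 0 < maxGamma. Out-of-range values are clamped
// to the edge bins, so extreme gammas cannot stretch the scale.
int gamma_to_bin(double g, double minGamma = -4.0, double maxGamma = 50.0,
                 int bins = 510) {
    const int half = bins / 2;
    g = std::clamp(g, minGamma, maxGamma);
    if (g < 0.0) {
        // [minGamma, 0) -> bins 0 .. half-1 (negative / blue side)
        int idx = static_cast<int>(std::floor((g - minGamma) / -minGamma * half));
        return std::min(idx, half - 1);  // guard against rounding just below 0
    }
    // [0, maxGamma] -> bins half .. bins-1 (positive / red side)
    int idx = half + static_cast<int>(std::floor(g / maxGamma * half));
    return std::min(idx, bins - 1);
}
```

With the defaults, γ = -4 and γ = -10 both land in bin 0, γ = 0 in bin 255, and γ = 50 and γ = 100 both in bin 509.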

Conditional installs for BiocManager, msa, and bio3d to avoid redundant installs.

Why

Make cross-gene / cross-run color scales comparable and stable.

Prevent extreme γ values from skewing palettes.

Backwards compatibility

If all γ already fall within the new range, visual results remain qualitatively consistent.

Notes

Ensure Biostrings is installed: BiocManager::install("Biostrings").
This release adds a simple non-genic mode and fixes a rare stability issue. Core results for coding runs are unchanged.

Non-genic mode (`-ng 1`)
- New per-nucleotide mode for non-coding regions (intergenic / intronic / UTR).
- Codon/synonymous logic is turned off; every variable site is treated as replacement (R).
- PS/DS, MKT, and NI are skipped (reported as `NA`); PR/DR and Gamma are still reported.
- Use `-t <div_time>` to reuse divergence time estimated from coding runs.
- Recommended for non-genic runs: `-ng 1 -o 1 -t <dt>`.
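A sketch of the non-genic rule that every variable site counts as a replacement (R) site, assuming an aligned set of equal-length sequences and that gap characters ('-') are ignored when deciding whether a column varies. `count_replacement_sites` is hypothetical and does not separate polymorphism (PR) from divergence (DR).

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Non-genic mode sketch: no codon/synonymous classification, so every
// variable column is counted as a replacement (R) site.
std::size_t count_replacement_sites(const std::vector<std::string>& aln) {
    if (aln.empty()) return 0;
    const std::size_t len = aln.front().size();
    for (const auto& s : aln)
        if (s.size() != len) throw std::invalid_argument("sequences differ in length");
    std::size_t r = 0;
    for (std::size_t col = 0; col < len; ++col) {
        std::set<char> states;
        for (const auto& s : aln)
            if (s[col] != '-') states.insert(s[col]);  // gaps ignored (assumption)
        if (states.size() > 1) ++r;                     // variable site -> R
    }
    return r;
}
```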

Stability fix (no change to results)
- Fixed a rare crash of the form: `blew fast model num ... 0.999193`.
- Internal model sampling now uses a normalized CDF in `[0,1)` (standard inverse-CDF draw).
- This does **not** change model probabilities or estimates; it only removes the crash.
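A sketch of a standard inverse-CDF draw over model weights, consistent with the description above but not taken from the MASS-PRF source; `build_cdf`, `inverse_cdf_index`, and `draw_model` are hypothetical names. Forcing the last CDF entry to exactly 1.0 is one way to keep a draw such as 0.999193 from falling past the end of a cumulative sum that rounds to slightly below 1, without changing the model probabilities.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Normalized cumulative distribution from non-negative model weights.
// The last entry is pinned to exactly 1.0 so rounding cannot leave a gap.
std::vector<double> build_cdf(const std::vector<double>& w) {
    double total = std::accumulate(w.begin(), w.end(), 0.0);
    if (w.empty() || total <= 0.0) throw std::invalid_argument("weights must sum to > 0");
    std::vector<double> cdf(w.size());
    double run = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) { run += w[i]; cdf[i] = run / total; }
    cdf.back() = 1.0;
    return cdf;
}

// Inverse-CDF lookup: for u in [0,1), the first index whose cumulative
// probability exceeds u.
std::size_t inverse_cdf_index(const std::vector<double>& cdf, double u) {
    auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    // Defensive: also covers library implementations whose
    // uniform_real_distribution can occasionally return the upper bound.
    if (it == cdf.end()) --it;
    return static_cast<std::size_t>(it - cdf.begin());
}

std::size_t draw_model(const std::vector<double>& cdf, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // nominally [0,1)
    return inverse_cdf_index(cdf, unif(rng));
}
```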
…ng 1` non-genic mode).

- Added `--non_genic` for MASS-PRF runs with `-ng 1`.  
  → 2D skips PS/DS-based panels and keeps only Gamma + CI plots and site lists.

- Added `--scale <k>` to set the scaling factor when it is missing from MASS-PRF output.  
  → If no scaling is found, the script now safely defaults to scale = 1 instead of stopping.

- More robust parsing of the results table and “Mission accomplished” marker.  
  → Small format changes in the output are handled better; fewer false failures.

- Standardized position expansion.  
  → When scaling is used, positions are expanded to a clean 1..N grid for consistent 3D coloring.

- Backward compatible.  
  → If you do not use the new flags, behavior and outputs remain the same as before.
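A sketch of the scale fallback order described above: a value found in the MASS-PRF output is used first, then `--scale <k>`, then 1. `resolve_scale` is hypothetical, the 2D/3D scripts are not C++, and treating non-positive values as missing is an assumption.

```cpp
#include <optional>

// Resolve the scaling factor used for position expansion:
//   1) value parsed from the MASS-PRF output, if present;
//   2) otherwise the value passed with --scale <k>, if any;
//   3) otherwise 1, instead of stopping with an error.
int resolve_scale(std::optional<int> from_output, std::optional<int> from_flag) {
    if (from_output && *from_output > 0) return *from_output;
    if (from_flag && *from_flag > 0) return *from_flag;
    return 1;
}
```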