| Contents | Logs |
|---|---|
| Date | 2022-12-04 |
| Version | v-0.2 |
Pairwise methylation assocation test (PMAT) is a computational tool tailored for identifying DMRs between unordered pairs like twins. In this tool, absolute methylation difference between unorderd pairs was maximized in DMRs identification. A folded normal (FN) test based on the framework of likelihood ratio test was proposed recently and implemented to test the methylation difference between unordered pairs in each methylation region. In order to improve the approximation precision of a FN test when sample size is not large enough, we further established PMAT with Bartlett correction (PMAT-C). In PMAT-C, a folded normal test with Bartlett correction (FN-C) was implemented to test the methylation differences between unordered pairs.
In PMAT with FN test, the asymptotic distribution of a likelihood ratio statistic under the null hypothesis is a mixture of chi-squared distribution. That is
In PMAT-C with FN-C test, the mixture proportion of asymptotic distribution with an empirical proportion
Specially, PMAT-C with
NOTE: Currently, all the parameters are hard coded in the ./proc/src/mathematics.cpp file (the leading 4 entries in the samParam array):
double foldednomalpvalue(double *a, int m, double *b, int n, bool logit=true, double logittuning=0.00001, bool nonZero = true){
bool useBartlette = true;
// config.txt
double samParam[6]={0.60105772, 0., -4.0224, 0.89, 1., 1.};
...
}Users could change the parameter and re-compile to get the corresponding PMAT-C. Similarly, users could get PMAT under the parameter setting:
double foldednomalpvalue(double *a, int m, double *b, int n, bool logit=true, double logittuning=0.00001, bool nonZero = true){
bool useBartlette = true;
// config.txt
double samParam[6]={0.5, 0., 0, 0, 1., 1.};
...
}C/C++ version could be found in ./proc/ folder.
- Abilities
pmatutilizes a totally different loss-function to tackle the non-order problem intwindata, which fails to distinguishEffect 1 - Effect 2versusEffect 2 - Effect 1;pmatprovides several optimization routines with different initial value schemes to solve the corresponding MLE to balance the computation accuracy, speed and scalbility (EM / BFGS);pmatdetects differentially methylated regions between unordered pairs.
- Transferability and Extendibility
pmatshares the same options as metilene .- In the future,
pmatplans to provide options to call the original metilene.
- Soft Engineering on metilene
- Bug fix
- Port
metilenefrom ANSI-C to Modern C++11 (partly)
A modern C++ compiler with a GNU library is needed:
C++: C++11
Libraries: GSL-2.x
-
To enable the parallel computing ability (
-toption),pthreadsupport is required. -
The
C++11featured compiler could be found here. -
Install
dev/develversion of GSL. For example:
# debian/ubuntu
apt install libgsl-dev
# Centos/RH
yum install libgsl-develNote: for Windows user, one should intall Rtools42 and run
pmatin theC:\rtools42\ucrt64.exeenvironment. In theucrt64bash, run
# to where you download pmat
cd pmat
cd ./release/Win64
tar vxf PMAT-0.1-win64-rtools42.tar.gzNote: For MAC OS user,
Home brewis recommended to install GSL withbrew install gsland properly tune the thread numbers. Otherwise, one should compile the GSL from the source and runpmatwith-t 1option.
Currently, users could use the development version, i.e. compile pmat from source after the pre-install procedure:
git clone https://github.com/yfyang86/pmat
cd ./pmat/proc/
makeNOTE: Tested Platform:
- X86/64(Linux/MAC OSX/Windows);
- ARM(MAC M1-ARM64/Linux-ARMv7/ARMv8);
- MIPS/LoongArch(龙芯3A5000, UOS).
NOTE: Binary release for Mac/Windows/Linux(Ubuntu 18.04) users is also available. But due to the library/platform dependencies/limitations, some function may not work as designed (eg, parallel computing on Mac OSX). Please contact the maintainer or raise an issue if you encounter such problems. Currently, only V0.0.1 is available.
Modify the input file path ../examples/test.dat in test proc of the Makefile to match your settings:
test1:
./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMR
And run:
make testThe parameters are the same as metilene. For example, -t 32 means 32 threads are used; -a A represents to one group named A; -b B represents to the other group named B; -X 8 means more than 8 non-missing values are required to estimate missing values in group A; -Y 8 means more than 8 non-missing values are required to estimate missing values in group B; -m 5 means more than 5 CpGs are required in a DMR.
A typical example is:
INPUT_FILE="your_input_file_path"
OUTPUT_FILE="output_file_path"
./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 "$INPUT_FILE" > "$OUTPUT_FILE"Here INPUT_FILE are the input file. The format should be a TAB separated file:
| Columns | Type | Comment |
|---|---|---|
| chr | character | Example: chr1 |
| pos | integer | Example: 10497 |
| A | double | Example: 0.346938775510204 |
| B | double | Example: 0.369565217391304 |
Here A and B appear in pair. If there are 10 groups, then there should be 10 (A and B) pairs. The file should be (first three lines):
chr pos A B A B A B A B A B A B A B A B A B A B
chr1 10497 0.346938775510204 0.369565217391304 0.362162162162162 0.322727272727273 0.352941176470588 0.486486486486487 0.430769230769231 0.440860215053763 0.439024390243902 0.410526315789474 0.343137254901961 0.40625 0.25974025974026 0.256756756756757 0.296 0.4375 0.377049180327869 0.421487603305785 0.450549450549451 0.385542168674699
chr1 10525 0.92 0.9375 0.940540540540541 0.969298245614035 1 0.972972972972973 0.984615384615385 0.956989247311828 0.963414634146341 0.958333333333333 1 0.958333333333333 0.987012987012987 0.945945945945946 0.944 0.9375 0.975409836065574 0.933884297520661 0.978021978021978 0.975903614457831
chr1 10542 0.96 0.90625 0.978378378378378 0.969298245614035 0.882352941176471 0.945945945945946 0.96875 0.978494623655914 0.963414634146341 0.96875 0.941176470588235 0.957894736842105 0.922077922077922 0.959459459459459 0.96 0.958762886597938 0.959016393442623 0.983471074380165 0.934065934065934 0.951807228915663
Run the test code in linux
./pmat -t 32 -a A -b B -X 8 -Y 8 -m 5 -d 0.05 ../examples/test.dat > test.mr.DMROr use the binary release. The top 10 lines in the output file are:
chr start stop abs.methyl.diff CpGs pFN 5mC.A 5mC.B FDR
chr1 661864 661928 0.069100 5 0.00086492 0.75823 0.79319 0.0064251
chr1 12505 12672 0.118271 5 0.020244 0.77831 0.74389 0.095701
chr1 545153 545215 0.387678 5 0.030127 0.55379 0.74467 0.13055
chr1 713375 713449 0.083285 5 0.46782 0.68649 0.72001 0.55623
chr1 437750 437819 0.463317 5 0.00029354 0.67471 0.53032 0.002544
chr1 546178 546223 0.134798 5 0.005505 0.86891 0.92242 0.031806
chr1 546224 546296 0.067007 5 3.1945e-08 0.98491 0.9615 1.6611e-06
chr1 715040 715121 0.211462 5 0.46519 0.56337 0.69286 0.55623
chr1 662608 662692 0.113878 5 0.28328 0.70271 0.73315 0.55623
The following table explains the columns in output file (with column Names).
| Column | Name | Comment | Example |
|---|---|---|---|
| 1 | chr | chromosome | Example: chr1 |
| 2 | start | start | Example: 661864 |
| 3 | stop | end | Example: 661928 |
| 4 | abs.methyl.diff | avarge absolute 5mC difference between pairs | Example: 0.066676 |
| 5 | CpGs | #CpGs in a region | Example: 5 |
| 6 | pFN | p-value via FN-C test or FN test | Example: 0.0021516 |
| 7 | 5mC.A | average 5mC in one group | Example: 0.75823 |
| 8 | 5mC.B | average 5mC in the other group | Example: 0.79141 |
| 9 | FDR | ajusted p-value (FDR) by B-H method | Example: 0.55623 |
Users could extend/verify the last column (FDR) by the follorwing R script. We should point out method = "BH" and method = "fdr" calculate the same statistic in R p.adjust().
data <- read.table("test.mr.DMR", header = T)
FDR2 <- p.adjust(data$pFN, method = "BH")
#### Verification ####
# plot(data$FDR, FDR2, xlab = 'PMAT', ylab = 'R-p.adjust',
# xlim = c(0,1), ylim = c(0,1))
var(data$FDR / FDR2)
#> [1] 1.70628e-10The result illustrates that PMAT and R::p.adjust derive the same fdr.
NOTE: Other p-value adjustments are possible in R, but they are not within the scope of PMAT. We provide a simple R script caladj.R for users with this intrest.
Rscript caladj.R DMR_OUTPUT_of_PMAT [METHOD]Here [METHOD] is optional, it could be one of holm, hochberg, hommel, bonferroni, BH, BY, fdr, and none. By default, we use fdr (BH). Please check the manual of p.adjust in R for more details. For example Rscript caladj.R test.mr.DMR fdr will generate a test.mr.DMR.fdr file, with FDR value as the last column.
C/C++ version (in the ./proc folder):
mm2.h,mm2.cpp,utils.h,utils.cpp: Apache-2.0- Others: GPL v2.0
- 2022-11-22: Initial the pmat projects
- 2022-11: v-0.1 binary-preview-release
- Linux (X86-64)
- Mac OSX (ARM/Intel)
- Windows
- 2022-11: Example/Simulation
- Simple demo
- Full example (in plan)
- Config file support for the Bartlett correction : In the comming Version
- Documentation and Examples.
-
Config.txtsupport without re-compile.
- Adjusted p-value support : In the comming Version