-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Hi, I wanted to use MethylScore in my analysis for finding DMRs between ~20 different Genotypes. Therefore I first wanted to test if MethylScore is able to find simulated DMRs. For this I used metilenes background simulation script to simulate 20 samples of A. thaliana chromosome 1 drawing methylation levels from a beta distribution with parameters a=10, b=1 and then implementing DMRs with a custom script changing the methylation levels for 5 consecutive CG sites to values drawn from a beta distribution with inverted parameters a=1, b=10 in 50% of the samples. With this method I implemented 100 DMRs in the CG context. This is of course a very simple and non perfect way of simulating DMRs but this should suffice for the intend of just getting the pipeline up and running.
However I was not able to find any DMRs with MethylScore. I have tested to call DMRs with HOME alternatively, which found almost all of them with correct boundaries.
Interestingly, if I simulate the samples with a low methylation background (using beta distribution parameters a=1, b=10) and DMRs with a high level of methylation in 50% of the samples (beta distribution parameters a=10, b=1) the last step of the pipeline (DMRS:MERGE_DMRS) is skipped without visible error:
[1e/74d241] process > CONSENSUS:SPLIT_BEDGRAPH (S19:1) [100%] 20 of 20 ✔
[36/004412] process > CONSENSUS:MATRIX:BUILD (1) [100%] 1 of 1 ✔
[cc/1f5a2d] process > SAMPLESHEET:GENERATE (genome_matrix.tsv) [100%] 1 of 1 ✔
[85/f57be8] process > MRS:CALL_MRS (S12:genome_matrix.tsv) [100%] 20 of 20 ✔
[17/6383bc] process > MRS:MR_STATISTICS (S12) [100%] 20 of 20 ✔
[2d/3f172c] process > MRS:SPLIT_MRS (all:batchsize:500) [100%] 1 of 1 ✔
[3d/b2d032] process > DMRS:INDEX (genome_matrix.tsv) [100%] 1 of 1 ✔
[d5/8a609b] process > DMRS:CALL_DMRS (CG:all.MRbatch.0) [100%] 2 of 2 ✔
[- ] process > DMRS:MERGE_DMRS -
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Completed at: 27-Apr-2023 11:05:26
Duration : 2m 43s
CPU hours : 0.4
Succeeded : 66
I am using the following command to run MethylScore:
nextflow run Computomics/MethylScore --BEDGRAPH --SAMPLE_SHEET=/scratch/mariusk/methylscore_test/samplesheet.tsv --GENOME=/scratch/mariusk/methylscore_test/Arabidopsis_thaliana.TAIR10.dna.chromosome.1.fa -profile docker --DMR_CONTEXTS CG
Im sadly unable to attach the input files directly to this issue due to size limitations, however I have uploaded the first case of a high methylation background here: https://drive.google.com/file/d/1_LKNfjrVI5ylzXdBqGid6QrVquSxY_WT/view?usp=sharing
Introduced DMRs in this dataset are at the following positions:
393835 393969
418517 418788
2003984 2004147
2109083 2109208
2569172 2569379
2967632 2967668
3315692 3315884
3503494 3503627
3583380 3583597
4078186 4078481
4227003 4227188
4334846 4334917
5743592 5743667
6240901 6240984
6548515 6548647
6641201 6641376
7471468 7471750
7547517 7548102
7760154 7760226
8035196 8035398
8402500 8402530
8685467 8685561
8790599 8790671
9819252 9819402
9852252 9852383
9916057 9916196
10070847 10070893
10219783 10219954
10224750 10224846
10422856 10423016
10462585 10462610
10526601 10526668
10814851 10814964
10899133 10899173
11167080 11167416
11194751 11194825
11596910 11597130
11836754 11836926
12313861 12313946
12818518 12818602
12984426 12984699
13250013 13250216
13528312 13528351
13947334 13947477
14868353 14868410
15638549 15638724
16329536 16329571
16833679 16833827
16866955 16867104
16903114 16903307
16924986 16925045
17052473 17052576
17256103 17256127
18976429 18976550
19014836 19014913
19088534 19088658
19286339 19287092
19354498 19354582
19475855 19475923
19505494 19505638
20099991 20100058
20186034 20186241
20926924 20927016
21167056 21167244
21180101 21180168
21311272 21311441
21531193 21531285
22172847 22172894
22180002 22180550
22408689 22409039
22938037 22938100
22947486 22947735
23668779 23668984
24130230 24130331
24189255 24189288
24742545 24742621
24934153 24934265
25545726 25545889
25669868 25669925
25673210 25673405
25881091 25881140
26453262 26453534
26739762 26739816
26805084 26805459
27574493 27574732
28073147 28073481
28090913 28091170
28308716 28308795
28351441 28351691
28691911 28692227
28706160 28706295
28958464 28958674
29159617 29160088
29245316 29245354
29458094 29458153
30018157 30018178
30157251 30157318
30246555 30246732
30348834 30348880
30382826 30383265
Here is a visual example of a simulated DMR in the uploaded data:

The X axis represents the genomic coordinates, the methylation levels of the 20 samples is on the Y axis. One DMR is shown with a red background as an example, the other DMRs are out of the X axis boundaries in this exerpt.
I have given each sample its own ID in the samplesheet.tsv. The .nextflow.log files for the run of the high methylation level background where no DMRs were found and the run with the low background where the last step is skipped are attached to this issue.
Am I missing something obvious? I have tried playing with the MR and DMR parameters a bit but had no luck. If you need anything else from me please just let me know. I would be very grateful if you could help me out. Thanks in advance.