Skip to content

Unable to find simulated DMRs #4

@mkpython3

Description

@mkpython3

Hi, I wanted to use MethylScore in my analysis for finding DMRs between ~20 different Genotypes. Therefore I first wanted to test if MethylScore is able to find simulated DMRs. For this I used metilenes background simulation script to simulate 20 samples of A. thaliana chromosome 1 drawing methylation levels from a beta distribution with parameters a=10, b=1 and then implementing DMRs with a custom script changing the methylation levels for 5 consecutive CG sites to values drawn from a beta distribution with inverted parameters a=1, b=10 in 50% of the samples. With this method I implemented 100 DMRs in the CG context. This is of course a very simple and non perfect way of simulating DMRs but this should suffice for the intend of just getting the pipeline up and running.

However I was not able to find any DMRs with MethylScore. I have tested to call DMRs with HOME alternatively, which found almost all of them with correct boundaries.

Interestingly, if I simulate the samples with a low methylation background (using beta distribution parameters a=1, b=10) and DMRs with a high level of methylation in 50% of the samples (beta distribution parameters a=10, b=1) the last step of the pipeline (DMRS:MERGE_DMRS) is skipped without visible error:

[1e/74d241] process > CONSENSUS:SPLIT_BEDGRAPH (S19:1)         [100%] 20 of 20 ✔
[36/004412] process > CONSENSUS:MATRIX:BUILD (1)               [100%] 1 of 1 ✔
[cc/1f5a2d] process > SAMPLESHEET:GENERATE (genome_matrix.tsv) [100%] 1 of 1 ✔
[85/f57be8] process > MRS:CALL_MRS (S12:genome_matrix.tsv)     [100%] 20 of 20 ✔
[17/6383bc] process > MRS:MR_STATISTICS (S12)                  [100%] 20 of 20 ✔
[2d/3f172c] process > MRS:SPLIT_MRS (all:batchsize:500)        [100%] 1 of 1 ✔
[3d/b2d032] process > DMRS:INDEX (genome_matrix.tsv)           [100%] 1 of 1 ✔
[d5/8a609b] process > DMRS:CALL_DMRS (CG:all.MRbatch.0)        [100%] 2 of 2 ✔
[-        ] process > DMRS:MERGE_DMRS                          -
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Completed at: 27-Apr-2023 11:05:26
Duration    : 2m 43s
CPU hours   : 0.4
Succeeded   : 66

I am using the following command to run MethylScore:
nextflow run Computomics/MethylScore --BEDGRAPH --SAMPLE_SHEET=/scratch/mariusk/methylscore_test/samplesheet.tsv --GENOME=/scratch/mariusk/methylscore_test/Arabidopsis_thaliana.TAIR10.dna.chromosome.1.fa -profile docker --DMR_CONTEXTS CG

Im sadly unable to attach the input files directly to this issue due to size limitations, however I have uploaded the first case of a high methylation background here: https://drive.google.com/file/d/1_LKNfjrVI5ylzXdBqGid6QrVquSxY_WT/view?usp=sharing
Introduced DMRs in this dataset are at the following positions:

393835	393969
418517	418788
2003984	2004147
2109083	2109208
2569172	2569379
2967632	2967668
3315692	3315884
3503494	3503627
3583380	3583597
4078186	4078481
4227003	4227188
4334846	4334917
5743592	5743667
6240901	6240984
6548515	6548647
6641201	6641376
7471468	7471750
7547517	7548102
7760154	7760226
8035196	8035398
8402500	8402530
8685467	8685561
8790599	8790671
9819252	9819402
9852252	9852383
9916057	9916196
10070847	10070893
10219783	10219954
10224750	10224846
10422856	10423016
10462585	10462610
10526601	10526668
10814851	10814964
10899133	10899173
11167080	11167416
11194751	11194825
11596910	11597130
11836754	11836926
12313861	12313946
12818518	12818602
12984426	12984699
13250013	13250216
13528312	13528351
13947334	13947477
14868353	14868410
15638549	15638724
16329536	16329571
16833679	16833827
16866955	16867104
16903114	16903307
16924986	16925045
17052473	17052576
17256103	17256127
18976429	18976550
19014836	19014913
19088534	19088658
19286339	19287092
19354498	19354582
19475855	19475923
19505494	19505638
20099991	20100058
20186034	20186241
20926924	20927016
21167056	21167244
21180101	21180168
21311272	21311441
21531193	21531285
22172847	22172894
22180002	22180550
22408689	22409039
22938037	22938100
22947486	22947735
23668779	23668984
24130230	24130331
24189255	24189288
24742545	24742621
24934153	24934265
25545726	25545889
25669868	25669925
25673210	25673405
25881091	25881140
26453262	26453534
26739762	26739816
26805084	26805459
27574493	27574732
28073147	28073481
28090913	28091170
28308716	28308795
28351441	28351691
28691911	28692227
28706160	28706295
28958464	28958674
29159617	29160088
29245316	29245354
29458094	29458153
30018157	30018178
30157251	30157318
30246555	30246732
30348834	30348880
30382826	30383265

Here is a visual example of a simulated DMR in the uploaded data:
Screenshot from 2023-04-27 15-57-25
The X axis represents the genomic coordinates, the methylation levels of the 20 samples is on the Y axis. One DMR is shown with a red background as an example, the other DMRs are out of the X axis boundaries in this exerpt.

I have given each sample its own ID in the samplesheet.tsv. The .nextflow.log files for the run of the high methylation level background where no DMRs were found and the run with the low background where the last step is skipped are attached to this issue.

Am I missing something obvious? I have tried playing with the MR and DMR parameters a bit but had no luck. If you need anything else from me please just let me know. I would be very grateful if you could help me out. Thanks in advance.

nextflow_high_bg.log
nextflow_low_bg.log

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions