-
Notifications
You must be signed in to change notification settings - Fork 29
Description
Greetings,
I am analysing wgbs data from a total of 10 samples divided per 5 groups (n=2 per group). I calculated dmrs with the standart t.statistics for all pairwise comparisons.
For visualization in a heatmap, I generated a script (below) in which I generate a matrix with the mean methylation values of all CpGs within a dmr per sample. However, I noticed that the mean methylation values generated in my script do not match with the group.means in the t.stat output. In some cases, I see opposite behavior, a DMR that appears to be hypermethylated in one specific group in the heatmap, is actually hypomethylated in the same comparison in the t.stat results. Furthermore, I noticed that this behavior occurs often in DMRs with high intra-group variability.
Can you weight on this? Is this possible to happen with the t.stat method or am I missing some mistake in my workflow?
In-house script:
Make genomic ranges for bsseq object
ranges <- as.data.frame(bsseq.fit@rowRanges)
ranges$seqnames <- gsub("\.[0-9]+$", "", ranges$seqnames)
row.names(ranges) <- paste0(ranges$seqnames, "", ranges$start, "", ranges$end)
cpg_gr <- makeGRangesFromDataFrame(ranges,
keep.extra.columns = FALSE,
ignore.strand = TRUE,
seqnames.field = "seqnames",
start.field = "start",
end.field = "end",
strand.field = "strand",
starts.in.df.are.0based = FALSE)
Extract methylation matrix
Meth <- getMeth(bsseq.fit, type = "smooth") %>% as.data.frame()
colnames(Meth) <- bsseq.fit@colData$Sample
row.names(Meth) <- rownames(ranges)
Create unique DMR names
dmr_results$Name <- paste0(dmr_results$chr, "", dmr_results$start, "", dmr_results$end)
dmr_dict <- split(dmr_results, dmr_results$Name)
dmr_list <- names(dmr_dict)
Precompute overlaps
cpg_ids <- rownames(ranges)
cpg_gr$ID <- cpg_ids
dmr_gr <- makeGRangesFromDataFrame(dmr_results,
seqnames.field = "chr",
start.field = "start",
end.field = "end",
keep.extra.columns = TRUE,
ignore.strand = TRUE)
dmr_gr$Name <- dmr_results$Name
overlaps <- findOverlaps(dmr_gr, cpg_gr, ignore.strand = TRUE)
overlap_list <- split(subjectHits(overlaps), queryHits(overlaps))
Function using precomputed overlaps
calculate_dmr_means <- function(dmr_idx, dmr_list, Meth, overlap_list, cpg_ids) {
ol <- overlap_list[[as.character(dmr_idx)]]
if (!is.null(ol) && length(ol) > 0) {
cpg_subset <- cpg_ids[ol]
Bvalues_subset <- Meth[cpg_subset, , drop = FALSE]
mean_beta <- colMeans(Bvalues_subset, na.rm = TRUE)
} else {
mean_beta <- rep(NA, ncol(Meth))
}
dmr_name <- dmr_list[dmr_idx]
dmr_means <- as.data.frame(t(mean_beta))
rownames(dmr_means) <- dmr_name
colnames(dmr_means) <- colnames(Meth)
return(dmr_means)
}
Setup parallelism and run
message("Starting DMR Beta calculation...")
plan(multisession)
DMR_beta_list <- future_lapply(seq_along(dmr_list), calculate_dmr_means,
dmr_list = dmr_list,
Meth = Meth,
overlap_list = overlap_list,
cpg_ids = cpg_ids)
Combine and write result
DMR_beta <- do.call(rbind, DMR_beta_list)