Skip to content

group1.mean and group2.mean do not fit with mean meth values per DMR (in-script) #147

@RicardoMartins-Ferreira

Description

Greetings,

I am analysing wgbs data from a total of 10 samples divided per 5 groups (n=2 per group). I calculated dmrs with the standart t.statistics for all pairwise comparisons.

For visualization in a heatmap, I generated a script (below) in which I generate a matrix with the mean methylation values of all CpGs within a dmr per sample. However, I noticed that the mean methylation values generated in my script do not match with the group.means in the t.stat output. In some cases, I see opposite behavior, a DMR that appears to be hypermethylated in one specific group in the heatmap, is actually hypomethylated in the same comparison in the t.stat results. Furthermore, I noticed that this behavior occurs often in DMRs with high intra-group variability.

Can you weight on this? Is this possible to happen with the t.stat method or am I missing some mistake in my workflow?

In-house script:

Make genomic ranges for bsseq object

ranges <- as.data.frame(bsseq.fit@rowRanges)
ranges$seqnames <- gsub("\.[0-9]+$", "", ranges$seqnames)
row.names(ranges) <- paste0(ranges$seqnames, "", ranges$start, "", ranges$end)
cpg_gr <- makeGRangesFromDataFrame(ranges,
keep.extra.columns = FALSE,
ignore.strand = TRUE,
seqnames.field = "seqnames",
start.field = "start",
end.field = "end",
strand.field = "strand",
starts.in.df.are.0based = FALSE)

Extract methylation matrix

Meth <- getMeth(bsseq.fit, type = "smooth") %>% as.data.frame()
colnames(Meth) <- bsseq.fit@colData$Sample
row.names(Meth) <- rownames(ranges)

Create unique DMR names

dmr_results$Name <- paste0(dmr_results$chr, "", dmr_results$start, "", dmr_results$end)
dmr_dict <- split(dmr_results, dmr_results$Name)
dmr_list <- names(dmr_dict)

Precompute overlaps

cpg_ids <- rownames(ranges)
cpg_gr$ID <- cpg_ids
dmr_gr <- makeGRangesFromDataFrame(dmr_results,
seqnames.field = "chr",
start.field = "start",
end.field = "end",
keep.extra.columns = TRUE,
ignore.strand = TRUE)

dmr_gr$Name <- dmr_results$Name

overlaps <- findOverlaps(dmr_gr, cpg_gr, ignore.strand = TRUE)
overlap_list <- split(subjectHits(overlaps), queryHits(overlaps))

Function using precomputed overlaps

calculate_dmr_means <- function(dmr_idx, dmr_list, Meth, overlap_list, cpg_ids) {
ol <- overlap_list[[as.character(dmr_idx)]]
if (!is.null(ol) && length(ol) > 0) {
cpg_subset <- cpg_ids[ol]
Bvalues_subset <- Meth[cpg_subset, , drop = FALSE]
mean_beta <- colMeans(Bvalues_subset, na.rm = TRUE)
} else {
mean_beta <- rep(NA, ncol(Meth))
}
dmr_name <- dmr_list[dmr_idx]
dmr_means <- as.data.frame(t(mean_beta))
rownames(dmr_means) <- dmr_name
colnames(dmr_means) <- colnames(Meth)
return(dmr_means)
}

Setup parallelism and run

message("Starting DMR Beta calculation...")
plan(multisession)
DMR_beta_list <- future_lapply(seq_along(dmr_list), calculate_dmr_means,
dmr_list = dmr_list,
Meth = Meth,
overlap_list = overlap_list,
cpg_ids = cpg_ids)

Combine and write result

DMR_beta <- do.call(rbind, DMR_beta_list)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions