straggling changes for cosine similarity support and plotting changes to purity/ploidy plot #32

Open
jrafailov wants to merge 26 commits into main from jr_dev

Conversation

@jrafailov
Contributor

No description provided.

Copilot AI left a comment

Pull Request Overview

This PR introduces support for cosine similarity calculations in mutational signature analysis and makes improvements to purity/ploidy plot generation. The changes include adding new COSMIC signature reference files, implementing cosine similarity computation functionality, and updating plotting visualization parameters.

  • Adds cosine similarity computation for comparing reference and attributed signatures
  • Updates purity/ploidy plots with pre-computed color columns and improved visual styling
  • Includes new COSMIC v3.4 SBS and ID signature reference data files

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| inst/extdata/data/COSMIC_v3.4_SBS_GRCh37.txt | Adds COSMIC v3.4 SBS signature reference data for GRCh37 |
| inst/extdata/data/COSMIC_v3.4_ID_GRCh37.txt | Adds COSMIC v3.4 ID signature reference data for GRCh37 |
| R/segment-width-distribution.R | Updates plotting functions with configurable binwidth, pre-computed colors, and Cairo dependency check |
| R/metadata.R | Adds cosine similarity computation and integrates it into signature processing workflow |

Comment thread R/metadata.R
Comment on lines +916 to +923
compute_cosine_similarity <- function(
  metadata,
  probabilities,
  matrix_file,
  is_indel = FALSE,
  is_deconstruct_sigs = FALSE,
  reference = "hg19"
) {

Copilot AI Sep 12, 2025

The function parameter documentation lists 'probabilities' and 'matrix_file' as paths to files, but within the function they are treated as already loaded data.tables. Either the documentation should be corrected or the function should handle file paths.

Copilot uses AI. Check for mistakes.
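
One way to reconcile the documentation with the behavior is to let the function accept either form. The following is a minimal sketch, not the PR's code: `as_loaded_table` is a hypothetical helper name, and base R's `read.delim()` stands in for whatever reader the package actually uses.

```r
# Hypothetical helper (not part of the PR): accept either a file path or an
# already loaded table, and always hand back a table.
as_loaded_table <- function(x, ...) {
  if (is.character(x) && length(x) == 1L) {
    if (!file.exists(x)) stop("File not found: ", x)
    # Assumption: tab-delimited input; swap in the package's own reader.
    return(utils::read.delim(x, stringsAsFactors = FALSE, ...))
  }
  if (is.data.frame(x)) return(x)  # a data.table also inherits from data.frame
  stop("Expected a file path or a data.frame-like object")
}

# Usage: both call styles yield the same table.
tmp <- tempfile(fileext = ".txt")
write.table(
  data.frame(Type = c("A[C>A]A", "A[C>A]C"), SBS1 = c(0.1, 0.2)),
  tmp, sep = "\t", quote = FALSE, row.names = FALSE
)
from_path  <- as_loaded_table(tmp)
from_table <- as_loaded_table(from_path)
```

Calling such a helper on `probabilities` and `matrix_file` at the top of the function would make either documented form valid; the alternative is simply to correct the parameter documentation.
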
- geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = 1- score)) +
- scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +
+ geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) +
+ #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +

Copilot AI Sep 12, 2025

The commented-out scale_fill_scico lines should be removed instead of being left as comments, as they create code clutter and confusion about the intended functionality.

Comment thread R/metadata.R Outdated
Comment on lines +984 to +994
cosine_similarities <- numeric(ncol(weighted_prob_matrix))
names(cosine_similarities) <- colnames(weighted_prob_matrix)

for (i in 1:ncol(weighted_prob_matrix)) {
  vec_a <- weighted_prob_matrix[, i]
  vec_b <- weighted_expected[, i]
  min_length <- min(length(vec_a), length(vec_b))
  vec_a <- vec_a[1:min_length]
  vec_b <- vec_b[1:min_length]
  cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2)))
  cosine_similarities[i] <- cosine_similarity

Copilot AI Sep 12, 2025

The cosine similarity calculation assumes matching column names between weighted_prob_matrix and weighted_expected, but uses positional indexing. This could lead to incorrect pairings if the column orders don't match. Consider using column names for safer matching.

Suggested change
- cosine_similarities <- numeric(ncol(weighted_prob_matrix))
- names(cosine_similarities) <- colnames(weighted_prob_matrix)
- for (i in 1:ncol(weighted_prob_matrix)) {
-   vec_a <- weighted_prob_matrix[, i]
-   vec_b <- weighted_expected[, i]
-   min_length <- min(length(vec_a), length(vec_b))
-   vec_a <- vec_a[1:min_length]
-   vec_b <- vec_b[1:min_length]
-   cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2)))
-   cosine_similarities[i] <- cosine_similarity
+ cosine_similarities <- numeric(length(matching_signatures))
+ names(cosine_similarities) <- matching_signatures
+ for (sig in matching_signatures) {
+   if (!(sig %in% colnames(weighted_prob_matrix)) || !(sig %in% colnames(weighted_expected))) {
+     cosine_similarities[sig] <- NA
+     next
+   }
+   vec_a <- weighted_prob_matrix[, sig]
+   vec_b <- weighted_expected[, sig]
+   min_length <- min(length(vec_a), length(vec_b))
+   vec_a <- vec_a[1:min_length]
+   vec_b <- vec_b[1:min_length]
+   cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2)))
+   cosine_similarities[sig] <- cosine_similarity
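
The name-based pairing can also be written as a small standalone function. This is a sketch under stated assumptions rather than the PR's implementation: `cosine_by_signature` is a hypothetical name, the inputs are assumed to be numeric matrices with signature names as column names and identically ordered rows, and returning `NA` for an all-zero column is a choice made here.

```r
# Hypothetical helper (not part of the PR): per-signature cosine similarity,
# pairing columns by name instead of by position.
cosine_by_signature <- function(observed, expected) {
  # Fail loudly on mismatched row counts instead of silently truncating.
  stopifnot(nrow(observed) == nrow(expected))
  shared <- intersect(colnames(observed), colnames(expected))
  vapply(shared, function(sig) {
    a <- observed[, sig]
    b <- expected[, sig]
    denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
    # Assumption: an all-zero vector has no defined cosine, so report NA.
    if (denom == 0) NA_real_ else sum(a * b) / denom
  }, numeric(1))
}

# Usage: the column order differs between the inputs on purpose.
observed <- cbind(SBS1 = c(1, 0, 2), SBS5 = c(0, 3, 4))
expected <- cbind(SBS5 = c(0, 6, 8), SBS1 = c(0, 1, 0))
similarities <- cosine_by_signature(observed, expected)
# similarities is c(SBS1 = 0, SBS5 = 1)
```

Positional indexing on the same inputs would compare SBS1 against SBS5. The row-count check replaces the `min_length` truncation, which can hide a mismatch in mutation contexts; whether rows also need matching by name depends on how the two matrices are built.
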

Comment on lines +752 to +755
if (requireNamespace("Cairo", quietly = TRUE)) {
  p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE)
  save_purple_sunrise_html(p_html, q, out_file_html)
}

Copilot AI Sep 12, 2025

The Cairo namespace check is performed but Cairo is not actually used in the protected code block. The check should either be moved to where Cairo is actually needed or removed if not required.

Suggested change
- if (requireNamespace("Cairo", quietly = TRUE)) {
-   p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE)
-   save_purple_sunrise_html(p_html, q, out_file_html)
- }
+ p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE)
+ save_purple_sunrise_html(p_html, q, out_file_html)
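
If Cairo is needed somewhere else, for example when writing a PNG, the check can sit at that point of use. The sketch below is only an illustration: the PR excerpt does not show where Cairo is called, and `pick_png_device` is a hypothetical helper.

```r
# Hypothetical helper (not part of the PR): choose a PNG device where the
# device is actually opened, falling back to base graphics without Cairo.
pick_png_device <- function() {
  if (requireNamespace("Cairo", quietly = TRUE)) {
    Cairo::CairoPNG
  } else {
    grDevices::png
  }
}

# Usage: the caller gets a device function and never touches Cairo directly.
png_device <- pick_png_device()
```

Keeping the `requireNamespace()` call next to the code that depends on the package makes it clear what is skipped when the package is missing.
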

@jrafailov requested a review from Copilot on September 20, 2025 01:32

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

R/lift-wrappers.R:1

  • Missing closing brace for the if (save_html) condition that starts around line 762. The Cairo namespace check should be nested within the save_html condition.
#' lift wrapper helper function

Comment thread R/utils.R
shutup = function(top_level_expr, capture_output_type = "message") {
  nr = NROW(capture_output_type)
  do_capture_output = identical("output", capture_output_type)
  do_capture_message =identical("message", capture_output_type)

Copilot AI Sep 20, 2025

Missing space after = operator. Should be do_capture_message = identical(...)

Suggested change
- do_capture_message =identical("message", capture_output_type)
+ do_capture_message = identical("message", capture_output_type)

Comment thread R/utils.R
  is_len_one = NROW(object) == 1
  is_not_valid = is_character && ! NROW(object) == 1
- is_na = is_len_one && Skilift::is_loosely_na(object, other_nas = base::nullfile())
+ is_na = is_len_one && all(Skilift::is_loosely_na(object, other_nas = base::nullfile()))

Copilot AI Sep 20, 2025

The all() function wrapping is_loosely_na() may cause issues if is_loosely_na() returns a scalar boolean. This could break the logic for single-element character vectors.

Suggested change
- is_na = is_len_one && all(Skilift::is_loosely_na(object, other_nas = base::nullfile()))
+ is_na = is_len_one && Skilift::is_loosely_na(object, other_nas = base::nullfile())
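
Whether the wrapper changes anything depends on what `Skilift::is_loosely_na()` returns, which the PR excerpt does not show. A quick base-R check of how `all()` treats a length-one logical, using a hypothetical stand-in (`is_na_like`) in place of the real function:

```r
# Stand-in for Skilift::is_loosely_na(); its real return shape is an
# assumption here, this version is vectorised over its input.
is_na_like <- function(x) is.na(x) | x %in% c("", "NA", "/dev/null")

single <- "NA"
scalar_result  <- is_na_like(single)       # length-one logical
wrapped_result <- all(is_na_like(single))  # same value after all()

# For longer inputs, all() collapses the vector to the single value `&&` expects.
collapsed <- all(is_na_like(c("NA", "sample.bam")))
```

In this sketch `all()` leaves a scalar result unchanged and only matters when the inner call returns more than one element.
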

Comment on lines +784 to +785
geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) +
#scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +

Copilot AI Sep 20, 2025

Commented out code should be removed rather than left in the codebase. If this functionality needs to be preserved, consider using a conditional parameter or removing entirely.

- geom_raster(aes(x = ploidy, y = purity, fill = 1- score)) +
- scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +
+ geom_raster(aes(x = ploidy, y = purity, fill = color)) +
+ #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +

Copilot AI Sep 20, 2025

Commented out code should be removed rather than left in the codebase. If this functionality needs to be preserved, consider using a conditional parameter or removing entirely.

Suggested change
- #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") +
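
If both colouring modes are worth keeping, the "conditional parameter" option could look like the sketch below. It is an illustration under assumptions, not the PR's code: `purity_ploidy_tiles` and `use_precomputed_colors` are hypothetical names, `scale_fill_gradient()` is swapped in for `scale_fill_scico()` to avoid the extra dependency, and `scale_fill_identity()` is added on the assumption that the `color` column holds literal colour strings.

```r
library(ggplot2)

# Hypothetical helper (not part of the PR): one switch selects between
# pre-computed colours and a continuous score scale.
purity_ploidy_tiles <- function(df, use_precomputed_colors = TRUE) {
  p <- ggplot(df)
  if (use_precomputed_colors) {
    # Use the hex strings in `color` literally instead of mapping them to a scale.
    p + geom_raster(aes(x = ploidy, y = purity, fill = color)) +
      scale_fill_identity()
  } else {
    # Stand-in for the original scico "batlow" scale.
    p + geom_raster(aes(x = ploidy, y = purity, fill = 1 - score)) +
      scale_fill_gradient(limits = c(0, 1), guide = "none")
  }
}

# Usage with a tiny made-up grid.
grid <- data.frame(
  ploidy = c(1, 2), purity = c(0.5, 0.5),
  score = c(0.1, 0.9), color = c("#FF0000", "#00FF00")
)
p_precomputed <- purity_ploidy_tiles(grid)
p_scored <- purity_ploidy_tiles(grid, use_precomputed_colors = FALSE)
```

With a switch like this the old scale stays reachable without leaving commented-out lines in the plotting code; if the score-based colouring is no longer wanted, deleting the line is simpler.
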
