straggling changes for cosine similarity support and plotting changes to purity/ploidy plot#32
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces support for cosine similarity calculations in mutational signature analysis and makes improvements to purity/ploidy plot generation. The changes include adding new COSMIC signature reference files, implementing cosine similarity computation functionality, and updating plotting visualization parameters.
- Adds cosine similarity computation for comparing reference and attributed signatures
- Updates purity/ploidy plots with pre-computed color columns and improved visual styling
- Includes new COSMIC v3.4 SBS and ID signature reference data files
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| inst/extdata/data/COSMIC_v3.4_SBS_GRCh37.txt | Adds COSMIC v3.4 SBS signature reference data for GRCh37 |
| inst/extdata/data/COSMIC_v3.4_ID_GRCh37.txt | Adds COSMIC v3.4 ID signature reference data for GRCh37 |
| R/segment-width-distribution.R | Updates plotting functions with configurable binwidth, pre-computed colors, and Cairo dependency check |
| R/metadata.R | Adds cosine similarity computation and integrates it into signature processing workflow |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| compute_cosine_similarity <- function( | ||
| metadata, | ||
| probabilities, | ||
| matrix_file, | ||
| is_indel = FALSE, | ||
| is_deconstruct_sigs = FALSE, | ||
| reference = "hg19" | ||
| ) { |
There was a problem hiding this comment.
The function parameter documentation lists 'probabilities' and 'matrix_file' as paths to files, but within the function they are treated as already loaded data.tables. Either the documentation should be corrected or the function should handle file paths.
| geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = 1- score)) + | ||
| scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + | ||
| geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) + | ||
| #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + |
There was a problem hiding this comment.
The commented-out scale_fill_scico lines should be removed instead of being left as comments, as they create code clutter and confusion about the intended functionality.
| cosine_similarities <- numeric(ncol(weighted_prob_matrix)) | ||
| names(cosine_similarities) <- colnames(weighted_prob_matrix) | ||
|
|
||
| for (i in 1:ncol(weighted_prob_matrix)) { | ||
| vec_a <- weighted_prob_matrix[, i] | ||
| vec_b <- weighted_expected[, i] | ||
| min_length <- min(length(vec_a), length(vec_b)) | ||
| vec_a <- vec_a[1:min_length] | ||
| vec_b <- vec_b[1:min_length] | ||
| cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2))) | ||
| cosine_similarities[i] <- cosine_similarity |
There was a problem hiding this comment.
The cosine similarity calculation assumes matching column names between weighted_prob_matrix and weighted_expected, but uses positional indexing. This could lead to incorrect pairings if the column orders don't match. Consider using column names for safer matching.
| cosine_similarities <- numeric(ncol(weighted_prob_matrix)) | |
| names(cosine_similarities) <- colnames(weighted_prob_matrix) | |
| for (i in 1:ncol(weighted_prob_matrix)) { | |
| vec_a <- weighted_prob_matrix[, i] | |
| vec_b <- weighted_expected[, i] | |
| min_length <- min(length(vec_a), length(vec_b)) | |
| vec_a <- vec_a[1:min_length] | |
| vec_b <- vec_b[1:min_length] | |
| cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2))) | |
| cosine_similarities[i] <- cosine_similarity | |
| cosine_similarities <- numeric(length(matching_signatures)) | |
| names(cosine_similarities) <- matching_signatures | |
| for (sig in matching_signatures) { | |
| if (!(sig %in% colnames(weighted_prob_matrix)) || !(sig %in% colnames(weighted_expected))) { | |
| cosine_similarities[sig] <- NA | |
| next | |
| } | |
| vec_a <- weighted_prob_matrix[, sig] | |
| vec_b <- weighted_expected[, sig] | |
| min_length <- min(length(vec_a), length(vec_b)) | |
| vec_a <- vec_a[1:min_length] | |
| vec_b <- vec_b[1:min_length] | |
| cosine_similarity <- sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2))) | |
| cosine_similarities[sig] <- cosine_similarity |
| if (requireNamespace("Cairo", quietly = TRUE)) { | ||
| p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE) | ||
| save_purple_sunrise_html(p_html, q, out_file_html) | ||
| } |
There was a problem hiding this comment.
The Cairo namespace check is performed but Cairo is not actually used in the protected code block. The check should either be moved to where Cairo is actually needed or removed if not required.
| if (requireNamespace("Cairo", quietly = TRUE)) { | |
| p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE) | |
| save_purple_sunrise_html(p_html, q, out_file_html) | |
| } | |
| p_html <- create_purity_ploidy_plot(range, bestPloidy, bestPurity, minPurity, maxPurity, minPloidy, maxPloidy, use_geom_rect = FALSE) | |
| save_purple_sunrise_html(p_html, q, out_file_html) |
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 26 out of 29 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
R/lift-wrappers.R:1
- Missing closing brace for the
if (save_html)condition that starts around line 762. The Cairo namespace check should be nested within the save_html condition.
#' lift wrapper helper function
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| shutup = function(top_level_expr, capture_output_type = "message") { | ||
| nr = NROW(capture_output_type) | ||
| do_capture_output = identical("output", capture_output_type) | ||
| do_capture_message =identical("message", capture_output_type) |
There was a problem hiding this comment.
Missing space after = operator. Should be do_capture_message = identical(...)
| do_capture_message =identical("message", capture_output_type) | |
| do_capture_message = identical("message", capture_output_type) |
| is_len_one = NROW(object) == 1 | ||
| is_not_valid = is_character && ! NROW(object) == 1 | ||
| is_na = is_len_one && Skilift::is_loosely_na(object, other_nas = base::nullfile()) | ||
| is_na = is_len_one && all(Skilift::is_loosely_na(object, other_nas = base::nullfile())) |
There was a problem hiding this comment.
The all() function wrapping is_loosely_na() may cause issues if is_loosely_na() returns a scalar boolean. This could break the logic for single-element character vectors.
| is_na = is_len_one && all(Skilift::is_loosely_na(object, other_nas = base::nullfile())) | |
| is_na = is_len_one && Skilift::is_loosely_na(object, other_nas = base::nullfile()) |
| geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = color)) + | ||
| #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + |
There was a problem hiding this comment.
Commented out code should be removed rather than left in the codebase. If this functionality needs to be preserved, consider using a conditional parameter or removing entirely.
| geom_raster(aes(x = ploidy, y = purity, fill = 1- score)) + | ||
| scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + | ||
| geom_raster(aes(x = ploidy, y = purity, fill = color)) + | ||
| #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + |
There was a problem hiding this comment.
Commented out code should be removed rather than left in the codebase. If this functionality needs to be preserved, consider using a conditional parameter or removing entirely.
| #scale_fill_scico(palette = "batlow", limits = c(0, 1), breaks = c(0, 0.25, 0.5, 0.75, 1), direction = 1, name = "Relative\nScore", guide = "none") + |
…s.json automated cohort type parsing
…ubsample regardless of protein coding or not; tumor type db lookup automated..
* Adding var tumor_details * Specific changes for HMF dataset to use denoised_coverage_field
minor cosine similarity functionality
No description provided.