Update adding treatment metadata, event columns, wgd and pre-computed msi by davidrequena · Pull Request #36 · mskilab-org/skilift

davidrequena · 2026-03-26T19:05:49Z

My changes are the following:

I added a new function (add_wgd) that either calculates Whole Genome Doubling (WGD) using the method from Bielski's paper (https://pubmed.ncbi.nlm.nih.gov/30013179/) or uses the value provided.
We do not currently have WGD, so this is new functionality.
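
For readers unfamiliar with the criterion, here is a minimal sketch of the kind of call add_wgd might make. It is not the code in this PR. It assumes the commonly cited Bielski rule (a sample is called WGD when more than half of the autosomal genome has a major copy number of 2 or more), and the helper name estimate_wgd, the column names (chrom, start, end, major_cn) and the segment values are all hypothetical.

```r
# Hypothetical helper, not the PR's add_wgd. Assumes `segs` is a data.frame
# with columns chrom, start, end and major_cn (major allele copy number).
estimate_wgd <- function(segs, threshold = 0.5) {
    # Keep autosomal segments with a known major copy number
    chrom <- sub("^chr", "", as.character(segs$chrom))
    keep <- chrom %in% as.character(1:22) & !is.na(segs$major_cn)
    auto <- segs[keep, , drop = FALSE]
    width <- as.numeric(auto$end) - as.numeric(auto$start) + 1
    if (nrow(auto) == 0 || sum(width) <= 0) {
        return(NA)  # nothing to base a call on
    }
    # Fraction of autosomal bases whose major copy number is at least 2
    frac_doubled <- sum(width[auto$major_cn >= 2]) / sum(width)
    frac_doubled > threshold
}

# Made-up segments for illustration only
segs <- data.frame(
    chrom = c("1", "2", "X"),
    start = c(1, 1, 1),
    end = c(100, 50, 500),
    major_cn = c(2, 1, 1)
)
wgd <- estimate_wgd(segs)
```

The X segment is ignored because only autosomes count toward the fraction, and a sample with no usable autosomal segments returns NA rather than FALSE.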

I modified two existing functions (add_tmb and add_msisensor_score) to accept provided values before attempting calculation. This is necessary for HMF, because we have the MSI value but not the inputs needed to run msisensor.
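
The "use the provided value, otherwise calculate" behaviour described above could look roughly like the sketch below. The helper resolve_value and its arguments are hypothetical and only illustrate the pattern; the real add_tmb and add_msisensor_score signatures may differ.

```r
# Hypothetical helper illustrating "provided value first, calculation second".
# `provided` is an optional pre-computed scalar; `compute_fn` is a function
# with no arguments that performs the calculation when no value was given.
resolve_value <- function(provided = NULL, compute_fn = NULL) {
    if (!is.null(provided) && length(provided) == 1 && !is.na(provided)) {
        return(provided)
    }
    if (is.null(compute_fn)) {
        return(NA_real_)
    }
    # If the calculation cannot run (e.g. missing inputs), fall back to NA
    tryCatch(compute_fn(), error = function(e) NA_real_)
}

# A provided MSI score is used as-is, so the failing calculation never runs
msi_provided <- resolve_value(12.5, function() stop("no msisensor inputs"))
# Without a provided value the calculation is attempted
msi_computed <- resolve_value(NULL, function() 1.5)
# Without a provided value and with a failing calculation the result is NA
msi_missing <- resolve_value(NULL, function() stop("no msisensor inputs"))
```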

The two other functions I am adding (add_treatment_metadata and add_sv_columns) pull metadata from the cohort object and add it to the metadata.json object, which will later be integrated into datafiles.json and datafiles.arrow. This allows these columns to be used both in the aggregation plots and in the frontend interface editing the datasets.json.
The advantage over the current state is that the acquisition is automated, instead of having to manipulate the individual .json files in R.

I modified create_metadata to add steps for the functions I described above. I also added the corresponding man files.

shihabdider · 2026-03-26T20:31:53Z

Would you mind reopening the old PR? I'd prefer to keep these changes contained to a single PR thread as it makes it a bit easier for me to follow the thread of discussion. I'm going to close this PR for now. Please reopen the old one (which already has your changes) and add your comment.

kevinmhadi · 2026-03-26T22:02:12Z

    return(metadata)
 }

+#' @name add_sv_columns


We discussed the convention of how the nested events should be represented in metadata.json and datafiles.json based on the datasets schema (see the screenshot you sent me) earlier at my desk. This code won't create the nested entry in the json for complex events.

Have you tried loading in a json via jsonlite::fromJSON in R to see how nested columns would be represented in R before they're written to the json? There are examples for you to work with to be able to engineer this.

To clarify: whatever is written to metadata.json from parsing complex events in the R data.table object metadata should conform to the nested structure specified in the datasets.json schema in your screenshot above (as we spoke about).

The schema in datasets.json expects entries in datafiles.json that look like:

{ "complex_events": { "pyrgo": 0, "del": 0, ... } }

That means nesting the column as a list element in the R data.table object metadata. Load other datafiles.json files via jsonlite::fromJSON() and look at how similar fields are constructed in a data.table/data.frame structure.

Please re-implement this such that this will conform to the convention in the schema in datasets.json.
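
As a rough illustration of the round trip suggested above, the sketch below loads a nested entry with jsonlite::fromJSON(), rebuilds the same shape by hand, and writes it back out. It uses a plain data.frame with a nested data.frame column; the equivalent construction on the package's data.table metadata object is not shown and would need to be checked. The pair field, the sample name and the counts are made up for the example.

```r
library(jsonlite)

# 1. Load an entry with a nested field and inspect how R represents it.
#    fromJSON() turns the nested object into a data.frame column.
json_in <- '[{"pair":"sample1","complex_events":{"pyrgo":0,"del":2}}]'
loaded <- fromJSON(json_in)
str(loaded)

# 2. Build the same shape by hand: one row per sample, with the event
#    counts stored as a nested data.frame column (illustrative values).
metadata <- data.frame(pair = "sample1", stringsAsFactors = FALSE)
metadata$complex_events <- data.frame(pyrgo = 0L, del = 2L)

# 3. Serialise; auto_unbox keeps scalars as scalars instead of arrays.
json_out <- toJSON(metadata, auto_unbox = TRUE)
cat(json_out, "\n")
```

Comparing str(loaded) with str(metadata) before writing is a quick way to confirm that the hand-built column has the same nesting as the existing files.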

In the new commit of the PR, I am removing this function. I think I can pull this info from sv_type_counts in the datasets.json directly. I deleted all references to the function.

kevinmhadi · 2026-03-26T22:05:41Z

-    lstix = seq_len(NROW(added_field_values))
-    for (ii in lstix) {
-        field = added_field_values[ii]
-        fnm = names(field)
-        value = field[[1]]
-        metadata[[fnm]] = value
-    }
-


What is the reason for moving this block of code to the beginning?

This lets me add the provided fields to metadata first, so that the lines below can check whether an input was already provided. If it was, the function that calculates the corresponding column doesn't need to run (or try to run). This is easier to see in the new commit of the PR.

kevinmhadi · 2026-03-26T22:17:30Z

+#' @name add_treatment_metadata
+#' @title Add treatment metadata
+#' @description
+#' Adds treatment metadata information such as age at biopsy, treatment lines, number of different treatment lines, and information about the treatment line with the best response (name, response, mechanism, PFS duration).
+#'
+#' @param metadata A data.table containing metadata.
+#' @param input_age_at_biopsy Age at the biopsy.
+#' @param input_treatment_lines Different treatment lines received.
+#' @param input_n_treatment_lines Number of treatment lines received.
+#' @param input_best_treatment Name of the treatment line with the best response.
+#' @param input_best_treatment_response Best response obtained among the treatment lines.
+#' @param input_best_treatment_mechanism Mechanism of the treatment line with the best response.
+#' @param input_best_treatment_PFS_duration Progression-free survival of the treatment line with the best response.
+#' @return Updated metadata with treatment information added.
+add_treatment_metadata <- function(
+    metadata,
+    input_age_at_biopsy = NULL,
+    input_treatment_lines = NULL,
+    input_n_treatment_lines = NULL,
+    input_best_treatment = NULL,
+    input_best_treatment_response = NULL,
+    input_best_treatment_mechanism = NULL,
+    input_best_treatment_PFS_duration = NULL
+) {
+
+    # Validate and use age_at_biopsy if provided
+    if (!is.null(input_age_at_biopsy)) {
+        if (!is.numeric(input_age_at_biopsy)) {
+            warning("age_at_biopsy must be a number, ignored")
+        } else {
+            metadata[, age_at_biopsy := input_age_at_biopsy]
+        }
+    }
+
+    # Validate and use treatment_lines if provided
+    if (!is.null(input_treatment_lines)) {
+        if (!is.character(input_treatment_lines)) {
+            warning("input_treatment_lines must be a character, ignored")
+        } else {
+            metadata[, treatment_lines := input_treatment_lines]
+        }
+    }
+
+    # Validate and use input_n_treatment_lines if provided
+    if (!is.null(input_n_treatment_lines)) {
+        if (!is.numeric(input_n_treatment_lines)) {
+            warning("input_n_treatment_lines must be a number, ignored")
+        } else {
+            metadata[, n_treatment_lines := input_n_treatment_lines]
+        }
+    }
+
+    # Validate and use input_best_treatment if provided
+    if (!is.null(input_best_treatment)) {
+        if (!is.character(input_best_treatment)) {
+            warning("input_best_treatment must be a character, ignored")
+        } else {
+            metadata[, best_treatment := input_best_treatment]
+        }
+    }
+
+    # Validate and use input_best_treatment_response if provided
+    if (!is.null(input_best_treatment_response)) {
+        if (!is.character(input_best_treatment_response)) {
+            warning("input_best_treatment_response must be a character, ignored")
+        } else {
+            metadata[, best_treatment_response := input_best_treatment_response]
+        }
+    }
+
+    # Validate and use input_best_treatment_mechanism if provided
+    if (!is.null(input_best_treatment_mechanism)) {
+        if (!is.character(input_best_treatment_mechanism)) {
+            warning("input_best_treatment_mechanism must be a character, ignored")
+        } else {
+            metadata[, best_treatment_mechanism := input_best_treatment_mechanism]
+        }
+    }
+
+    # Validate and use input_best_treatment_PFS_duration if provided
+    if (!is.null(input_best_treatment_PFS_duration)) {
+        if (!is.numeric(input_best_treatment_PFS_duration)) {
+            warning("input_best_treatment_PFS_duration must be a number, ignored")
+        } else {
+            metadata[, best_treatment_PFS_duration := input_best_treatment_PFS_duration]
+        }
+    }
+    return(metadata)
+}
+


This treatment metadata is specific to HMF and does not apply to other datasets. Skilift shouldn't have to support every variable that is only useful for a single dataset. The right place for this would be in code documenting a specific analysis (Jupyter, R Markdown, emacs blog files, etc.).

Please remove this block from the PR.

Done in the new update of the PR

davidrequena · 2026-03-27T04:23:58Z

@shihabdider in this second commit I am no longer passing the provided values as parameters. Instead, I added conditionals (lines 1844-1872) that prevent the corresponding function from executing if the value is already provided in the cohort.
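
A minimal sketch of what such conditionals could look like, assuming the provided values arrive as columns of the metadata table. The calculator functions here are hypothetical stand-ins that return fixed values, not the real add_msisensor_score, add_tmb or add_wgd, and a plain data.frame is used in place of the package's data.table.

```r
# Hypothetical stand-ins for the real calculators; each fills one column.
calculators <- list(
    msisensor_score = function(md) { md$msisensor_score <- 0.5; md },
    tmb = function(md) { md$tmb <- 3.0; md },
    wgd = function(md) { md$wgd <- FALSE; md }
)

# A field counts as provided when the column exists and is not entirely NA
has_value <- function(md, field) {
    field %in% names(md) && !all(is.na(md[[field]]))
}

# msisensor_score is pre-computed, tmb is absent, wgd is present but NA
metadata <- data.frame(pair = "sample1", msisensor_score = 12.5, wgd = NA)

ran <- character(0)
for (field in names(calculators)) {
    if (!has_value(metadata, field)) {
        metadata <- calculators[[field]](metadata)
        ran <- c(ran, field)
    }
}
```

Only the tmb and wgd calculators run here; the pre-computed msisensor_score is left untouched.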

Adding treatment metadata, event columns, wgd and pre-computed msi

2044f75

davidrequena requested review from kevinmhadi and shihabdider March 26, 2026 19:05

shihabdider closed this Mar 26, 2026

shihabdider reopened this Mar 26, 2026

kevinmhadi reviewed Mar 26, 2026

Removing functions add_treatment metadata and_sv_columns and adding conditional to msi, tmb, and wgd to detect if the value was already provided before running

fdcd5c5

davidrequena wants to merge 2 commits into main from dr_dev
