Conversation
There was a problem hiding this comment.
Pull request overview
Adds a decompiler path to Capr so Atlas/Circe cohort JSON can be converted back into equivalent Capr R code, and introduces regression tests/fixtures to validate JSON ↔ Capr round-trips (including SQL equivalence checks via CirceR when available).
Changes:
- Add
jsonToCapr()/jsonToCaprFile()to decompile Circe cohort JSON into Capr code (concept sets +cohort()construction). - Extend query/attribute support to better preserve Circe semantics (e.g.,
conditionTypeExclude(FALSE), omitCodesetIdwhen concept set is empty). - Add testthat round-trip + feature tests and minimal JSON fixtures; add
PhenotypeLibraryas a suggested dependency for broader regression coverage.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
R/jsonToCapr.R |
New decompiler implementation for Circe JSON → Capr code. |
R/query.R |
Adds doseEra() and changes Query JSON serialization to omit CodesetId for empty concept sets. |
R/attributes-logic.R |
Adds a key-value attribute class and conditionTypeExclude() constructor. |
tests/testthat/test-jsonToCapr.R |
Adds round-trip + SQL/semantic equivalence tests and feature-specific assertions. |
tests/testthat/resources/*.json |
Adds minimal Circe JSON fixtures for targeted decompile behaviors. |
NAMESPACE |
Exports newly added public functions/constructors. |
DESCRIPTION |
Updates roxygen note and adds PhenotypeLibrary + remote for suggested regression testing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Generated by roxygen2: do not edit by hand | ||
|
|
||
| S3method(compile,Cohort) | ||
| export(age) | ||
| export(as.json) | ||
| export(atLeast) |
There was a problem hiding this comment.
NAMESPACE no longer registers S3method(compile, Cohort) (it was removed in this PR), but the codebase still imports generics::compile and defines compile.Cohort with @exportS3Method compile Cohort in R/cohort.R. Without the S3method registration, generics::compile() won’t dispatch to compile.Cohort. Either restore the S3method registration (or adjust roxygen so it’s generated) or remove the S3method-oriented import/docs if S4-only dispatch is intended.
There was a problem hiding this comment.
remove the S3 generic. Stick to S4 only.
…support "At least 2 events" (any two rows), and "At least 2 distinct values of some column" (e.g. 2 distinct drug concepts) in Circe/Atlas.
…ttributes like "ConditionTypeExclude" : false. These are almost always false in practice and might be from legacy Circe functionality. Nevertheless it helps to have these so we can match the Circe json output exactly.
…s with comparing Circe json to Capr generated json because the order matches.
…ject extra concept sets into the CIRCE JSON produced by compile.Cohort, even if the cohort definition doesn’t reference them. This occurs in Atlas json and is necessary to regenerate the same json/sql as the input.
…ws on a list of dataframes.
|
I think it's working pretty well now. In the thatthat test I picked 5 random phenotype libraries to run the roundtrip test on. But almost all of the phenotype library cohorts now pass the roundtrip test for both json and sql. I still need to review the copilot comments. Ready for review. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
R/jsonToCapr.R
Outdated
| #' Measurement/DrugExposure attributes, exit strategies (\code{fixedExit}, | ||
| #' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse. | ||
| #' | ||
| #' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}): | ||
| #' Domains: Specimen, VisitDetail, DoseEra; VisitOccurrence.ProviderSpecialty; |
There was a problem hiding this comment.
The roxygen docs list DoseEra as unsupported, but domainKeyToQueryFun() maps DoseEra to doseEra() and doseEra() is exported. Update the Supported/Unsupported sections so they match the actual implementation (or remove the mapping if it’s truly unsupported).
| #' Measurement/DrugExposure attributes, exit strategies (\code{fixedExit}, | |
| #' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse. | |
| #' | |
| #' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}): | |
| #' Domains: Specimen, VisitDetail, DoseEra; VisitOccurrence.ProviderSpecialty; | |
| #' Measurement/DrugExposure attributes, DoseEra domains, exit strategies (\code{fixedExit}, | |
| #' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse. | |
| #' | |
| #' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}): | |
| #' Domains: Specimen, VisitDetail; VisitOccurrence.ProviderSpecialty; |
| id2 <- collectGuid(x@attributes[[ii]]@group) | ||
| # id2 is a list of tibbles; do not flatten (would turn tibbles into columns) | ||
| dfs <- purrr::keep(id2, is.data.frame) | ||
| if (length(dfs) > 0L) ids <- dplyr::bind_rows(ids, dfs) | ||
| } |
There was a problem hiding this comment.
collectGuid() for Query now only keeps top-level data.frames from collectGuid(x@attributes[[ii]]@group). If the correlated criteria group contains nested groups, collectGuid(Group) returns nested lists and those nested tibbles will be dropped here, causing missing GUIDs / incorrect codeset ID remapping. Consider recursively collecting data.frames from the nested list structure (without flattening data.frames).
There was a problem hiding this comment.
I'm not sure we have nested lists though...
| items <- cs$expression$items %||% list() | ||
| for (j in seq_along(items)) { | ||
| cpt <- items[[j]]$concept |
There was a problem hiding this comment.
%||% is used throughout this test file but is not defined/imported here. Since %||% is imported from rlang but not exported by Capr, these tests will fail unless the operator is explicitly referenced (e.g., rlang::%||%) or defined locally.
There was a problem hiding this comment.
import %||% from rlang
tests/testthat/test-jsonToCapr.R
Outdated
| tryCatch( | ||
| jsonToCaprFile(jsonPath, rPath, mode = "skip"), | ||
| error = function(e) return(list(ok = FALSE, name = name, stage = "jsonToCapr", msg = conditionMessage(e))) | ||
| ) | ||
| if (!file.exists(rPath)) { | ||
| return(list(ok = FALSE, name = name, stage = "write", msg = "R file was not created")) | ||
| } |
There was a problem hiding this comment.
The tryCatch(...) result from jsonToCaprFile() isn’t captured/checked. A failure in jsonToCaprFile() will return a list from the error handler, but the function will keep going and report a misleading stage = "write" error instead of stage = "jsonToCapr". Capture the tryCatch result and return() it when it’s an error list.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ts with nested criteria.
- Add remapCirceCodesetIdsToOriginal() to map internal codeset ids (0,1,2,...) from replaceCodesetId back to original ids from the cohort/includeConceptSets, so round-trip JSON keeps original concept set numbering. - When includeConceptSets is provided, use it as the full ConceptSets list (order and ids) instead of merging and re-numbering; then remap all CodesetId / *SourceConcept references in the CIRCE structure to those ids. - Fixes round-trip for cohorts that rely on specific ids or duplicate concept sets.
- Extract codesetId once (CodesetId or CodesetID) in criterionNodeToCapr. - When a criterion has no CodesetId and no SourceConcept, treat as "any" event (e.g. inclusion rule "any condition") and emit conceptSetVar = "NULL" instead of skipping/erroring. - When AdditionalCriteria is present but empty, emit withAll() so round-trip preserves AdditionalCriteria and CIRCE/SQL matches (e.g. outcome_death QualifiedLimit ordinal).
Hi @mdlavallee92,
I just saw you were working on coercing Atlas json into Capr objects. If this PR works it might get us there too. I'm trying to implement a jsonToCapr function that takes as input an Atlas json and returns Capr R code. I'm working through the OHDSI phenotype library and checking that the capr code that is produced matches the original json and the SQL generated by both the original and new json files is equivalent.
I had to add a few missing attribute constructors but overall I'm making good progress.
Created on 2026-02-09 with reprex v2.1.1
I added a helper
insertCaprCode(jsonFiles[[1]])that will use rstudioapi to insert the code in your current R file for interactive use.So far many of the phenotype library json are passing the "roundtrip" test but not all. Still working through them. One thing I'd like your input on is if we could avoid requiring a connection to a vocabulary to create the concept attributes. I think we could get what we need from the json without going to the vocabulary but let me know what you think. The relevant line is here:
Capr/R/attributes-concept.R
Line 119 in 7d18310
#125