Skip to content

Issue125 decompile#126

Open
ablack3 wants to merge 51 commits intodevelopfrom
issue125-decompile
Open

Issue125 decompile#126
ablack3 wants to merge 51 commits intodevelopfrom
issue125-decompile

Conversation

@ablack3
Copy link
Collaborator

@ablack3 ablack3 commented Feb 9, 2026

Hi @mdlavallee92,

I just saw you were working on coercing Atlas json into Capr objects. If this PR works it might get us there too. I'm trying to implement a jsonToCapr function that takes as input an Atlas json and returns Capr R code. I'm working through the OHDSI phenotype library and checking that the capr code that is produced matches the original json and the SQL generated by both the original and new json files is equivalent.

I had to add a few missing attribute constructors but overall I'm making good progress.

library(Capr)
jsonFolder <- system.file("cohorts", package = "PhenotypeLibrary", mustWork = TRUE)

jsonFiles <- list.files(jsonFolder, pattern = "\\.json$", full.names = TRUE)

cat(jsonToCapr(jsonFiles[[1]]))
#> # Generated by jsonToCapr()
#> # Source JSON: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/PhenotypeLibrary/cohorts/10.json
#> 
#> library(Capr)
#> 
#> # --- concept sets ---
#> csNauseaOrVomiting3 <- cs(descendants(31967, 4158495, 4101344, 4012500, 201218), name = "Nausea or Vomiting", id = 3)
#> 
#> attritionObj <- attrition(
#>   expressionLimit = "All"
#> )
#> 
#> # --- cohort ---
#> cohortDef <- cohort(
#>   entry = entry(
#>     conditionOccurrence(csNauseaOrVomiting3, conditionTypeExclude(FALSE)),
#>     observation(csNauseaOrVomiting3),
#>     observationWindow = continuousObservation(priorDays = 0L, postDays = 0L),
#>     primaryCriteriaLimit = "All",
#>     qualifiedLimit = "First"
#>   ),
#>   attrition = attritionObj,
#>   exit = exit(
#>     endStrategy = fixedExit(index = "endDate", offsetDays = 14L)
#>   ),
#>   era = era(eraDays = 0L)
#> )

Created on 2026-02-09 with reprex v2.1.1

I added a helper insertCaprCode(jsonFiles[[1]]) that will use rstudioapi to insert the code in your current R file for interactive use.

So far many of the phenotype library json are passing the "roundtrip" test but not all. Still working through them. One thing I'd like your input on is if we could avoid requiring a connection to a vocabulary to create the concept attributes. I think we could get what we need from the json without going to the vocabulary but let me know what you think. The relevant line is here:

buildConceptAttribute <- function(ids, attributeName, connection, vocabularyDatabaseSchema) {

#125

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a decompiler path to Capr so Atlas/Circe cohort JSON can be converted back into equivalent Capr R code, and introduces regression tests/fixtures to validate JSON ↔ Capr round-trips (including SQL equivalence checks via CirceR when available).

Changes:

  • Add jsonToCapr() / jsonToCaprFile() to decompile Circe cohort JSON into Capr code (concept sets + cohort() construction).
  • Extend query/attribute support to better preserve Circe semantics (e.g., conditionTypeExclude(FALSE), omit CodesetId when concept set is empty).
  • Add testthat round-trip + feature tests and minimal JSON fixtures; add PhenotypeLibrary as a suggested dependency for broader regression coverage.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
R/jsonToCapr.R New decompiler implementation for Circe JSON → Capr code.
R/query.R Adds doseEra() and changes Query JSON serialization to omit CodesetId for empty concept sets.
R/attributes-logic.R Adds a key-value attribute class and conditionTypeExclude() constructor.
tests/testthat/test-jsonToCapr.R Adds round-trip + SQL/semantic equivalence tests and feature-specific assertions.
tests/testthat/resources/*.json Adds minimal Circe JSON fixtures for targeted decompile behaviors.
NAMESPACE Exports newly added public functions/constructors.
DESCRIPTION Updates roxygen note and adds PhenotypeLibrary + remote for suggested regression testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1 to 5
# Generated by roxygen2: do not edit by hand

S3method(compile,Cohort)
export(age)
export(as.json)
export(atLeast)
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAMESPACE no longer registers S3method(compile, Cohort) (it was removed in this PR), but the codebase still imports generics::compile and defines compile.Cohort with @exportS3Method compile Cohort in R/cohort.R. Without the S3method registration, generics::compile() won’t dispatch to compile.Cohort. Either restore the S3method registration (or adjust roxygen so it’s generated) or remove the S3method-oriented import/docs if S4-only dispatch is intended.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the S3 generic. Stick to S4 only.

…support "At least 2 events" (any two rows), and "At least 2 distinct values of some column" (e.g. 2 distinct drug concepts) in Circe/Atlas.
…ttributes like "ConditionTypeExclude" : false. These are almost always false in practice and might be from legacy Circe functionality. Nevertheless it helps to have these so we can match the Circe json output exactly.
…s with comparing Circe json to Capr generated json because the order matches.
…ject extra concept sets into the CIRCE JSON produced by compile.Cohort, even if the cohort definition doesn’t reference them. This occurs in Atlas json and is necessary to regenerate the same json/sql as the input.
@ablack3
Copy link
Collaborator Author

ablack3 commented Feb 9, 2026

I think it's working pretty well now. In the thatthat test I picked 5 random phenotype libraries to run the roundtrip test on. But almost all of the phenotype library cohorts now pass the roundtrip test for both json and sql. I still need to review the copilot comments. Ready for review.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

R/jsonToCapr.R Outdated
Comment on lines 12 to 16
#' Measurement/DrugExposure attributes, exit strategies (\code{fixedExit},
#' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse.
#'
#' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}):
#' Domains: Specimen, VisitDetail, DoseEra; VisitOccurrence.ProviderSpecialty;
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The roxygen docs list DoseEra as unsupported, but domainKeyToQueryFun() maps DoseEra to doseEra() and doseEra() is exported. Update the Supported/Unsupported sections so they match the actual implementation (or remove the mapping if it’s truly unsupported).

Suggested change
#' Measurement/DrugExposure attributes, exit strategies (\code{fixedExit},
#' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse.
#'
#' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}):
#' Domains: Specimen, VisitDetail, DoseEra; VisitOccurrence.ProviderSpecialty;
#' Measurement/DrugExposure attributes, DoseEra domains, exit strategies (\code{fixedExit},
#' \code{drugExit}, \code{observationExit}), censoring criteria, and era collapse.
#'
#' @section Unsupported (fail in strict mode or skip with \code{mode = "skip"}):
#' Domains: Specimen, VisitDetail; VisitOccurrence.ProviderSpecialty;

Copilot uses AI. Check for mistakes.
Comment on lines 46 to 50
id2 <- collectGuid(x@attributes[[ii]]@group)
# id2 is a list of tibbles; do not flatten (would turn tibbles into columns)
dfs <- purrr::keep(id2, is.data.frame)
if (length(dfs) > 0L) ids <- dplyr::bind_rows(ids, dfs)
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collectGuid() for Query now only keeps top-level data.frames from collectGuid(x@attributes[[ii]]@group). If the correlated criteria group contains nested groups, collectGuid(Group) returns nested lists and those nested tibbles will be dropped here, causing missing GUIDs / incorrect codeset ID remapping. Consider recursively collecting data.frames from the nested list structure (without flattening data.frames).

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have nested lists though...

Comment on lines +13 to +15
items <- cs$expression$items %||% list()
for (j in seq_along(items)) {
cpt <- items[[j]]$concept
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%||% is used throughout this test file but is not defined/imported here. Since %||% is imported from rlang but not exported by Capr, these tests will fail unless the operator is explicitly referenced (e.g., rlang::%||%) or defined locally.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import %||% from rlang

Comment on lines 64 to 70
tryCatch(
jsonToCaprFile(jsonPath, rPath, mode = "skip"),
error = function(e) return(list(ok = FALSE, name = name, stage = "jsonToCapr", msg = conditionMessage(e)))
)
if (!file.exists(rPath)) {
return(list(ok = FALSE, name = name, stage = "write", msg = "R file was not created"))
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tryCatch(...) result from jsonToCaprFile() isn’t captured/checked. A failure in jsonToCaprFile() will return a list from the error handler, but the function will keep going and report a misleading stage = "write" error instead of stage = "jsonToCapr". Capture the tryCatch result and return() it when it’s an error list.

Copilot uses AI. Check for mistakes.
ablack3 and others added 27 commits February 9, 2026 04:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add remapCirceCodesetIdsToOriginal() to map internal codeset ids (0,1,2,...)
  from replaceCodesetId back to original ids from the cohort/includeConceptSets,
  so round-trip JSON keeps original concept set numbering.
- When includeConceptSets is provided, use it as the full ConceptSets list
  (order and ids) instead of merging and re-numbering; then remap all
  CodesetId / *SourceConcept references in the CIRCE structure to those ids.
- Fixes round-trip for cohorts that rely on specific ids or duplicate
  concept sets.
- Extract codesetId once (CodesetId or CodesetID) in criterionNodeToCapr.
- When a criterion has no CodesetId and no SourceConcept, treat as "any"
  event (e.g. inclusion rule "any condition") and emit conceptSetVar =
  "NULL" instead of skipping/erroring.
- When AdditionalCriteria is present but empty, emit withAll() so
  round-trip preserves AdditionalCriteria and CIRCE/SQL matches (e.g.
  outcome_death QualifiedLimit ordinal).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant