Description
Currently, CodelistGenerator provides codesFromConceptSet() and codesFromCohort() functions that extract codelists from JSON files containing concept set expressions or cohort definitions.
We propose extending this support to concept sets stored in the database. In some OMOP CDM setups (such as those managed by IOMED), concept sets are stored directly in database tables (concept_set, concept_set_item, etc.) within the same database instance as the analysis data.
This feature request proposes adding a new function, getCodelistFromConceptSet(), that queries these database tables directly to build formal codelist objects, similar to how other functions in the package query vocabulary tables directly (e.g., getDrugIngredientCodes(), getICD10StandardCodes()).
Rationale
• Cleaner workflow: Eliminates the need to export/import JSON files when concept sets are already stored natively in the database.
• Consistency: Aligns with the package's philosophy of direct database queries for vocabulary-based codelists.
• Tested workflow: At IOMED, we maintain concept sets in dedicated database tables within the OMOP instance, allowing for streamlined querying without intermediate file handling.
• Efficiency: Reduces overhead of JSON parsing and file I/O when database access is already available.
Proposed Database Schema
The function would work with the OMOP CDM vocabulary tables plus a small extension (the concept_set and concept_set_item tables):
erDiagram
    concept_set ||--o{ concept_set_item : "has items"
    concept ||--o{ concept_set_item : "is included in"
    concept_set {
        int concept_set_id PK
        text concept_set_name
    }
    concept {
        int concept_id PK
        varchar concept_name
        varchar domain_id
        varchar vocabulary_id
        varchar concept_class_id
        varchar standard_concept
        varchar concept_code
        date valid_start_date
        date valid_end_date
        varchar invalid_reason
    }
    concept_set_item {
        int concept_set_id PK,FK
        int concept_id PK,FK
    }
    concept_class ||--o{ concept : "classifies"
    domain ||--o{ concept : "belongs to"
    vocabulary ||--o{ concept : "from"
Proposed Function Signature and Implementation
See OmopHelpers for the full implementation.
getCodelistFromConceptSet <- function(conceptSetId, con, cdmSchema) {
  # Point to the required tables in the database
  concept_set_tbl <- dplyr::tbl(con, dbplyr::in_schema(cdmSchema, "concept_set"))
  concept_set_item_tbl <- dplyr::tbl(con, dbplyr::in_schema(cdmSchema, "concept_set_item"))

  # Retrieve the name of the concept set to use as the codelist name.
  # `!!` inlines the local value so it cannot be mistaken for a column name.
  codelistName <- concept_set_tbl |>
    dplyr::filter(.data$concept_set_id == !!conceptSetId) |>
    dplyr::pull("concept_set_name") |>
    unique()

  # Error handling: check if the concept set ID was found
  if (length(codelistName) == 0) {
    stop(glue::glue("No concept set found for concept_set_id: {conceptSetId}"))
  }

  # Warning if multiple names exist for the same ID
  if (length(codelistName) > 1) {
    warning(glue::glue("Multiple names found for concept_set_id: {conceptSetId}. Using the first one: '{codelistName[1]}'"))
    codelistName <- codelistName[1]
  }

  # clean_name() is an assumed helper (see Dependencies)
  codelistName <- clean_name(codelistName)

  # Retrieve all unique concept IDs associated with the concept set ID
  concept_ids <- concept_set_item_tbl |>
    dplyr::filter(.data$concept_set_id == !!conceptSetId) |>
    dplyr::pull("concept_id") |>
    unique()

  # Create a named list structure required by newCodelist
  codelist <- list(concept_ids) |>
    magrittr::set_names(codelistName)

  # Return the formal, validated codelist object
  return(omopgenerics::newCodelist(codelist))
}
Implementation Details
The function would:
- Query concept_set table: Retrieve the concept_set_name for the given conceptSetId to use as the codelist name.
- Query concept_set_item table: Get all associated concept_ids for the concept set.
- Name cleaning: Apply name standardization (e.g., via a clean_name() helper function).
- Codelist creation: Build a named list and return an omopgenerics::newCodelist object.
- Error handling: Validate that the concept set exists and handle edge cases like multiple names.
Dependencies
• Requires omopgenerics package for newCodelist()
• Uses dplyr and dbplyr for database operations; the draft above also calls glue and magrittr
• Assumes clean_name() helper function (could be added or use existing package utilities)
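Since clean_name() is not defined in this issue, here is a minimal sketch of what such a helper could look like, using base R only. The behaviour shown (lower-casing and collapsing non-alphanumeric runs into underscores) is an assumption for illustration, not the OmopHelpers implementation, and an existing package utility may be a better fit.

```r
# Hypothetical helper: the issue assumes clean_name() exists but does not define it.
# This sketch turns a free-text concept set name into a snake_case identifier.
clean_name <- function(x) {
  x <- tolower(trimws(x))            # normalise case and outer whitespace
  x <- gsub("[^a-z0-9]+", "_", x)    # collapse any other characters into one underscore
  x <- gsub("^_+|_+$", "", x)        # drop leading and trailing underscores
  x
}

cleaned <- clean_name("  Type 2 Diabetes (broad) ")
print(cleaned)
```

Note that this sketch replaces accented and other non-ASCII characters with underscores, which may or may not be acceptable for concept set names in a given deployment.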
Related Functions
• codesFromConceptSet(): Current JSON-based approach
• getDrugIngredientCodes(): Similar direct database querying pattern
• getICD10StandardCodes(): Another vocabulary table query function
Testing Considerations
• Unit tests with mock database containing concept_set tables
• Integration tests with real OMOP CDM databases
• Edge case testing (missing concept sets, empty results, etc.)
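As a rough illustration of the mock-database idea, the sketch below builds the two extension tables in an in-memory SQLite database and exercises the two lookups the proposed function relies on, including the missing-ID and empty-set edge cases. Several things here are assumptions: SQLite (via RSQLite) stands in for the real OMOP instance, "main" stands in for cdmSchema, get_name() and get_ids() are hypothetical helpers that mirror the queries in the draft, and the names and concept IDs are placeholder data.

```r
# Mock database: in-memory SQLite standing in for the OMOP instance (assumption).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

DBI::dbWriteTable(con, "concept_set", data.frame(
  concept_set_id = c(1L, 2L),
  concept_set_name = c("Type 2 diabetes", "Empty set")
))
# No primary key is enforced in this mock, so a duplicate row can be
# included on purpose to check that unique() removes it.
DBI::dbWriteTable(con, "concept_set_item", data.frame(
  concept_set_id = c(1L, 1L, 1L),
  concept_id = c(201826L, 443732L, 201826L)
))

concept_set_tbl <- dplyr::tbl(con, dbplyr::in_schema("main", "concept_set"))
concept_set_item_tbl <- dplyr::tbl(con, dbplyr::in_schema("main", "concept_set_item"))

# Hypothetical helpers mirroring the two queries in the proposed function.
get_name <- function(id) {
  concept_set_tbl |>
    dplyr::filter(.data$concept_set_id == !!id) |>
    dplyr::pull("concept_set_name") |>
    unique()
}
get_ids <- function(id) {
  concept_set_item_tbl |>
    dplyr::filter(.data$concept_set_id == !!id) |>
    dplyr::pull("concept_id") |>
    unique()
}

name_found <- get_name(1L)     # existing concept set
name_missing <- get_name(99L)  # ID that is not in concept_set
ids_found <- get_ids(1L)       # duplicates should be dropped
ids_empty <- get_ids(2L)       # concept set with no items

stopifnot(
  identical(name_found, "Type 2 diabetes"),
  length(name_missing) == 0,
  setequal(ids_found, c(201826L, 443732L)),
  length(ids_found) == 2,
  length(ids_empty) == 0
)

DBI::dbDisconnect(con)
```

In a package test suite these checks would more naturally live in testthat expectations, with the full getCodelistFromConceptSet() called against the mock connection. The empty-set case is worth pinning down explicitly, because the draft does not say whether an empty codelist should be returned or an error raised.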