Extract covariates with timeIds and those without in the same call #311

@egillax

It would be nice to be able to extract covariates with timeIds (temporalCovariates, temporalSequenceCovariates, cohortBasedTemporalCovariate) in the same call as regular settings, such as those created with createCovariateSettings.

For example:

library(FeatureExtraction)

phenoTypeDefs <- PhenotypeLibrary::getPlCohortDefinitionSet(1152:1215)

# create cohorts if they don't exist
connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(connectionDetails)
CohortGenerator::createCohortTables(connectionDetails = connectionDetails,
                                    cohortDatabaseSchema = "main",
                                    cohortTableNames = CohortGenerator::getCohortTableNames("phenotypes"))
cohortGenerated <- CohortGenerator::generateCohortSet(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cohortDatabaseSchema = "main",
  cohortTableNames = CohortGenerator::getCohortTableNames("phenotypes"),
  cohortDefinitionSet = phenoTypeDefs
)

covariateSettings <- list(
  createCovariateSettings(
    useDemographicsGender = TRUE,
    useDemographicsAge = TRUE,
    useCharlsonIndex = TRUE
  ),
  createCohortBasedTemporalCovariateSettings(
    analysisId = 49,
    covariateCohortDatabaseSchema = "main",
    covariateCohortTable = "phenotypes",
    covariateCohorts  = phenoTypeDefs
  )
)

covariateData <- getDbCovariateData(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cohortDatabaseSchema = "main",
  cohortTable = "cohort",
  cohortIds = 1,
  rowIdField = "subject_id",
  covariateSettings = covariateSettings
)

This currently fails with:

Error in dbAppendTable(conn, name, value) : 
  Column `timeId` does not exist in target table.

I think this would be much more flexible, because you would no longer rely on the non-temporal covariates being included in createTemporalCovariateSettings or createTemporalSequenceCovariateSettings. For example, createTemporalCovariateSettings offers more than createTemporalSequenceCovariateSettings, such as the Charlson index and CHADS2. And createCohortBasedTemporalCovariateSettings offers none of that, not even demographics like age or gender.

For this to work, you would need to add a timeId of NA to the covariates table for the non-temporal settings, and likewise add startDay and endDay of NA to analysisRef for the temporal settings.
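
To illustrate the NA-padding idea on the covariates table, here is a minimal sketch in base R. It uses plain data frames as stand-ins for the Andromeda tables, and the covariate IDs and values are made up for illustration only; it is not how FeatureExtraction does or would implement this.

```r
# Toy stand-ins for the two covariates tables (illustrative values only).
staticCovariates <- data.frame(
  rowId = c(1, 2),
  covariateId = c(8507001, 1901),
  covariateValue = c(1, 3)
)
temporalCovariates <- data.frame(
  rowId = c(1, 1),
  covariateId = c(1152049, 1152049),
  covariateValue = c(1, 1),
  timeId = c(1, 2)
)

# Non-temporal covariates have no time window, so their timeId is NA.
staticCovariates$timeId <- NA_real_

# Once both tables have the same columns, they can be stacked.
combined <- rbind(temporalCovariates, staticCovariates)
print(combined)
```

The same padding would apply to analysisRef, where the temporal settings would get NA for startDay and endDay.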

I think this should rather be in FeatureExtraction than in PatientLevelPrediction, but I had the following workaround implemented in PLP locally:

  isTemporalList <- c()
  if (inherits(covariateSettings, "list")) { 
    isTemporalList <- sapply(covariateSettings, function(x) {
      isTemporal <- !is.null(x$temporal) && x$temporal
      isTemporalSequence <- !is.null(x$temporalSequence) && x$temporalSequence
      return(isTemporal || isTemporalSequence)
    })
  }

  if (length(unique(isTemporalList)) > 1) {
    ParallelLogger::logInfo("Mixed temporal and non-temporal covariates detected. Processing separately")
    temporalSettingsList <- covariateSettings[isTemporalList]
    staticSettingsList <- covariateSettings[!isTemporalList]

    covariateData <- FeatureExtraction::getDbCovariateData(
      connection = connection,
      tempEmulationSchema = databaseDetails$tempEmulationSchema,
      cdmDatabaseSchema = databaseDetails$cdmDatabaseSchema,
      cdmVersion = databaseDetails$cdmVersion,
      cohortTable = "#cohort_person",
      cohortTableIsTemp = TRUE,
      rowIdField = "row_id",
      covariateSettings = temporalSettingsList
    )
    staticCovs <- FeatureExtraction::getDbCovariateData(
      connection = connection,
      tempEmulationSchema = databaseDetails$tempEmulationSchema,
      cdmDatabaseSchema = databaseDetails$cdmDatabaseSchema,
      cdmVersion = databaseDetails$cdmVersion,
      cohortTable = "#cohort_person",
      cohortTableIsTemp = TRUE,
      rowIdField = "row_id",
      covariateSettings = staticSettingsList
    )
    ParallelLogger::logInfo("Merging covariate data objects...")
    staticCovs$covariates <- staticCovs$covariates %>%
      dplyr::mutate(timeId = as.numeric(NA_real_))
    covariateData$analysisRef <- covariateData$analysisRef %>%
      dplyr::mutate(startDay = as.numeric(NA_real_),
                    endDay = as.numeric(NA_real_))

    Andromeda::appendToTable(covariateData$covariates, staticCovs$covariates)
    Andromeda::appendToTable(covariateData$covariateRef, staticCovs$covariateRef)
    Andromeda::appendToTable(covariateData$analysisRef, staticCovs$analysisRef)
  } else {
    covariateData <- FeatureExtraction::getDbCovariateData(
      connection = connection,
      tempEmulationSchema = databaseDetails$tempEmulationSchema,
      cdmDatabaseSchema = databaseDetails$cdmDatabaseSchema,
      cdmVersion = databaseDetails$cdmVersion,
      cohortTable = "#cohort_person",
      cohortTableIsTemp = TRUE,
      rowIdField = "row_id",
      covariateSettings = covariateSettings
    )
  }
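
The temporal/static split at the top of the workaround can be tried in isolation. The sketch below uses mock settings lists rather than real FeatureExtraction objects, and the helper name isTemporalSettings is hypothetical. It only assumes, as the workaround does, that the settings carry `temporal` and `temporalSequence` flags.

```r
# Hypothetical helper: TRUE if a settings object is flagged as temporal
# or temporal-sequence. `[[` is used because `$` partially matches names
# on lists, so x$temporal could pick up temporalSequence by accident.
isTemporalSettings <- function(settings) {
  isTRUE(settings[["temporal"]]) || isTRUE(settings[["temporalSequence"]])
}

# Mock settings objects (not real FeatureExtraction output).
mockSettings <- list(
  list(temporal = FALSE, temporalSequence = FALSE),
  list(temporal = TRUE, temporalSequence = FALSE),
  list(temporalSequence = TRUE)
)

# vapply guarantees a logical vector, even for an empty list.
isTemporal <- vapply(mockSettings, isTemporalSettings, logical(1))

temporalSettingsList <- mockSettings[isTemporal]
staticSettingsList <- mockSettings[!isTemporal]
```

Compared with sapply, vapply always returns a logical vector here, which keeps the later `length(unique(...)) > 1` check predictable.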
