A package for conducting pairwise comparative effectiveness studies using observational healthcare data in the OMOP Common Data Model format.
- A database in Common Data Model version 5 on one of these platforms: SQL Server, Oracle, PostgreSQL, IBM Netezza, Apache Impala, Amazon RedShift, Google BigQuery, Spark, or Microsoft APS.
- Conda (Miniconda or Anaconda)
- JDBC driver for your database platform
```shell
git clone https://github.com/unmtransinfo/PTSDpairwise.git
cd PTSDpairwise
```
For PostgreSQL, you can download the official JDBC driver (`postgresql-<version>.jar`) from the PostgreSQL JDBC website.
Place the driver in a folder that R can read (e.g., `~/jdbcDrivers`). You will point `pathToDriver` in `extras/CodeToRun.R` at this folder.
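Before you run anything in R, it can help to confirm that the jar is actually in the folder `pathToDriver` will point to. This is a minimal POSIX shell sketch; `find_jdbc_driver` is a hypothetical helper (not part of PTSDpairwise), and it assumes the jar keeps its default `postgresql-<version>.jar` file name.

```shell
# Hypothetical helper: print the path of the first postgresql-*.jar in a folder.
# Assumes the driver keeps its default file name, postgresql-<version>.jar.
find_jdbc_driver() {
  local dir jar
  dir="$1"
  for jar in "$dir"/postgresql-*.jar; do
    # If nothing matches, the pattern stays unexpanded and this test fails
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "No postgresql-*.jar found in $dir" >&2
  return 1
}
```

For example, `mkdir -p ~/jdbcDrivers && find_jdbc_driver ~/jdbcDrivers` prints the first matching jar (in glob order), or prints a message on stderr and returns status 1 if no driver is present.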
Create and activate a Conda environment with R 4.1.2, OpenJDK 11, and libsodium. This must be done before setting up the R packages:
```shell
conda create -n ptsdpairwise -c conda-forge r-base=4.1.2 openjdk=11 libsodium r-shiny r-dt
conda activate ptsdpairwise
```
After activating the Conda environment, set up the Java library path and configure R:
```shell
# Set LD_LIBRARY_PATH for this conda environment
conda env config vars set LD_LIBRARY_PATH="$CONDA_PREFIX/lib/server"
# Reactivate the environment to apply the new setting
conda deactivate
conda activate ptsdpairwise
# Configure R's Java settings
R CMD javareconf
```
Follow these instructions for additional R environment setup if needed.
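rJava has to locate the JVM shared library (`libjvm.so`) when it loads, so it is worth checking both that the file exists and that its folder is on `LD_LIBRARY_PATH`. The sketch below assumes, as the `LD_LIBRARY_PATH` setting above implies, that your OpenJDK build places `libjvm.so` under `$CONDA_PREFIX/lib/server`. `check_libjvm` and `ld_path_contains` are hypothetical helper names.

```shell
# Hypothetical helpers for checking the Java setup described above.
# Assumption: libjvm.so lives in <prefix>/lib/server, matching LD_LIBRARY_PATH.

# Succeed (and print the path) if libjvm.so exists under the given prefix,
# defaulting to the active Conda environment's $CONDA_PREFIX.
check_libjvm() {
  local prefix lib
  prefix="${1:-${CONDA_PREFIX:-}}"
  lib="$prefix/lib/server/libjvm.so"
  if [ -n "$prefix" ] && [ -f "$lib" ]; then
    echo "found: $lib"
    return 0
  fi
  echo "libjvm.so not found at $lib" >&2
  return 1
}

# Succeed if the given directory is one of the entries in LD_LIBRARY_PATH.
ld_path_contains() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After reactivating the environment, `check_libjvm && ld_path_contains "$CONDA_PREFIX/lib/server"` should succeed. If `check_libjvm` fails, look for `libjvm.so` elsewhere under `$CONDA_PREFIX` and adjust the path accordingly.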
Once the Conda environment is active and Java is configured, start R in the project directory and install the package dependencies:
```r
renv::restore()
```
If renv reports that the project already has a lockfile, select "1: Restore the project from the lockfile."
After restoring the dependencies, install the PTSDpairwise package itself from the project directory:
```r
renv::install(".")
```
The `andromedaTemp` directory must exist before running the analysis, or the run will fail. Create it together with the output directory:
```shell
mkdir -p ~/PTSDpairwise/andromedaTemp
mkdir -p ~/PTSDpairwise/output
```
Regenerating the pairwise comparison configuration is optional and only needed if you want to change it. The script `extras/createTcosListFile.R` creates all pairwise treatment comparisons for the study.
The `inst/excluded_covariate_concept_ids/` folder contains CSV files with concept IDs that must be excluded from the covariates used in propensity score matching for each cohort. These files were exported from ATLAS and map 1-to-1 to the cohort definition JSON files in `inst/cohorts/`. For example, `inst/excluded_covariate_concept_ids/4.csv` corresponds to `inst/cohorts/4.json`.
If you modify cohort definitions, you must also update the corresponding excluded covariate files by exporting the new excluded concept sets from ATLAS. Each CSV file must contain an `Id` column with the concept IDs to exclude.
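Because the CSV files must stay in step with the cohort JSON files, a quick consistency check can catch a missing export after you edit a cohort. This is a minimal sketch that assumes you pass it the project root; `check_excluded_covariates` is a hypothetical helper. It checks only one direction (every `inst/cohorts/*.json` has a CSV whose header contains an `Id` column), accepts quoted or unquoted headers, and does not validate the concept IDs themselves.

```shell
# Hypothetical helper: check that every cohort JSON has a matching exclusion CSV
# whose header row contains an Id column. Pass the project root as $1.
check_excluded_covariates() {
  local root status json id csv header
  root="${1:-.}"
  status=0
  for json in "$root"/inst/cohorts/*.json; do
    [ -e "$json" ] || continue                 # pattern did not match any file
    id=$(basename "$json" .json)
    csv="$root/inst/excluded_covariate_concept_ids/$id.csv"
    if [ ! -f "$csv" ]; then
      echo "missing: $csv" >&2
      status=1
      continue
    fi
    # First line only, with any Windows carriage return removed
    header=$(head -n 1 "$csv" | tr -d '\r')
    case ",$header," in
      *,Id,* | *,\"Id\",*) ;;                  # Id column present (quoted or not)
      *) echo "no Id column: $csv" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

Running `check_excluded_covariates ~/PTSDpairwise` returns a non-zero status and lists each problem on stderr if any cohort lacks a usable exclusion file.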
```shell
cd ~/PTSDpairwise
conda activate ptsdpairwise
```
Then in R:
```r
source("extras/createTcosListFile.R")
```
The script performs the following steps:
- Defines cohort IDs: Uses a predefined list of drug class cohort IDs (e.g., Barbiturates=4, Alpha_agonist=8, Benzodiazepines=13, SSRI=31, etc.).
- Creates pairwise combinations: Generates all unique pairs of target/comparator cohorts (e.g., 4 vs 8, 4 vs 9, ..., 32 vs 33).
- Loads excluded covariates: For each cohort pair, reads the excluded covariate concept IDs from the CSV files in `inst/excluded_covariate_concept_ids/` and merges them.
- Sets outcome IDs: Assigns outcome IDs (default: `4;5;35` for Psychiatric Hospitalization, Non-Psychiatric Hospitalization, and Self Harm).
- Writes output: Creates `inst/settings/TcosOfInterest.csv` with columns:
  - `targetId`: The target cohort ID
  - `comparatorId`: The comparator cohort ID
  - `outcomeIds`: Semicolon-separated outcome IDs
  - `excludedCovariateConceptIds`: Merged excluded concept IDs for both cohorts
  - `includedCovariateConceptIds`: (empty by default)
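The pairing and merging steps above can be sketched in POSIX shell to make the logic concrete. This is an illustration only, not the code in `extras/createTcosListFile.R`: `merged_exclusions` and `write_tcos` are hypothetical helpers, the CSV handling is a naive comma split that assumes no quoted commas before the `Id` column, and the real file's quoting may differ. Each pair is emitted once, with the earlier ID in the argument list as the target.

```shell
# Sketch of the pairing logic described above; NOT the R script's code.
# merged_exclusions and write_tcos are hypothetical helper names.

# Print the sorted, de-duplicated union of the Id columns of two CSV files,
# joined with semicolons. The Id column is located from each file's header.
merged_exclusions() {
  awk -F, '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    FNR == 1 {                              # header row: find the Id column
      col = 0
      for (k = 1; k <= NF; k++) {
        h = $k; gsub(/"/, "", h)
        if (h == "Id") col = k
      }
      next
    }
    col > 0 { v = $col; gsub(/"/, "", v); if (v != "") print v }
  ' "$1" "$2" | sort -un | paste -s -d ';' -
}

# Write TcosOfInterest-style rows for every unique pair of cohort IDs.
# $1: folder with <cohortId>.csv exclusion files; $2: outcome IDs (e.g. "4;5;35");
# remaining arguments: cohort IDs, in the order that decides target vs comparator.
write_tcos() {
  local dir outcomes t c
  dir="$1"; outcomes="$2"; shift 2
  echo "targetId,comparatorId,outcomeIds,excludedCovariateConceptIds,includedCovariateConceptIds"
  for t in "$@"; do
    shift                                   # drop t so the inner loop sees later IDs only
    for c in "$@"; do
      echo "$t,$c,$outcomes,$(merged_exclusions "$dir/$t.csv" "$dir/$c.csv"),"
    done
  done
}
```

For example, `write_tcos inst/excluded_covariate_concept_ids "4;5;35" 4 8 13 > /tmp/tcos_sketch.csv` writes three pairs (4 vs 8, 4 vs 13, 8 vs 13) to a scratch file, leaving the shipped `inst/settings/TcosOfInterest.csv` untouched.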
The file `inst/settings/TcosOfInterest.csv` already contains a pre-generated configuration, so you can skip this step if you don't need to regenerate it.
Before running the study, create a database schema where the study cohorts will be stored. Your database user needs write access to this schema. For example, in PostgreSQL:
```sql
CREATE SCHEMA ptsd_cohort_schema;
```
Edit `extras/CodeToRun.R` to configure the following parameters for your environment:
- `myusername`: Your database username
- `connectionDetails`: Database connection settings (`dbms`, `server`, `connectionString`, `pathToDriver`)
- `cdmDatabaseSchema`: Schema containing the CDM data
- `cohortDatabaseSchema`: Schema for creating study cohorts (requires write access)
- `cohortTable`: Name of the cohort table to create
- `databaseId`: Short identifier for your database (used in output filenames)
- `databaseName`: Full name of your database
- `databaseDescription`: Description of your database
Then run the script from R:
```r
source("extras/CodeToRun.R")
```
The script prompts for your database password and then executes the study.
After the analysis completes, the script will launch the Evidence Explorer Shiny app to view results.
Notes:
- You can save plots from within the Shiny app.
- You can view results from more than one database by applying `prepareForEvidenceExplorer` to the results file from each database, using the same data folder.
- Set `blind = FALSE` if you wish to be unblinded to the final results.
To upload the results to the OHDSI SFTP server, edit the `privateKeyFileName` and `userName` variables in `extras/CodeToRun.R` and uncomment the `uploadResults` call.
Both CohortGenerator and CohortMethod have built-in checkpointing capabilities that allow you to resume interrupted runs without starting from scratch.
The study generates intermediate files that serve as checkpoints. When you re-run the study, existing files are detected and the corresponding steps are skipped:
| File/Pattern | Created By | Resume Behavior |
|---|---|---|
| `output/CohortCounts.csv` | `createCohorts()` | Counts are regenerated each run (no caching) |
| `output/cmOutput/CmData_l1_t*_c*.zip` | `CohortMethod::runCmAnalyses()` | Skipped if the file exists; covariate data is reused |
| `output/cmOutput/Ps_l1_s1_p1_t*_c*.rds` | `CohortMethod::runCmAnalyses()` | Skipped if the file exists; propensity score models are reused |
| `output/cmOutput/StudyPop_*.rds` | `CohortMethod::runCmAnalyses()` | Skipped if the file exists; study populations are reused |
| `output/cmOutput/Strat*.rds` | `CohortMethod::runCmAnalyses()` | Skipped if the file exists; stratification is reused |
| `output/cmOutput/Analysis_1/om_*.rds` | `CohortMethod::runCmAnalyses()` | Skipped if the file exists; outcome models are reused |
| `output/cmOutput/outcomeModelReference.rds` | `CohortMethod::runCmAnalyses()` | Reference table tracking all analysis files |
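Before re-running after an interruption, you can get a rough idea of how far the previous run got by counting the checkpoint files listed in the table. This is a minimal sketch: `count_matches` and `summarize_checkpoints` are hypothetical helpers, the glob patterns are copied from the table, and the counts reflect only which files exist on disk, not whether CohortMethod will treat every step as complete.

```shell
# Hypothetical helpers for inspecting CohortMethod checkpoint files on disk.

# Print how many files in directory $1 match the glob pattern $2 (0 if none).
count_matches() {
  local n f
  n=0
  for f in "$1"/$2; do                          # $2 is left unquoted so it globs
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}

# Summarize the checkpoint files under <outputFolder>/cmOutput (default: output).
summarize_checkpoints() {
  local cm
  cm="${1:-output}/cmOutput"
  echo "CmData files:           $(count_matches "$cm" 'CmData_l1_t*_c*.zip')"
  echo "PS model files:         $(count_matches "$cm" 'Ps_l1_s1_p1_t*_c*.rds')"
  echo "Study population files: $(count_matches "$cm" 'StudyPop_*.rds')"
  echo "Stratification files:   $(count_matches "$cm" 'Strat*.rds')"
  echo "Outcome model files:    $(count_matches "$cm" 'Analysis_1/om_*.rds')"
  if [ -f "$cm/outcomeModelReference.rds" ]; then
    echo "outcomeModelReference.rds present"
  else
    echo "outcomeModelReference.rds missing"
  fi
}
```

Run `summarize_checkpoints output` from the project directory; it looks in `output/cmOutput`.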
If your run is interrupted (e.g., database timeout, memory error, PS model fitting failure), you can simply re-run the same command:
```r
execute(connectionDetails = connectionDetails,
        cdmDatabaseSchema = cdmDatabaseSchema,
        cohortDatabaseSchema = cohortDatabaseSchema,
        cohortTable = cohortTable,
        outputFolder = outputFolder,
        createCohorts = FALSE,  # Set to FALSE to skip cohort regeneration
        runAnalyses = TRUE,
        ...)
```
Key points for resuming:
- Set `createCohorts = FALSE` if the cohorts were already created successfully. This skips the cohort generation step and preserves the existing cohort table in the database.
- CohortMethod automatically resumes from where it left off by checking for existing intermediate files. It will:
  - Skip extracting CmData for target-comparator pairs that already have `.zip` files
  - Skip fitting propensity score models that already have `.rds` files
  - Skip fitting outcome models that already exist
- The `outcomeModelReference.rds` file tracks all planned analyses. If this file exists, CohortMethod uses it to determine which analyses still need to be completed.
By default, this package calls `CohortGenerator::generateCohortSet()` without the `incremental = TRUE` parameter. This means:
- Cohorts are regenerated each time `createCohorts = TRUE` is set.
- The cohort table is recreated from scratch.
- Use `createCohorts = FALSE` on subsequent runs to preserve existing cohorts.
Some target-comparator pairs may fail because one or both cohorts have too few subjects. The `minCohortSize` parameter (default: 100) filters out these pairs before the analysis begins:
```r
execute(connectionDetails = connectionDetails,
        ...,
        minCohortSize = 100,  # Minimum subjects required in both target and comparator
        ...)
```
Pairs that don't meet the minimum are logged and excluded:
```
Excluding 15 target-comparator pairs with < 100 subjects:
t4 (n=75) vs c14 (n=0)
t4 (n=75) vs c26 (n=0)
...
```
This prevents errors from comparisons that would otherwise fail due to:
- Zero subjects in one cohort
- Insufficient data for propensity score model convergence
- High correlation between covariates and treatment assignment
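The filtering rule amounts to keeping a pair only when both cohorts reach the threshold. The shell sketch below illustrates that rule and prints a log in the format shown above; it is not the package's R implementation. `filter_pairs` is a hypothetical helper, and the input layouts are assumptions: a counts file with a header and `cohortId,subjects` columns, and a pairs file with a header whose first two columns are `targetId,comparatorId`. The real `output/CohortCounts.csv` may use different column names. Cohorts missing from the counts file are treated as having zero subjects.

```shell
# Hypothetical helper illustrating the minCohortSize rule; not the package's code.
# $1: minimum subjects; $2: counts CSV (header, then cohortId,subjects);
# $3: pairs CSV (header, then targetId,comparatorId,...).
# Kept pairs go to stdout; the exclusion log goes to stderr.
filter_pairs() {
  awk -F, -v min="$1" '
    { sub(/\r$/, "") }                                  # tolerate Windows line endings
    NR == FNR { if (FNR > 1) n[$1] = $2 + 0; next }     # first file: cohort sizes
    FNR == 1 { next }                                    # skip the pairs header
    {
      t = $1; c = $2
      nt = (t in n) ? n[t] : 0                           # missing cohorts count as 0
      nc = (c in n) ? n[c] : 0
      if (nt < min || nc < min)
        excluded[++k] = "t" t " (n=" nt ") vs c" c " (n=" nc ")"
      else
        kept[++m] = t "," c
    }
    END {
      for (i = 1; i <= m; i++) print kept[i]
      if (k > 0) {
        print "Excluding " k " target-comparator pairs with < " min " subjects:" > "/dev/stderr"
        for (i = 1; i <= k; i++) print excluded[i] > "/dev/stderr"
      }
    }
  ' "$2" "$3"
}
```

For example, `filter_pairs 100 counts.csv pairs.csv > kept.csv` writes the retained `targetId,comparatorId` pairs to `kept.csv` and the exclusion log to stderr.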
To start completely fresh, remove the output folder:
```shell
rm -rf output/
mkdir -p output
```
Or, to preserve the cohort counts but re-run the analyses:
```shell
rm -rf output/cmOutput/
```
The PTSDpairwise package is licensed under the Apache License 2.0.
PTSDpairwise was developed in ATLAS and RStudio.