Identification of Patterns in Stroke Care Transitions Using OHDSI PharMetrics+ Data
Stroke survivors receive rehabilitation services (physical therapy, occupational therapy, and/or speech-language therapy) in different locations and at different rates in the period following their stroke. Differences between care locations have been identified as a potential avenue for understanding the divergence in post-acute stroke treatment outcomes.
Our hypothesis is that OHDSI data can be used to trace the various paths stroke survivors take through post-incident care, so that the efficacy and efficiency of the different care transitions can be evaluated. We built a proof-of-concept process on a limited cohort (stroke patients with aphasia, with patient IDs attached to their post-acute speech therapy treatments and treatment locations) to show that it is possible to trace the different paths through post-stroke treatment. To replicate our process, follow the steps below. Note that for true reproducibility you will need access to the AWS environment of the Northeastern OHDSI Center.
To access the OHDSI database, a new user must first read the OHDSI Lab User Guide and complete several steps (1-12). Our tutorial briefly outlines these first steps but primarily serves as supplemental instructions for Step 13 of the OHDSI User Guide: setting up a workspace environment with Python instead of R.
These steps assume you have access to OHDSI's Amazon Workspace. This repo is a quick tutorial on using Python with the OHDSI database; follow each step in order. Steps include:
- AWS Setup
- Miniconda
- Python environment
- Git
- Make
- Config of Redshift credentials
- Redshift database connection
- pandas to read, process, and write tables
Refer to config_template.ini and how_to_use_templates.md for additional help.
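As a rough illustration of the last three steps, here is a minimal sketch of reading Redshift credentials from a config file and pulling a query result into pandas. The section and key names are assumptions modeled on config_template.ini; adjust them to match your actual file.

```python
# Minimal sketch, assuming a config.ini with a [redshift] section containing
# host, port, dbname, user, and password (names modeled on config_template.ini).
import configparser

import pandas as pd
from sqlalchemy import create_engine

config = configparser.ConfigParser()
config.read("config.ini")
db = config["redshift"]  # assumed section name

# Redshift speaks the PostgreSQL wire protocol, so psycopg2 works as the driver.
engine = create_engine(
    f"postgresql+psycopg2://{db['user']}:{db['password']}"
    f"@{db['host']}:{db['port']}/{db['dbname']}"
)

# Sanity check: count the rows in the person table of the source OMOP schema.
print(pd.read_sql("SELECT COUNT(*) FROM omop_cdm_53_pmtx_202203.person", engine))
```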
We will create various intermediate tables in your own schema. We do this because running analyses directly against the 'omop_cdm_53_pmtx_202203' schema, which holds the original data, is extremely slow: those tables contain billions of rows. We will use the stroke cohort definition created by Casey Tilton as an index table to filter the relevant (stroke-diagnosis) person_IDs from the omop schema into a table written to your work schema. As a result, you will be working with tables of at most about a million rows. Run the following command in your Anaconda PowerShell Prompt, making sure you are in the directory where you cloned this repo:
make create_tables
This command runs 11 .py files in the correct order; each .py file creates one intermediate table. Note that some intermediate tables require other intermediate tables to exist first, so the order in which these .py files run matters. Now open DBeaver and check your work schema. You should see the following tables in your schema:
Note that your Object IDs won't match these exactly. What matters is that the Row Count Estimates are the same and that none of the tables are empty. You can check whether a table is empty by double-clicking the table name and checking the 'Data' tab.
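If you prefer to verify from Python instead of DBeaver, a quick row-count loop works; the table names below are a placeholder subset, not the full list of 11.

```python
# Quick non-emptiness check; replace the list with your actual table names
# and 'work_schema' with your own schema.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

tables = ["stroke_cohort", "stroke_ancestor"]  # placeholder subset of the 11 tables
for t in tables:
    n = pd.read_sql(f"SELECT COUNT(*) AS n FROM work_schema.{t}", engine)["n"].iloc[0]
    print(f"{t}: {n} rows")
```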
If you see any errors, refer to the 'Makefile' and try running each .py file individually. For example, if you got an error message while the 'stroke_ancestor' table was being created, try running the following command:
make stroke_ancestor
This runs only the 'stroke_ancestor.py' file, and you can debug by opening 'stroke_ancestor.py' and reading through the code.
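For orientation while debugging, each table-creation script follows roughly the same shape: join the cohort index table to a large OMOP table and materialize the much smaller result in your work schema. The sketch below illustrates that pattern; the table, schema, and column names are placeholders rather than the repo's actual ones.

```python
# Hypothetical sketch of the table-creation pattern; all names are placeholders.
from sqlalchemy import create_engine, text

# Replace the placeholder credentials with values from your config.ini.
engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

CREATE_SQL = """
CREATE TABLE work_schema.stroke_condition AS
SELECT co.*
FROM omop_cdm_53_pmtx_202203.condition_occurrence AS co
JOIN work_schema.stroke_cohort AS sc
  ON co.person_id = sc.person_id;
"""

# begin() opens a transaction that commits on success and rolls back on error.
with engine.begin() as conn:
    conn.execute(text(CREATE_SQL))
```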
The following make commands can be run in your Anaconda PowerShell Prompt to reproduce the figures in this repo:
make plot_stroke_desc_concept
make plot_has_aphasia
make plot_stroke_type_aphasia_TRUE
make plot_stroke_type_aphasia_FALSE
make plot_first_discharge
make plot_speech_therapy_aphasia
'make plot_first_discharge' runs three .py files, while each of the other commands runs a single .py file.
Confirm you produce the same plots as we did in our EDA.
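Each plotting command boils down to a query followed by a matplotlib figure. Below is a minimal sketch of that pattern, assuming a hypothetical aphasia-flag table in your work schema; the query, column names, and output filename are placeholders, not the repo's actual code.

```python
# Minimal sketch of the query-then-plot pattern; names are placeholders.
import matplotlib.pyplot as plt
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

counts = pd.read_sql(
    """
    SELECT has_aphasia, COUNT(*) AS n
    FROM work_schema.stroke_has_aphasia
    GROUP BY has_aphasia
    """,
    engine,
)

counts.plot.bar(x="has_aphasia", y="n", legend=False)
plt.ylabel("Number of patients")
plt.title("Stroke patients with and without aphasia")
plt.tight_layout()
plt.savefig("plot_has_aphasia.png")
```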
The actual analysis of stroke patients' discharge paths is done with the following command:
make analysis_visit_oc_5_discharge
This analysis is intentionally simple and is meant to provide a frame for what to work on next regarding the discharge paths of stroke patients. Anyone who wants to do further analysis of discharge paths can use this result as a starting point.
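One plausible shape for such an analysis: order each patient's visits chronologically and count the distinct sequences of discharge destinations. The sketch below is a hypothetical illustration, not the repo's actual query; see the analysis_visit_oc_5_discharge target in the Makefile for the real code.

```python
# Hypothetical discharge-path sketch; table and column names are assumptions.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

visits = pd.read_sql(
    "SELECT person_id, visit_start_date, discharged_to FROM work_schema.stroke_visits",
    engine,
    parse_dates=["visit_start_date"],
)

# Build one 'path' string per patient, e.g. 'Inpatient -> SNF -> Home'.
paths = (
    visits.sort_values(["person_id", "visit_start_date"])
    .groupby("person_id")["discharged_to"]
    .agg(lambda s: " -> ".join(s.astype(str)))
)

# Most common discharge paths across the cohort.
print(paths.value_counts().head(10))
```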
- Did your project objectives change based on what you learned from the data or stakeholder?
We treated this project as an Agile endeavor; a waterfall approach would likely have led to failure at worst and extreme frustration at best. Our objectives changed both as we explored the data and as we met with the stakeholder, Rob Cavanaugh, each week to share our progress. Rob provided insight into potential ways to access the data we needed for the analysis of care pathways. He provided stroke and speech-language codes and suggestions for how to determine whether a visit could fall under PT vs. OT. Our project team held regular stand-up meetings to discuss blockers, report on progress, and plan next steps.
- Were data-access or data-processing challenges harder than you anticipated?
As noted in our docs/README.md, there was a learning curve with the complexity of the OHDSI database, since the data is drawn from real-world interactions and the schema connections can be hard to navigate without medical knowledge. A medical condition can have various codes associated with it, and these codes can change throughout a patient's care timeline. Gaps in data availability and comprehensiveness were an additional challenge. As our team tested tables and fields in queries, we often ran into dead ends because of missing data and had to search for alternate paths to acquire the necessary data points. The database's use of the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize healthcare data was a blessing: not every relational database has clean, consistent linking variables without tidying first.
The next step would be to create cohorts for all potential paths, e.g., aphasia versus no aphasia diagnosis, different types of treatment, and different locations. We would recommend:
- Identifying the best markers/concept_IDs for physical and occupational therapy
- Matching those therapies to patient_IDs from the master stroke incidence file
- Creating a table with location by visit_start_date, visit_end_date, and discharge_to location for each patient_ID (note that, by definition of the initial cohort, the first location is always an emergency room and/or inpatient hospital stay)
- Appending location by date to each therapy for each patient_ID
Analysis can be performed on the resulting table (see the sketch after this list) to find:
- Frequency of therapies, overall and by location
- Duration (in days) of therapies, overall and by location
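A minimal pandas sketch of those two computations, assuming the recommended table has been built with hypothetical columns patient_id, therapy_type, location, visit_start_date, and visit_end_date:

```python
# Minimal sketch; the table and column names are assumptions, not existing repo objects.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

therapies = pd.read_sql(
    "SELECT * FROM work_schema.therapy_by_location",  # hypothetical table
    engine,
    parse_dates=["visit_start_date", "visit_end_date"],
)

# Frequency of therapies, overall and by location.
freq_overall = therapies["therapy_type"].value_counts()
freq_by_location = therapies.groupby(["location", "therapy_type"]).size()

# Duration (in days) of therapies, overall and by location.
therapies["duration_days"] = (
    therapies["visit_end_date"] - therapies["visit_start_date"]
).dt.days
dur_overall = therapies.groupby("therapy_type")["duration_days"].mean()
dur_by_location = therapies.groupby(["location", "therapy_type"])["duration_days"].mean()

print(freq_overall, freq_by_location.head(), dur_overall, dur_by_location.head(), sep="\n\n")
```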
Given that Northeastern University's OHDSI database is incomplete, there is additional data that would help simplify the process, though it is possible to complete the analysis without it:
- Comprehensive provider_specialty data
- Care_site_type and concept_id in the care_site table
OHDSI
- OHDSI Northeastern
- OHDSI @ Northeastern | Sharepoint
- OHDSI User Guide
- The Book of OHDSI
- OHDSI Lab Login
- Athena
OMOP
