diff --git a/README.md b/README.md
index 359b82c..ce727fd 100644
--- a/README.md
+++ b/README.md
@@ -16,4 +16,4 @@ Get started today and see how Rhino Health's client resources can help you build
## Getting Help
-For additional support, check out [docs.rhinohealth.com](https://docs.rhinohealth.com/hc/en-us) or reach out to [support@rhinohealth.com](mailto:support@rhinohealth.com).
\ No newline at end of file
+For additional support, check out docs.rhinohealth.com or reach out to [support@rhinohealth.com](mailto:support@rhinohealth.com).
\ No newline at end of file
diff --git a/sandbox/pneumonia-prediction/2_data_engineering.ipynb b/sandbox/pneumonia-prediction/2_data_engineering.ipynb
deleted file mode 100644
index c2a6a5f..0000000
--- a/sandbox/pneumonia-prediction/2_data_engineering.ipynb
+++ /dev/null
@@ -1,225 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "c69d61a4",
- "metadata": {},
- "source": [
- "# Notebook #2: Federated Data Engineering\n",
- "In this notebook, we'll convert CXR DICOM to JPG files and apply the conversion code to multiple sites."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5224695d-b02b-41ea-a5d6-ad71c4799ac8",
- "metadata": {},
- "source": [
-    "### Install the Rhino Health Python SDK, Load All Necessary Libraries, and Log In to the Rhino FCP"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "52c80562-63c0-4d01-a650-34094fd2b333",
- "metadata": {},
- "outputs": [],
- "source": [
-    "%pip install --upgrade rhino_health"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "1114292d",
- "metadata": {},
- "outputs": [],
- "source": [
- "import getpass\n",
- "import rhino_health as rh\n",
- "from rhino_health.lib.endpoints.aimodel.aimodel_dataclass import (\n",
- " AIModel,\n",
- " AIModelCreateInput,\n",
- " AIModelRunInput,\n",
- " ModelTypes,\n",
- " CodeRunType\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7084c9f3-cac9-4b22-8da3-1415d51a7d16",
- "metadata": {},
- "outputs": [],
- "source": [
- "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
- "session = rh.login(username=my_username, password=getpass.getpass())"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c22aa633",
- "metadata": {},
- "source": [
- "### Retrieve Project and Cohort Information"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "00cffc9f-2a49-464a-8380-02a6548df34d",
- "metadata": {},
- "outputs": [],
- "source": [
- "project = session.project.get_project_by_name(\"YOUR_PROJECT_NAME\") # Replace with your project name\n",
- "dataschema = project.data_schemas[0]\n",
- "print(f\"Loaded dataschema '{dataschema.name}' with uid '{dataschema.uid}'\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "b756f753-7b98-40fd-9f2b-0a505fe64b6b",
- "metadata": {},
- "outputs": [],
- "source": [
- "cxr_schema = project.get_data_schema_by_name('Auto-generated schema for mimic_cxr_dev', project_uid=project.uid)\n",
-    "cxr_schema_uid = cxr_schema.uid"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7153dd89-65cb-4234-9995-7bf72c29c5f4",
- "metadata": {},
- "outputs": [],
- "source": [
- "collaborators = project.collaborating_workgroups\n",
- "workgroups_by_name = {x.name: x for x in collaborators}\n",
- "workgroups_by_uid = {x.uid: x for x in collaborators}\n",
- "hco_workgroup = workgroups_by_name[\"Health System - Sandbox\"]\n",
- "aidev_workgroup = workgroups_by_name[\"Decode Health - Sandbox\"]\n",
- "\n",
-    "print(f\"Found workgroup '{aidev_workgroup.name}' and collaborating workgroup '{hco_workgroup.name}'\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "303b52e4-ca0c-4542-859e-3d1f3159b2bc",
- "metadata": {},
- "source": [
- "### Get the CXR Cohorts From Both Sites"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "f3cf0ee9-cc66-48a9-a6c3-d38d461cede9",
- "metadata": {},
- "outputs": [],
- "source": [
- "cohorts = project.cohorts\n",
- "cohorts_by_workgroup = {workgroups_by_uid[x.workgroup_uid].name: x for x in cohorts}\n",
- "hco_cxr_cohort = project.get_cohort_by_name(\"mimic_cxr_hco\")\n",
- "aidev_cxr_cohort = project.get_cohort_by_name(\"mimic_cxr_dev\")\n",
- "hco_cxr_cohort_uid = hco_cxr_cohort.uid\n",
- "aidev_cxr_cohort_uid = aidev_cxr_cohort.uid\n",
- "print(f\"Loaded CXR cohorts '{hco_cxr_cohort.uid}', '{aidev_cxr_cohort.uid}'\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "11f28226-17a6-4f80-9d9e-bf18ebaa6ef5",
- "metadata": {},
- "source": [
- "### We will use a Pre-defined Container Image with our Model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "451d6751-e4aa-46b3-aa72-aff8471bc09e",
- "metadata": {},
- "outputs": [],
- "source": [
-    "cxr_image_uri = \"913123821419.dkr.ecr.us-east-1.amazonaws.com/rhino-gc-workgroup-rhino-sandbox-decode-health:data-prep-sb-1\""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "62b431d5-e830-4ac4-8c4b-201499e11d88",
- "metadata": {},
- "source": [
- "### Define the Generalized Compute Model that will Convert DICOM Images to JPG Files"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "32a37efb-4f5f-4043-8b2d-ab976e6bc08a",
- "metadata": {},
- "outputs": [],
- "source": [
- "compute_params = AIModelCreateInput(\n",
- " name=\"DICOM to JPG Transformation Code\",\n",
-    "    description=\"CXR JPG transformation for the AI dev and Health System datasets\",\n",
- " input_data_schema_uids = [cxr_schema_uid],\n",
- " output_data_schema_uids = [None], # Auto-Generating the Output Data Schema for the Model\n",
- " project_uid = project.uid,\n",
- " model_type = ModelTypes.GENERALIZED_COMPUTE, \n",
- " config={\"container_image_uri\": cxr_image_uri}\n",
- ")\n",
- "\n",
- "compute_model = session.aimodel.create_aimodel(compute_params)\n",
- "print(f\"Got aimodel '{compute_model.name}' with uid {compute_model.uid}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c4f36033-e597-4c9b-916b-89235af73ebd",
- "metadata": {},
- "source": [
- "### Run the Model Defined in the Previous Cell"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "6626dd0a-228b-4c62-adff-12852c7a9276",
- "metadata": {},
- "outputs": [],
- "source": [
- "run_params = AIModelRunInput(\n",
- " aimodel_uid = compute_model.uid,\n",
-    "    input_cohort_uids = [aidev_cxr_cohort_uid, hco_cxr_cohort_uid],\n",
- " output_cohort_names_suffix = \"_conv\",\n",
- " timeout_seconds = 600\n",
- ")\n",
- "model_run = session.aimodel.run_aimodel(run_params)\n",
- "run_result = model_run.wait_for_completion()\n",
- "print(f\"Finished running {compute_model.name}\")\n",
- "print(f\"Result status is '{run_result.status.value}', errors={run_result.result_info.get('errors') if run_result.result_info else None}\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/sandbox/pneumonia-prediction/4_model_training.ipynb b/sandbox/pneumonia-prediction/4_model_training.ipynb
deleted file mode 100644
index 6a43f4c..0000000
--- a/sandbox/pneumonia-prediction/4_model_training.ipynb
+++ /dev/null
@@ -1,276 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "9d5ace37-8ae8-4bd8-a905-6713750d8129",
- "metadata": {},
- "source": [
- "# Notebook #4: Running Federated Training of the Pneumonia Model"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b9ef6e7a-63a6-4bbc-b134-ebbd60dfd550",
- "metadata": {},
- "source": [
-    "### Install the Rhino Health Python SDK, Load All Necessary Libraries, and Log In to the Rhino FCP"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "5f3890be-ff13-472e-901a-594fa99e9ddb",
- "metadata": {},
- "outputs": [],
- "source": [
- "import getpass\n",
- "import rhino_health as rh\n",
- "from rhino_health.lib.endpoints.aimodel.aimodel_dataclass import (\n",
- " AIModelCreateInput,\n",
- " ModelTypes,\n",
- " AIModelRunInput,\n",
- " AIModelMultiCohortInput,\n",
- " AIModelTrainInput \n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "863b04e3-ea1f-4edf-8e06-6721a591bcb2",
- "metadata": {},
- "outputs": [],
- "source": [
- "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
- "session = rh.login(username=my_username, password=getpass.getpass())"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f1b96218-22e0-40c0-98c9-b1a9b0e3a84a",
- "metadata": {},
- "source": [
- "### Retrieve Project and Cohort Information"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7fe89328-be4c-4b6d-8a20-d166d98b828c",
- "metadata": {},
- "outputs": [],
- "source": [
- "project = session.project.get_project_by_name(\"YOUR_PROJECT_NAME\") # Replace with your project name"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "cdf16109-f711-4ae8-82b8-a0140aa68aeb",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Get the schema that was created after JPG conversion\n",
- "cxr_schema = project.get_data_schema_by_name('Auto-generated schema for mimic_cxr_hco_conv', project_uid=project.uid)\n",
-    "cxr_schema_uid = cxr_schema.uid\n",
- "print(cxr_schema_uid)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "6d32168a-c05d-4e69-b199-1e13e2805067",
- "metadata": {},
- "outputs": [],
- "source": [
- "cohorts = project.cohorts\n",
- "hco_cxr_cohort = project.get_cohort_by_name(\"mimic_cxr_hco_conv\")\n",
- "aidev_cxr_cohort = project.get_cohort_by_name(\"mimic_cxr_dev_conv\")\n",
- "cxr_cohorts = [aidev_cxr_cohort.uid, hco_cxr_cohort.uid]\n",
- "print(f\"Loaded CXR cohorts '{hco_cxr_cohort.uid}', '{aidev_cxr_cohort.uid}'\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b01cdbbf-ef6a-4260-80ea-2cf93eedccc6",
- "metadata": {},
- "source": [
-    "## Create the Train Test Split Model and Run It Over Both CXR Cohorts\n",
-    "We will split each CXR Data Cohort into two Cohorts: one for training and one for testing.\n",
- "### We will use a Pre-defined Container Image with our Model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "ce46343c-0f66-4fcb-83fe-95fda628b1ca",
- "metadata": {},
- "outputs": [],
- "source": [
- "train_split_image_uri = \"913123821419.dkr.ecr.us-east-1.amazonaws.com/rhino-gc-workgroup-rhino-sandbox-decode-health:train-test-split-sb\""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "dd94c56e-a079-42a4-aff2-29617489539c",
- "metadata": {},
- "outputs": [],
- "source": [
- "aimodel = AIModelCreateInput(\n",
- " name=\"Train Test Split\",\n",
- " description=\"Splitting data into train and test datasets per site\",\n",
- " input_data_schema_uids=[cxr_schema_uid],\n",
- " output_data_schema_uids=[None], # Auto-Generating the Output Data Schema for the Model\n",
- " model_type=ModelTypes.GENERALIZED_COMPUTE,\n",
- " project_uid = project.uid,\n",
- " config={\"container_image_uri\": train_split_image_uri}\n",
- ")\n",
- "aimodel = session.aimodel.create_aimodel(aimodel)\n",
- "print(f\"Got aimodel '{aimodel.name}' with uid {aimodel.uid}\")\n",
- "\n",
- "run_params = AIModelMultiCohortInput(\n",
- " aimodel_uid= aimodel.uid,\n",
- " input_cohort_uids=[aidev_cxr_cohort.uid, hco_cxr_cohort.uid],\n",
- " output_cohort_naming_templates= ['{{ input_cohort_names.0 }} - Train', '{{ input_cohort_names.0 }} - Test'],\n",
- " timeout_seconds=600,\n",
- " sync=False,\n",
- ")\n",
- "\n",
- "print(f\"Starting to run {aimodel.name}\")\n",
- "model_run = session.aimodel.run_aimodel(run_params)\n",
- "run_result = model_run.wait_for_completion()\n",
- "print(f\"Finished running {aimodel.name}\")\n",
-    "print(f\"Result status is '{run_result.status.value}', errors={run_result.result_info.get('errors') if run_result.result_info else None}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6252401a-b01a-4a31-aa9c-23a835f25dc7",
- "metadata": {},
- "source": [
-    "## Create and Run Federated Model Training and Validation Across Both Sites\n",
-    "We will utilize NVFlare to train our pneumonia prediction model using our local training Cohort and the remote Health System training Cohort. The model will then be validated against the local testing Cohort and the remote Health System testing Cohort.\n",
- "### We will use a Pre-defined Container Image with our Model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7484e178-ef9d-4f3b-a68f-184d55c05940",
- "metadata": {},
- "outputs": [],
- "source": [
- "model_train_image_uri = \"913123821419.dkr.ecr.us-east-1.amazonaws.com/rhino-gc-workgroup-rhino-sandbox-decode-health:prediction-model-sb-22\""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4edcbf4b-6f97-4720-bd3e-e6c9b8025680",
- "metadata": {},
- "source": [
-    "### Search for Our Newly Split Local and Remote Cohorts"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c0b5ee5f-0e1d-4c80-ba46-f362672f81a1",
- "metadata": {},
- "outputs": [],
- "source": [
- "input_training_cohorts = session.cohort.search_for_cohorts_by_name('Train')\n",
- "input_validation_cohorts = session.cohort.search_for_cohorts_by_name('Test')\n",
- "print(\"Found training cohorts:\")\n",
- "print([x.name for x in input_training_cohorts])\n",
- "print(\"Found validation cohorts:\")\n",
- "print([x.name for x in input_validation_cohorts])"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e8ca1627-7d0b-4ffa-bb6c-6698dcc3a960",
- "metadata": {},
- "source": [
-    "### Create the Pneumonia Prediction Model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "b1efeda7-cb4c-446f-ac95-19c5de6ed5fc",
- "metadata": {},
- "outputs": [],
- "source": [
- "aimodel = AIModelCreateInput(\n",
- " name=\"Pneumonia Prediction Model Training\",\n",
- " description=\"Pneumonia Prediction Model Training\",\n",
- " input_data_schema_uids=[cxr_schema_uid],\n",
- " output_data_schema_uids=[None], # Auto-Generating the Output Data Schema for the Model\n",
- " project_uid= project.uid,\n",
- " model_type=ModelTypes.NVIDIA_FLARE_V2_2,\n",
- " config={\"container_image_uri\": model_train_image_uri}\n",
- ")\n",
- "\n",
- "aimodel = session.aimodel.create_aimodel(aimodel)\n",
- "print(f\"Got aimodel '{aimodel.name}' with uid {aimodel.uid}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5d1b79e0-84bb-42b5-9729-1adc3d124ea8",
- "metadata": {},
- "source": [
-    "### Run the Pneumonia Prediction Model Training with Validation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c92c2b1d-e98d-4b48-8c2c-07852b5e1c6d",
- "metadata": {},
- "outputs": [],
- "source": [
- "run_params = AIModelTrainInput(\n",
- " aimodel_uid=aimodel.uid,\n",
- " input_cohort_uids=[x.uid for x in input_training_cohorts], \n",
-    "    one_fl_client_per_cohort=True,\n",
- " validation_cohort_uids=[x.uid for x in input_validation_cohorts], \n",
- " validation_cohorts_inference_suffix=\" - Pneumonia training results\",\n",
- " timeout_seconds=600,\n",
- " config_fed_server=\"\",\n",
- " config_fed_client=\"\",\n",
- " secrets_fed_client=\"\",\n",
- " secrets_fed_server=\"\",\n",
- " sync=False,\n",
- ")\n",
- "\n",
- "print(f\"Starting to run federated training of {aimodel.name}\")\n",
- "model_train = session.aimodel.train_aimodel(run_params)\n",
- "train_result = model_train.wait_for_completion()\n",
- "print(f\"Finished running {aimodel.name}\")\n",
- "print(f\"Result status is '{train_result.status.value}', errors={train_result.result_info.get('errors') if train_result.result_info else None}\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/sandbox/pneumonia-prediction/README.md b/sandbox/pneumonia-prediction/README.md
index 00f5d51..3c1d7ea 100644
--- a/sandbox/pneumonia-prediction/README.md
+++ b/sandbox/pneumonia-prediction/README.md
@@ -1,8 +1,8 @@
## Conducting an end-to-end ML project using multimodal data from multiple hospitals
-
+
-**Follow the Tutorial: [Here](https://docs.rhinohealth.com/hc/en-us/articles/15586509051549-Pneumonia-Prediction-Step-1-Scenario-FCP-Overview)**
+**Follow the Tutorial: Here**
## Notebook 1: Import EHR and CXR Datasets
In this step, we gain experience in importing multimodal datasets onto the Rhino FCP. We will use MIMIC-IV as the source to create datasets from a database that hosts EHR data. In addition, we will load CXR DICOM images, which are linked to the EHR tables using a mapping lookup table.
@@ -22,4 +22,4 @@ Visualize evaluation results
Need Help?
-[Rhino Health Documentation Center](https://docs.rhinohealth.com/) or [support@rhinohealth.com](mailto:support@rhinohealth.com)
\ No newline at end of file
+Rhino Health Documentation Center or support@rhinohealth.com
\ No newline at end of file
diff --git a/sandbox/pneumonia-prediction/aidev-data/aidev_dataset.csv b/sandbox/pneumonia-prediction/aidev-data/aidev_dataset.csv
new file mode 100644
index 0000000..23a6ec5
--- /dev/null
+++ b/sandbox/pneumonia-prediction/aidev-data/aidev_dataset.csv
@@ -0,0 +1,77 @@
+"study_id","subject_id","seriesUID","Pneumonia"
+57375967,10000764,"2.25.330801183589872517229652910499992354395",1
+50771383,10000898,"2.25.250773694670812141869716817596977264006",1
+54205396,10000898,"2.25.31143652004437984166369414774860922473",1
+53186264,10001176,"2.25.245002654586718118811646382364196434355",1
+54684191,10001176,"2.25.185607819679630132258653645083363794392",1
+50531538,10018052,"2.25.141676994339261698162242971546085993555",1
+59965534,10066767,"2.25.268248519892184688254558816042424141122",1
+51029426,11000011,"2.25.156817843106450528003710253397726025131",1
+50336039,11000183,"2.25.120580216279625731330685397498034854719",1
+51967845,11000183,"2.25.254110388346453755900548779800859700199",1
+53970869,11000183,"2.25.296070775007374106188602918631213001557",1
+54898709,11000183,"2.25.54737753780642137934161666592639271654",1
+57084339,11000183,"2.25.142504252512476170295693746110551470872",1
+58117097,11000183,"2.25.167977869420158697331688039047735788303",1
+58509443,11000183,"2.25.51315582774251991982977357865140503898",1
+58555910,11000183,"2.25.183366130327701413764523037440150405474",1
+58733084,11000183,"2.25.44723005414861289192382120085380420515",1
+59289932,11000183,"2.25.120417721505081679979342065825911506709",1
+51449744,11000416,"2.25.190871170589449568220958593842969450894",1
+55590752,11000416,"2.25.175265353450860106859669442885811773759",1
+56617354,11000416,"2.25.212150001061493105291689021849267415906",1
+57652741,11000416,"2.25.120750863871124923436739143937752336886",1
+50230446,11000566,"2.25.15706137304706801535200026559750442609",1
+50252971,11000566,"2.25.90401306056572483734689081915760976309",1
+50702026,11000566,"2.25.149351496968738015044376556649752102015",1
+50789010,11000566,"2.25.65524401256552333494281405736076245268",1
+51737583,11000566,"2.25.73304250584195335647054701488612172113",1
+54855307,11000566,"2.25.106649140192713316301798245541731000587",1
+56421164,11000566,"2.25.123588367248417307302811823209249064222",1
+58996402,11000566,"2.25.289703543465349351849461743020936729026",1
+59565087,11000566,"2.25.77210014618202928528735877649336994629",1
+52358194,11000590,"2.25.276197893841055449986642559197959078626",1
+51732447,11001054,"2.25.331382108060360164245980421768418802294",1
+53447201,11001264,"2.25.198675763826303475089638333284438800376",1
+54136122,11001267,"2.25.83329270436138967790073397934679818033",1
+58882809,11001267,"2.25.83617404619653516159371899580404692912",1
+54076811,11001469,"2.25.252335380444831895089618962203377823443",1
+53022275,16021726,"2.25.168454369358651552933060539828881246619",0
+58261299,16024666,"2.25.149916930506941258791021863091790878418",0
+57661212,17000605,"2.25.288518742914838128450579538320439492899",0
+52821744,17001101,"2.25.243906351743687145205923184906016040017",0
+53831730,17001438,"2.25.64407762008439166559276813084448657681",1
+56167317,17020795,"2.25.281918147719415823959079557167129031235",0
+57754443,17020795,"2.25.287045312667541489936981051680521972422",0
+50548939,17055460,"2.25.335938651783267308300037707489815824644",0
+55758528,17055460,"2.25.31702762805515158200060762224322040200",0
+58974095,17055460,"2.25.258076776415131592161548832084572518735",0
+51613820,19000065,"2.25.285136171685507463090196379765550644423",0
+58898689,19000108,"2.25.158692982080442848886609116557063126409",0
+55328702,19007700,"2.25.70314450295303216217642274181772669159",0
+52654671,16000035,"2.25.227072730578370750779849157444442644402",0
+53468612,16000035,"2.25.4802387659682575613761490376940072952",0
+55928380,16000035,"2.25.309736478305484628655331326572429249485",0
+53461983,16000627,"2.25.12865990728966802573419808982607205661",0
+58400857,16000723,"2.25.207845862011460938831868993579043953486",0
+57874958,16000868,"2.25.222032164181554101841880746298612139546",0
+58971884,17001438,"2.25.89943274660832112807938203393242935689",0
+59558528,17001438,"2.25.232117944853542656798144984231087648294",0
+51497652,17001497,"2.25.228239509200967068547556507152614764753",0
+53161617,17001497,"2.25.231233671492252929936542322099066224420",0
+54277770,17001497,"2.25.32089274791730830287527185680878252831",0
+59484629,17001497,"2.25.173203155722886105882203244229875043458",0
+51212589,17002760,"2.25.39762223882025341314538351222107204295",0
+58414548,17002760,"2.25.337943621999416699293738431023637761998",0
+53818182,17003536,"2.25.113193671190407617103066457024910822569",0
+58812027,17004299,"2.25.156957541901853107656751174642876300084",0
+59585309,17004299,"2.25.179329454354447824324079090896018297757",0
+53534710,17006537,"2.25.218601528193961850701764397838121234535",0
+58748017,17028519,"2.25.491153759070081908894548836701795074",0
+53445324,17075570,"2.25.27974328969635899900863075959941182306",0
+58890389,17075570,"2.25.52103617575191164966775564111436517938",0
+53977911,19000445,"2.25.265739604673593359423173712220708170490",0
+57107380,19001012,"2.25.31805876826976238545583777055760976187",0
+58184428,19001012,"2.25.278543354695803945100679148458738243513",0
+53522120,19038970,"2.25.207114819449230665515058987887152489818",1
+55014265,19038970,"2.25.96903578481956703114516898781908640163",0
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/022f740d-cd2d4726-192c284f-2c58fe09-91969a7c.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/022f740d-cd2d4726-192c284f-2c58fe09-91969a7c.dcm
new file mode 100644
index 0000000..21744b3
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/022f740d-cd2d4726-192c284f-2c58fe09-91969a7c.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/03f3117f-485b0f1c-e9dd7a1c-252ceb61-1db72a51.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/03f3117f-485b0f1c-e9dd7a1c-252ceb61-1db72a51.dcm
new file mode 100644
index 0000000..4a818f6
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/03f3117f-485b0f1c-e9dd7a1c-252ceb61-1db72a51.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/08940b6b-848c2d8f-6643fdf8-78500b7b-dcf53801.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/08940b6b-848c2d8f-6643fdf8-78500b7b-dcf53801.dcm
new file mode 100644
index 0000000..efcd972
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/08940b6b-848c2d8f-6643fdf8-78500b7b-dcf53801.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/096052b7-d256dc40-453a102b-fa7d01c6-1b22c6b4.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/096052b7-d256dc40-453a102b-fa7d01c6-1b22c6b4.dcm
new file mode 100644
index 0000000..3c7b276
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/096052b7-d256dc40-453a102b-fa7d01c6-1b22c6b4.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/0bc02d78-8ca58b0a-618d6698-03044a93-af87b014.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/0bc02d78-8ca58b0a-618d6698-03044a93-af87b014.dcm
new file mode 100644
index 0000000..8a688dc
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/0bc02d78-8ca58b0a-618d6698-03044a93-af87b014.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/0e687d5d-cb679f30-86eb7a01-d60349eb-b22d6bce.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/0e687d5d-cb679f30-86eb7a01-d60349eb-b22d6bce.dcm
new file mode 100644
index 0000000..ec2c572
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/0e687d5d-cb679f30-86eb7a01-d60349eb-b22d6bce.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/0f511b43-654b28f6-c27c4f4d-d1f5dd38-5c20abe5.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/0f511b43-654b28f6-c27c4f4d-d1f5dd38-5c20abe5.dcm
new file mode 100644
index 0000000..8f9edc9
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/0f511b43-654b28f6-c27c4f4d-d1f5dd38-5c20abe5.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/0fe0f450-979ab2eb-ec87fcd3-c1660ba9-18517d0a.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/0fe0f450-979ab2eb-ec87fcd3-c1660ba9-18517d0a.dcm
new file mode 100644
index 0000000..188e52b
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/0fe0f450-979ab2eb-ec87fcd3-c1660ba9-18517d0a.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/10ae30cb-84e2834e-867cc18d-98e74eef-c148cc57.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/10ae30cb-84e2834e-867cc18d-98e74eef-c148cc57.dcm
new file mode 100644
index 0000000..01bd84d
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/10ae30cb-84e2834e-867cc18d-98e74eef-c148cc57.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/13b05da9-47e7464f-2616c4ae-2fcbed1b-4cb0be3d.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/13b05da9-47e7464f-2616c4ae-2fcbed1b-4cb0be3d.dcm
new file mode 100644
index 0000000..2f4dd25
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/13b05da9-47e7464f-2616c4ae-2fcbed1b-4cb0be3d.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/147a67b1-4fbe1943-3eefcc6d-02f38f6b-41d86177.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/147a67b1-4fbe1943-3eefcc6d-02f38f6b-41d86177.dcm
new file mode 100644
index 0000000..4dece2b
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/147a67b1-4fbe1943-3eefcc6d-02f38f6b-41d86177.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/155dafcc-593da047-7a17ab79-d144588d-ef242513.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/155dafcc-593da047-7a17ab79-d144588d-ef242513.dcm
new file mode 100644
index 0000000..3a488d6
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/155dafcc-593da047-7a17ab79-d144588d-ef242513.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/15e83484-fae15a34-100da9b0-5171bb0b-fb94f50a.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/15e83484-fae15a34-100da9b0-5171bb0b-fb94f50a.dcm
new file mode 100644
index 0000000..856158d
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/15e83484-fae15a34-100da9b0-5171bb0b-fb94f50a.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/160b960f-a1ec9252-c44ad542-3f4acc6c-9e7214b0.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/160b960f-a1ec9252-c44ad542-3f4acc6c-9e7214b0.dcm
new file mode 100644
index 0000000..4daf728
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/160b960f-a1ec9252-c44ad542-3f4acc6c-9e7214b0.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/16123452-c0737db7-f701ff47-06cc4eda-6679a045.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/16123452-c0737db7-f701ff47-06cc4eda-6679a045.dcm
new file mode 100644
index 0000000..398d525
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/16123452-c0737db7-f701ff47-06cc4eda-6679a045.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/16d00761-feba264b-c3539079-7b0f2fb9-099de4e9.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/16d00761-feba264b-c3539079-7b0f2fb9-099de4e9.dcm
new file mode 100644
index 0000000..739ae70
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/16d00761-feba264b-c3539079-7b0f2fb9-099de4e9.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/1812e284-d7771e26-decb6b99-da28fbd2-70bfb5b5.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/1812e284-d7771e26-decb6b99-da28fbd2-70bfb5b5.dcm
new file mode 100644
index 0000000..1461aec
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/1812e284-d7771e26-decb6b99-da28fbd2-70bfb5b5.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/194863f8-8c2b09b1-6a7c1354-6a93e10a-06c453d6.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/194863f8-8c2b09b1-6a7c1354-6a93e10a-06c453d6.dcm
new file mode 100644
index 0000000..335b307
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/194863f8-8c2b09b1-6a7c1354-6a93e10a-06c453d6.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/1ba3b40e-ad65e2d0-5c5b140e-c41d8154-f85ead78.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/1ba3b40e-ad65e2d0-5c5b140e-c41d8154-f85ead78.dcm
new file mode 100644
index 0000000..43f4288
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/1ba3b40e-ad65e2d0-5c5b140e-c41d8154-f85ead78.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/1d88941b-2c154896-2b3614a0-7dea2b56-08253a4b.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/1d88941b-2c154896-2b3614a0-7dea2b56-08253a4b.dcm
new file mode 100644
index 0000000..6e99c70
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/1d88941b-2c154896-2b3614a0-7dea2b56-08253a4b.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/1fe73f8e-036bd24e-4578c891-33c1746e-864884a7.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/1fe73f8e-036bd24e-4578c891-33c1746e-864884a7.dcm
new file mode 100644
index 0000000..b88766f
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/1fe73f8e-036bd24e-4578c891-33c1746e-864884a7.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/2a280266-c8bae121-54d75383-cac046f4-ca37aa16.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/2a280266-c8bae121-54d75383-cac046f4-ca37aa16.dcm
new file mode 100644
index 0000000..be057f0
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/2a280266-c8bae121-54d75383-cac046f4-ca37aa16.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/2e4678a5-e646a648-d6265814-63b082e3-d14f047a.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/2e4678a5-e646a648-d6265814-63b082e3-d14f047a.dcm
new file mode 100644
index 0000000..f8878b4
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/2e4678a5-e646a648-d6265814-63b082e3-d14f047a.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/3400038e-ece1ed49-527f1500-91c03763-b8c43109.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/3400038e-ece1ed49-527f1500-91c03763-b8c43109.dcm
new file mode 100644
index 0000000..c68e400
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/3400038e-ece1ed49-527f1500-91c03763-b8c43109.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/368bcbac-d1d42a3d-111cc98a-43f3541a-59c0792f.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/368bcbac-d1d42a3d-111cc98a-43f3541a-59c0792f.dcm
new file mode 100644
index 0000000..3c11154
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/368bcbac-d1d42a3d-111cc98a-43f3541a-59c0792f.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/3761aae0-255c0808-86d2121b-88ae172f-b7625d50.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/3761aae0-255c0808-86d2121b-88ae172f-b7625d50.dcm
new file mode 100644
index 0000000..f908895
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/3761aae0-255c0808-86d2121b-88ae172f-b7625d50.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/3b8571b4-1418c4eb-ddf2b4bc-5cb96d9b-3b99df84.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/3b8571b4-1418c4eb-ddf2b4bc-5cb96d9b-3b99df84.dcm
new file mode 100644
index 0000000..e64d1cb
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/3b8571b4-1418c4eb-ddf2b4bc-5cb96d9b-3b99df84.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/3b8b1b7d-054490d5-385641e7-ff43d2c8-9505f058.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/3b8b1b7d-054490d5-385641e7-ff43d2c8-9505f058.dcm
new file mode 100644
index 0000000..9ed62bc
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/3b8b1b7d-054490d5-385641e7-ff43d2c8-9505f058.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/458595bc-5b60d632-1acb3f28-69475e73-af4854fb.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/458595bc-5b60d632-1acb3f28-69475e73-af4854fb.dcm
new file mode 100644
index 0000000..c0f38b7
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/458595bc-5b60d632-1acb3f28-69475e73-af4854fb.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/4a80d320-9a185aca-dad7c54f-b93d3a1c-e195c6ab.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/4a80d320-9a185aca-dad7c54f-b93d3a1c-e195c6ab.dcm
new file mode 100644
index 0000000..8fb9b82
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/4a80d320-9a185aca-dad7c54f-b93d3a1c-e195c6ab.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/4d70fda3-fef37e75-e4072ffd-fb996643-14ecd360.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/4d70fda3-fef37e75-e4072ffd-fb996643-14ecd360.dcm
new file mode 100644
index 0000000..67f8fcc
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/4d70fda3-fef37e75-e4072ffd-fb996643-14ecd360.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/4f54d67a-d40633bf-279b7ff3-14f235a1-c1793502.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/4f54d67a-d40633bf-279b7ff3-14f235a1-c1793502.dcm
new file mode 100644
index 0000000..c5a8533
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/4f54d67a-d40633bf-279b7ff3-14f235a1-c1793502.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/50364eef-e3be6d73-26daca3e-101ddd48-744ce688.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/50364eef-e3be6d73-26daca3e-101ddd48-744ce688.dcm
new file mode 100644
index 0000000..33362da
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/50364eef-e3be6d73-26daca3e-101ddd48-744ce688.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/5282290c-1c7d65b7-bc2c026f-8e8c983d-9b34df7e.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/5282290c-1c7d65b7-bc2c026f-8e8c983d-9b34df7e.dcm
new file mode 100644
index 0000000..e8897f0
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/5282290c-1c7d65b7-bc2c026f-8e8c983d-9b34df7e.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/52b876e2-702ec695-5c850bac-de4bac95-f5bcf1f2.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/52b876e2-702ec695-5c850bac-de4bac95-f5bcf1f2.dcm
new file mode 100644
index 0000000..b4364a7
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/52b876e2-702ec695-5c850bac-de4bac95-f5bcf1f2.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/5661d144-516348da-52232211-7b6a1f35-6739c99d.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/5661d144-516348da-52232211-7b6a1f35-6739c99d.dcm
new file mode 100644
index 0000000..56028de
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/5661d144-516348da-52232211-7b6a1f35-6739c99d.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/580fc24d-1bb7033e-4f8e3d90-7bdc583a-5cca80e8.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/580fc24d-1bb7033e-4f8e3d90-7bdc583a-5cca80e8.dcm
new file mode 100644
index 0000000..46468e9
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/580fc24d-1bb7033e-4f8e3d90-7bdc583a-5cca80e8.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/58f383e7-edcbd8c7-2f6dc2af-eb97ddf1-f7cbc46a.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/58f383e7-edcbd8c7-2f6dc2af-eb97ddf1-f7cbc46a.dcm
new file mode 100644
index 0000000..0b4557a
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/58f383e7-edcbd8c7-2f6dc2af-eb97ddf1-f7cbc46a.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/66f890eb-32e623eb-bfd5b3fd-501c4ce6-3aed539b.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/66f890eb-32e623eb-bfd5b3fd-501c4ce6-3aed539b.dcm
new file mode 100644
index 0000000..2bfa48e
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/66f890eb-32e623eb-bfd5b3fd-501c4ce6-3aed539b.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/67be3e3e-a2e07d4b-9adf955a-96a214df-7bcab490.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/67be3e3e-a2e07d4b-9adf955a-96a214df-7bcab490.dcm
new file mode 100644
index 0000000..58ed391
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/67be3e3e-a2e07d4b-9adf955a-96a214df-7bcab490.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/7c2c17c0-52884edb-dbf010e2-59e9dfaa-6c225d1f.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/7c2c17c0-52884edb-dbf010e2-59e9dfaa-6c225d1f.dcm
new file mode 100644
index 0000000..131ac36
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/7c2c17c0-52884edb-dbf010e2-59e9dfaa-6c225d1f.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/7f2217cd-8d7c0235-439dc318-0c6e09f4-f54598cd.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/7f2217cd-8d7c0235-439dc318-0c6e09f4-f54598cd.dcm
new file mode 100644
index 0000000..e082a79
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/7f2217cd-8d7c0235-439dc318-0c6e09f4-f54598cd.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/7fea64be-4a815bc0-0906f737-b174fd08-48ba5e05.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/7fea64be-4a815bc0-0906f737-b174fd08-48ba5e05.dcm
new file mode 100644
index 0000000..e674330
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/7fea64be-4a815bc0-0906f737-b174fd08-48ba5e05.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/826b3c3a-69311826-cd4bdab7-48c35c4b-ce0e4b18.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/826b3c3a-69311826-cd4bdab7-48c35c4b-ce0e4b18.dcm
new file mode 100644
index 0000000..de37715
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/826b3c3a-69311826-cd4bdab7-48c35c4b-ce0e4b18.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/8612ee21-bd9a753c-ff4c7ee6-681aef0b-aeb229e8.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/8612ee21-bd9a753c-ff4c7ee6-681aef0b-aeb229e8.dcm
new file mode 100644
index 0000000..e4b40e4
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/8612ee21-bd9a753c-ff4c7ee6-681aef0b-aeb229e8.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/8e338050-c72628f4-cf19ef85-cb13d287-5af57beb.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/8e338050-c72628f4-cf19ef85-cb13d287-5af57beb.dcm
new file mode 100644
index 0000000..eabbaa9
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/8e338050-c72628f4-cf19ef85-cb13d287-5af57beb.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/8e778669-68417c37-c9bcf266-7de1750e-7aae1614.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/8e778669-68417c37-c9bcf266-7de1750e-7aae1614.dcm
new file mode 100644
index 0000000..b637889
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/8e778669-68417c37-c9bcf266-7de1750e-7aae1614.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/8fef226c-75a97546-cdb20a70-30d4bfe8-ace66a88.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/8fef226c-75a97546-cdb20a70-30d4bfe8-ace66a88.dcm
new file mode 100644
index 0000000..386c6e6
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/8fef226c-75a97546-cdb20a70-30d4bfe8-ace66a88.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/95a2a575-8bc3b499-d351ac54-092b24da-4b6a1b3b.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/95a2a575-8bc3b499-d351ac54-092b24da-4b6a1b3b.dcm
new file mode 100644
index 0000000..5765854
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/95a2a575-8bc3b499-d351ac54-092b24da-4b6a1b3b.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/96d19ff3-64e9b04a-83c916dd-8f98d633-7f3d57a8.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/96d19ff3-64e9b04a-83c916dd-8f98d633-7f3d57a8.dcm
new file mode 100644
index 0000000..17e56e5
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/96d19ff3-64e9b04a-83c916dd-8f98d633-7f3d57a8.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/98e33b67-e304bdf7-5f59811c-79c1b2c5-928e1f51.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/98e33b67-e304bdf7-5f59811c-79c1b2c5-928e1f51.dcm
new file mode 100644
index 0000000..aa8c3de
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/98e33b67-e304bdf7-5f59811c-79c1b2c5-928e1f51.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/99eb5ea2-76aff341-b0db7fe2-24d9295f-cd6d9b2e.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/99eb5ea2-76aff341-b0db7fe2-24d9295f-cd6d9b2e.dcm
new file mode 100644
index 0000000..ef8bde1
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/99eb5ea2-76aff341-b0db7fe2-24d9295f-cd6d9b2e.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/9e879dbb-6f01de27-0ed8e190-99ab6895-a85717c6.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/9e879dbb-6f01de27-0ed8e190-99ab6895-a85717c6.dcm
new file mode 100644
index 0000000..5ccb24c
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/9e879dbb-6f01de27-0ed8e190-99ab6895-a85717c6.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/9fbae1b5-f7747bfb-b9e830c8-a1238d08-1b2d9f6c.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/9fbae1b5-f7747bfb-b9e830c8-a1238d08-1b2d9f6c.dcm
new file mode 100644
index 0000000..c414dfa
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/9fbae1b5-f7747bfb-b9e830c8-a1238d08-1b2d9f6c.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/a53956c2-b2a7a264-897a0cd9-341a07fa-633d23fa.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/a53956c2-b2a7a264-897a0cd9-341a07fa-633d23fa.dcm
new file mode 100644
index 0000000..d3fc341
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/a53956c2-b2a7a264-897a0cd9-341a07fa-633d23fa.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/aa2fbe40-f7614926-9ab04b56-971dba88-a56b226e.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/aa2fbe40-f7614926-9ab04b56-971dba88-a56b226e.dcm
new file mode 100644
index 0000000..259a751
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/aa2fbe40-f7614926-9ab04b56-971dba88-a56b226e.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/aaa61330-ee126369-915ee5a6-8dc9a914-a24f74a1.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/aaa61330-ee126369-915ee5a6-8dc9a914-a24f74a1.dcm
new file mode 100644
index 0000000..d75939f
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/aaa61330-ee126369-915ee5a6-8dc9a914-a24f74a1.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/adf23b21-e747bc85-066fc17f-13ed77d3-76b78e47.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/adf23b21-e747bc85-066fc17f-13ed77d3-76b78e47.dcm
new file mode 100644
index 0000000..5ebfbef
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/adf23b21-e747bc85-066fc17f-13ed77d3-76b78e47.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/aee7fdc2-09fddea8-37521ad6-9358c2ba-c04710ad.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/aee7fdc2-09fddea8-37521ad6-9358c2ba-c04710ad.dcm
new file mode 100644
index 0000000..78ead5a
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/aee7fdc2-09fddea8-37521ad6-9358c2ba-c04710ad.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/b60973a6-d547c83e-10fa5ca5-09fa1d32-1cb7a916.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/b60973a6-d547c83e-10fa5ca5-09fa1d32-1cb7a916.dcm
new file mode 100644
index 0000000..883aee9
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/b60973a6-d547c83e-10fa5ca5-09fa1d32-1cb7a916.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/b75df1bd-0f22d631-52d73526-2ae7b85a-d843b39d.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/b75df1bd-0f22d631-52d73526-2ae7b85a-d843b39d.dcm
new file mode 100644
index 0000000..a5297c5
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/b75df1bd-0f22d631-52d73526-2ae7b85a-d843b39d.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/ba5012fe-09a19279-0e52db56-c4cdf7e7-b8c94e72.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/ba5012fe-09a19279-0e52db56-c4cdf7e7-b8c94e72.dcm
new file mode 100644
index 0000000..88c87e0
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/ba5012fe-09a19279-0e52db56-c4cdf7e7-b8c94e72.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/bee4a722-b652d820-a2a471fa-2da5198f-9d7ec629.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/bee4a722-b652d820-a2a471fa-2da5198f-9d7ec629.dcm
new file mode 100644
index 0000000..c30addc
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/bee4a722-b652d820-a2a471fa-2da5198f-9d7ec629.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/bfe3b2c4-a2f70d69-0be5a498-80f7a147-e89cd46b.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/bfe3b2c4-a2f70d69-0be5a498-80f7a147-e89cd46b.dcm
new file mode 100644
index 0000000..884ca76
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/bfe3b2c4-a2f70d69-0be5a498-80f7a147-e89cd46b.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/d0d2bd0c-8bc50aa2-a9ab3ca1-cf9c9404-543a10b7.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/d0d2bd0c-8bc50aa2-a9ab3ca1-cf9c9404-543a10b7.dcm
new file mode 100644
index 0000000..41d1328
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/d0d2bd0c-8bc50aa2-a9ab3ca1-cf9c9404-543a10b7.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/d18a800f-bf0cbd91-f0f86eaa-45efae2f-0019de87.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/d18a800f-bf0cbd91-f0f86eaa-45efae2f-0019de87.dcm
new file mode 100644
index 0000000..62cd1ad
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/d18a800f-bf0cbd91-f0f86eaa-45efae2f-0019de87.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/d758e92f-24e2f317-376bb959-6c95ff9e-12781712.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/d758e92f-24e2f317-376bb959-6c95ff9e-12781712.dcm
new file mode 100644
index 0000000..5059f07
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/d758e92f-24e2f317-376bb959-6c95ff9e-12781712.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/da7d254d-e57e0a26-f3e6f283-131a434d-4a40bd66.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/da7d254d-e57e0a26-f3e6f283-131a434d-4a40bd66.dcm
new file mode 100644
index 0000000..e1930a4
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/da7d254d-e57e0a26-f3e6f283-131a434d-4a40bd66.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/dc010015-308844f8-ee1961f5-2903ee69-da67bb76.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/dc010015-308844f8-ee1961f5-2903ee69-da67bb76.dcm
new file mode 100644
index 0000000..19e1c2d
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/dc010015-308844f8-ee1961f5-2903ee69-da67bb76.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/dc168d09-a2dba2eb-1e184507-01b975c9-a9ff417d.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/dc168d09-a2dba2eb-1e184507-01b975c9-a9ff417d.dcm
new file mode 100644
index 0000000..89bf62f
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/dc168d09-a2dba2eb-1e184507-01b975c9-a9ff417d.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/e0e1e00e-300cfe5a-69b07aa6-a60188d7-76871f68.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/e0e1e00e-300cfe5a-69b07aa6-a60188d7-76871f68.dcm
new file mode 100644
index 0000000..6489239
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/e0e1e00e-300cfe5a-69b07aa6-a60188d7-76871f68.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/ed4bcb75-91edbe53-7488d193-85649c23-076c7baa.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/ed4bcb75-91edbe53-7488d193-85649c23-076c7baa.dcm
new file mode 100644
index 0000000..14b3f88
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/ed4bcb75-91edbe53-7488d193-85649c23-076c7baa.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/ee2cbf1f-01ccf0d8-308689f1-c5560265-953342e0.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/ee2cbf1f-01ccf0d8-308689f1-c5560265-953342e0.dcm
new file mode 100644
index 0000000..fe1df0d
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/ee2cbf1f-01ccf0d8-308689f1-c5560265-953342e0.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/f2442773-a50575a2-63abf299-84c2924c-dbe89ead.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/f2442773-a50575a2-63abf299-84c2924c-dbe89ead.dcm
new file mode 100644
index 0000000..541af08
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/f2442773-a50575a2-63abf299-84c2924c-dbe89ead.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/f2788a2b-fb32877b-39273ddf-16c228f3-4a844084.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/f2788a2b-fb32877b-39273ddf-16c228f3-4a844084.dcm
new file mode 100644
index 0000000..2bb154e
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/f2788a2b-fb32877b-39273ddf-16c228f3-4a844084.dcm differ
diff --git a/sandbox/pneumonia-prediction/aidev-data/dicom/ff213473-b64efa18-863f2bad-76181481-30bc30d7.dcm b/sandbox/pneumonia-prediction/aidev-data/dicom/ff213473-b64efa18-863f2bad-76181481-30bc30d7.dcm
new file mode 100644
index 0000000..b3e863b
Binary files /dev/null and b/sandbox/pneumonia-prediction/aidev-data/dicom/ff213473-b64efa18-863f2bad-76181481-30bc30d7.dcm differ
diff --git a/sandbox/pneumonia-prediction/containers/data-prep/Dockerfile b/sandbox/pneumonia-prediction/containers/data-prep/Dockerfile
deleted file mode 100644
index f7b5258..0000000
--- a/sandbox/pneumonia-prediction/containers/data-prep/Dockerfile
+++ /dev/null
@@ -1,34 +0,0 @@
-FROM python:3.10.8-bullseye as wheelbuilder
-
-COPY requirements.txt ./
-
-RUN --mount=type=cache,target=/.cache/pip \
- python -m pip install -U pip setuptools wheel \
- && pip wheel --no-deps --cache-dir=/.cache/pip --wheel-dir /wheels -r requirements.txt
-
-
-FROM python:3.10.8-slim-bullseye
-
-COPY --from=wheelbuilder /wheels /wheels
-RUN python -m venv /venv \
- && . /venv/bin/activate \
- && python -m pip install --no-cache -U pip setuptools wheel \
- && pip install --no-cache /wheels/*
-
-ARG UID=5642
-ARG GID=5642
-
-RUN ( getent group $GID >/dev/null || groupadd -r -g $GID localgroup ) \
- && useradd -m -l -s /bin/bash -g $GID -N -u $UID localuser \
- && chown -R $UID:$GID /venv
-
-WORKDIR /home/localuser
-USER localuser
-
-COPY --chown=$UID:$GID dataprep_gc.py run_dataprep.sh ./
-
-# This is basically what venv/bin/activate does
-ENV PATH="/venv/bin:$PATH"
-ENV VIRTUAL_ENV="/venv"
-
-CMD ["./run_dataprep.sh", "/input", "/output", "/input/cohort_data.csv"]
diff --git a/sandbox/pneumonia-prediction/containers/data-prep/dataprep_gc.py b/sandbox/pneumonia-prediction/containers/data-prep/dataprep_gc.py
deleted file mode 100644
index 3b9cd62..0000000
--- a/sandbox/pneumonia-prediction/containers/data-prep/dataprep_gc.py
+++ /dev/null
@@ -1,49 +0,0 @@
-import pandas as pd
-import os
-import pydicom
-import numpy as np
-from PIL import Image
-from sklearn.impute import SimpleImputer
-import glob
-
-
-def convert_dcm_image_to_jpg(name):
- dcm = pydicom.dcmread(name)
- img = dcm.pixel_array.astype(float)
- rescaled_image = (np.maximum(img, 0) / img.max()) * 255 # float pixels
- final_image = np.uint8(rescaled_image) # integer pixels
- final_image = Image.fromarray(final_image)
- return final_image
-
-
-def cohort_dcm_to_jpg(df_cohort):
- input_dir = '/input/dicom_data/'
- output_dir = '/output/file_data/'
- dcm_list = glob.glob(input_dir + '/*/*.dcm')
-
- df_cohort['JPG_file'] = np.nan
- for dcm_file in dcm_list:
- image = convert_dcm_image_to_jpg(dcm_file)
- jpg_file_name = dcm_file.split('/')[-1].split('.dcm')[0] + '.jpg'
- ds = pydicom.dcmread(dcm_file)
- idx = df_cohort['Pneumonia'][df_cohort.SeriesUID == ds.SeriesInstanceUID].index[0]
- ground_truth = '1' if df_cohort.loc[idx, 'Pneumonia'] else '0'
- class_folder = output_dir + ground_truth
- if not os.path.exists(class_folder):
- os.makedirs(class_folder)
- image.save('/'.join([class_folder, jpg_file_name]))
- df_cohort.loc[idx, 'JPG_file'] = '/'.join([ground_truth, jpg_file_name])
-
- return df_cohort
-
-
-if __name__ == '__main__':
- # Read cohort from /input
- df_cohort = pd.read_csv('/input/cohort_data.csv')
-
- # Convert DICOM to JPG
- df_cohort = cohort_dcm_to_jpg(df_cohort)
-
- # Write cohort to /output
- df_cohort.to_csv('/output/cohort_data.csv', index=False)
-
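The deleted `convert_dcm_image_to_jpg` above clips negative pixel values and linearly rescales them into the 8-bit range before saving a JPG. A numpy-only sketch of just that normalization step (the pydicom read and the JPG save are omitted, and the synthetic 12-bit array is a stand-in for `dcm.pixel_array`):

```python
import numpy as np

def rescale_to_uint8(img):
    """Clip negatives and linearly rescale pixel values to 0-255,
    mirroring the normalization in the deleted convert_dcm_image_to_jpg()."""
    img = img.astype(float)
    rescaled = (np.maximum(img, 0) / img.max()) * 255.0  # float pixels in [0, 255]
    return np.uint8(rescaled)  # truncate to integer pixels

# Synthetic 12-bit-style pixel data stands in for dcm.pixel_array.
pixels = np.array([[0, 2048], [4095, 1024]])
out = rescale_to_uint8(pixels)
# the maximum value maps to 255, zero stays 0
```

Note that this scales each image by its own maximum, so absolute intensity is not comparable across images; that matches the deleted code's behavior.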
diff --git a/sandbox/pneumonia-prediction/containers/data-prep/requirements.txt b/sandbox/pneumonia-prediction/containers/data-prep/requirements.txt
deleted file mode 100644
index 8d3914a..0000000
--- a/sandbox/pneumonia-prediction/containers/data-prep/requirements.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-pandas==1.3.4
-numpy==1.21.3
-sklearn==0.0
-sklearn-pandas==1.8.0
-scikit-learn==1.0.2
-pydicom==2.2.0
-Pillow==8.4.0
\ No newline at end of file
diff --git a/sandbox/pneumonia-prediction/containers/data-prep/run_dataprep.sh b/sandbox/pneumonia-prediction/containers/data-prep/run_dataprep.sh
deleted file mode 100755
index bfca65a..0000000
--- a/sandbox/pneumonia-prediction/containers/data-prep/run_dataprep.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/bin/bash
-
-if [ $# -ne 3 ]; then
- >&2 echo "Usage: $0 input_dir output_dir csv_file"
- exit 1
-fi
-
-INPUT_DIR=$1
-OUTPUT_DIR=$2
-CSV_FILE=$3
-
-DIR=$(dirname "$(readlink -f "$BASH_SOURCE")")
-
-set -x
-set -e
-
-python $DIR/dataprep_gc.py --input_csv ${CSV_FILE} --input_dir ${INPUT_DIR} --output_dir ${OUTPUT_DIR}
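One mismatch worth flagging: `run_dataprep.sh` passes `--input_csv`, `--input_dir`, and `--output_dir`, but the deleted `dataprep_gc.py` never parses any arguments and instead reads hard-coded `/input` and `/output` paths. A minimal argparse sketch of how the script could consume the flags the wrapper already sends (the `parse_dataprep_args` helper name is illustrative, not from the repo):

```python
import argparse

def parse_dataprep_args(argv=None):
    """Parse the flags run_dataprep.sh passes; the deleted dataprep_gc.py
    ignored them and used hard-coded /input and /output paths instead."""
    parser = argparse.ArgumentParser(description="DICOM-to-JPG data prep")
    parser.add_argument("--input_csv", required=True)
    parser.add_argument("--input_dir", required=True)
    parser.add_argument("--output_dir", required=True)
    return parser.parse_args(argv)

# Same invocation shape as the shell wrapper's final line.
args = parse_dataprep_args(
    ["--input_csv", "/input/cohort_data.csv",
     "--input_dir", "/input",
     "--output_dir", "/output"]
)
```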
diff --git a/sandbox/pneumonia-prediction/containers/merge-cohorts/Dockerfile b/sandbox/pneumonia-prediction/containers/merge-cohorts/Dockerfile
deleted file mode 100644
index 3647ba5..0000000
--- a/sandbox/pneumonia-prediction/containers/merge-cohorts/Dockerfile
+++ /dev/null
@@ -1,36 +0,0 @@
-FROM python:3.10.8-slim-bullseye
-
-# Set up non-root user and group.
-ARG UID=5642
-ARG GID=5642
-RUN ( getent group $GID >/dev/null || groupadd -r -g $GID localgroup ) \
- && useradd -m -l -s /bin/bash -g $GID -N -u $UID localuser
-
-# Create and "activate" venv.
-ENV VIRTUAL_ENV="/venv"
-RUN mkdir $VIRTUAL_ENV \
- && chmod g+s $VIRTUAL_ENV \
- && chown $UID:$GID $VIRTUAL_ENV \
- && python -m venv $VIRTUAL_ENV
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
-
-# Install dependencies.
-COPY requirements.txt ./
-RUN --mount=type=cache,target=/root/.cache/pip \
- python -m pip install --upgrade pip setuptools wheel \
- && pip install -r requirements.txt \
- && rm requirements.txt
-
-WORKDIR /home/localuser
-USER localuser
-
-# Copy code.
-COPY --chown=$UID:$GID merge_cohorts_data.py ./
-
-ENV PYTHONUNBUFFERED=1
-
-CMD [ "python", "./merge_cohorts_data.py", \
- "--input_dir", "/input", \
- "--output_dir", "/output", \
- "--cohort_csv_file", "/input/cohort_data.csv" \
-]
diff --git a/sandbox/pneumonia-prediction/containers/merge-cohorts/README.md b/sandbox/pneumonia-prediction/containers/merge-cohorts/README.md
deleted file mode 100644
index c32878e..0000000
--- a/sandbox/pneumonia-prediction/containers/merge-cohorts/README.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Generalized Compute Example - Merge Cohorts Data
-
-
-### **Description**
-
-This example provides files that can be used with the Rhino Health Generalized Compute capability to remotely merge multiple input cohorts (three, in this example) into a single output cohort on a Rhino Client.
-
-It shows how to:
-* Process multiple input CSV files
-* Merge the inputs into a single output CSV file
-* Use a single-step Dockerfile to build the container image (without using a separate step for installing requirements)
-
-Please refer to the User Documentation and/or Tutorials for in-depth explanations of how to use the Generalized Compute capability.
-
-
-### **Resources**
-- `Dockerfile` - This is the Dockerfile to be used for building the container image
-- `merge_cohorts_data.py` - This file contains the python code for merging the input cohorts
-- `requirements.in` - The input python requirements for this project
-- `requirements.txt` - The compiled python requirements for this project (using `pip-compile` on the requirements.in file)
-
-
-### **Getting Help**
-For additional support, please reach out to [support@rhinohealth.com](mailto:support@rhinohealth.com).
diff --git a/sandbox/pneumonia-prediction/containers/merge-cohorts/merge_cohorts_data.py b/sandbox/pneumonia-prediction/containers/merge-cohorts/merge_cohorts_data.py
deleted file mode 100755
index 5ea9a31..0000000
--- a/sandbox/pneumonia-prediction/containers/merge-cohorts/merge_cohorts_data.py
+++ /dev/null
@@ -1,8 +0,0 @@
-import pandas as pd
-
-if __name__ == "__main__":
- first = pd.read_csv("/input/0/cohort_data.csv")
- second = pd.read_csv("/input/1/cohort_data.csv")
- third = pd.read_csv("/input/2/cohort_data.csv")
- merged = pd.concat([first, second, third], ignore_index=True, axis=0)
- merged.to_csv("/output/0/cohort_data.csv", index=False)
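The deleted `merge_cohorts_data.py` is a three-way row-wise `pd.concat` with a fresh index over fixed `/input/<n>/cohort_data.csv` paths. A small self-contained sketch of the same merge on in-memory frames (the `merge_cohorts` helper name is illustrative, not from the repo):

```python
import pandas as pd

def merge_cohorts(frames):
    """Row-wise concatenation with a reset index, matching the pd.concat
    call in the deleted merge_cohorts_data.py."""
    return pd.concat(frames, ignore_index=True, axis=0)

# Two stand-in per-site cohorts with matching columns.
site_a = pd.DataFrame({"id": [1, 2], "Pneumonia": [0, 1]})
site_b = pd.DataFrame({"id": [3], "Pneumonia": [1]})
merged = merge_cohorts([site_a, site_b])
# merged holds all rows, reindexed 0..2
```

`ignore_index=True` discards each site's row labels, so the output index is contiguous regardless of how many sites contribute.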
diff --git a/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.in b/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.in
deleted file mode 100644
index fac8cba..0000000
--- a/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.in
+++ /dev/null
@@ -1 +0,0 @@
-pandas ~= 1.4.3
diff --git a/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.txt b/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.txt
deleted file mode 100644
index 5e8709e..0000000
--- a/sandbox/pneumonia-prediction/containers/merge-cohorts/requirements.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-#
-# This file is autogenerated by pip-compile with python 3.10
-# To update, run:
-#
-# pip-compile --generate-hashes --output-file=requirements.txt requirements.in
-#
-numpy==1.23.0 \
- --hash=sha256:092f5e6025813e64ad6d1b52b519165d08c730d099c114a9247c9bb635a2a450 \
- --hash=sha256:196cd074c3f97c4121601790955f915187736f9cf458d3ee1f1b46aff2b1ade0 \
- --hash=sha256:1c29b44905af288b3919803aceb6ec7fec77406d8b08aaa2e8b9e63d0fe2f160 \
- --hash=sha256:2b2da66582f3a69c8ce25ed7921dcd8010d05e59ac8d89d126a299be60421171 \
- --hash=sha256:5043bcd71fcc458dfb8a0fc5509bbc979da0131b9d08e3d5f50fb0bbb36f169a \
- --hash=sha256:58bfd40eb478f54ff7a5710dd61c8097e169bc36cc68333d00a9bcd8def53b38 \
- --hash=sha256:79a506cacf2be3a74ead5467aee97b81fca00c9c4c8b3ba16dbab488cd99ba10 \
- --hash=sha256:94b170b4fa0168cd6be4becf37cb5b127bd12a795123984385b8cd4aca9857e5 \
- --hash=sha256:97a76604d9b0e79f59baeca16593c711fddb44936e40310f78bfef79ee9a835f \
- --hash=sha256:98e8e0d8d69ff4d3fa63e6c61e8cfe2d03c29b16b58dbef1f9baa175bbed7860 \
- --hash=sha256:ac86f407873b952679f5f9e6c0612687e51547af0e14ddea1eedfcb22466babd \
- --hash=sha256:ae8adff4172692ce56233db04b7ce5792186f179c415c37d539c25de7298d25d \
- --hash=sha256:bd3fa4fe2e38533d5336e1272fc4e765cabbbde144309ccee8675509d5cd7b05 \
- --hash=sha256:d0d2094e8f4d760500394d77b383a1b06d3663e8892cdf5df3c592f55f3bff66 \
- --hash=sha256:d54b3b828d618a19779a84c3ad952e96e2c2311b16384e973e671aa5be1f6187 \
- --hash=sha256:d6ca8dabe696c2785d0c8c9b0d8a9b6e5fdbe4f922bde70d57fa1a2848134f95 \
- --hash=sha256:d8cc87bed09de55477dba9da370c1679bd534df9baa171dd01accbb09687dac3 \
- --hash=sha256:f0f18804df7370571fb65db9b98bf1378172bd4e962482b857e612d1fec0f53e \
- --hash=sha256:f1d88ef79e0a7fa631bb2c3dda1ea46b32b1fe614e10fedd611d3d5398447f2f \
- --hash=sha256:f9c3fc2adf67762c9fe1849c859942d23f8d3e0bee7b5ed3d4a9c3eeb50a2f07 \
- --hash=sha256:fc431493df245f3c627c0c05c2bd134535e7929dbe2e602b80e42bf52ff760bc \
- --hash=sha256:fe8b9683eb26d2c4d5db32cd29b38fdcf8381324ab48313b5b69088e0e355379
- # via pandas
-pandas==1.4.3 \
- --hash=sha256:07238a58d7cbc8a004855ade7b75bbd22c0db4b0ffccc721556bab8a095515f6 \
- --hash=sha256:0daf876dba6c622154b2e6741f29e87161f844e64f84801554f879d27ba63c0d \
- --hash=sha256:16ad23db55efcc93fa878f7837267973b61ea85d244fc5ff0ccbcfa5638706c5 \
- --hash=sha256:1d9382f72a4f0e93909feece6fef5500e838ce1c355a581b3d8f259839f2ea76 \
- --hash=sha256:24ea75f47bbd5574675dae21d51779a4948715416413b30614c1e8b480909f81 \
- --hash=sha256:2893e923472a5e090c2d5e8db83e8f907364ec048572084c7d10ef93546be6d1 \
- --hash=sha256:2ff7788468e75917574f080cd4681b27e1a7bf36461fe968b49a87b5a54d007c \
- --hash=sha256:41fc406e374590a3d492325b889a2686b31e7a7780bec83db2512988550dadbf \
- --hash=sha256:48350592665ea3cbcd07efc8c12ff12d89be09cd47231c7925e3b8afada9d50d \
- --hash=sha256:605d572126eb4ab2eadf5c59d5d69f0608df2bf7bcad5c5880a47a20a0699e3e \
- --hash=sha256:6dfbf16b1ea4f4d0ee11084d9c026340514d1d30270eaa82a9f1297b6c8ecbf0 \
- --hash=sha256:6f803320c9da732cc79210d7e8cc5c8019aad512589c910c66529eb1b1818230 \
- --hash=sha256:721a3dd2f06ef942f83a819c0f3f6a648b2830b191a72bbe9451bcd49c3bd42e \
- --hash=sha256:755679c49460bd0d2f837ab99f0a26948e68fa0718b7e42afbabd074d945bf84 \
- --hash=sha256:78b00429161ccb0da252229bcda8010b445c4bf924e721265bec5a6e96a92e92 \
- --hash=sha256:958a0588149190c22cdebbc0797e01972950c927a11a900fe6c2296f207b1d6f \
- --hash=sha256:a3924692160e3d847e18702bb048dc38e0e13411d2b503fecb1adf0fcf950ba4 \
- --hash=sha256:d51674ed8e2551ef7773820ef5dab9322be0828629f2cbf8d1fc31a0c4fed640 \
- --hash=sha256:d5ebc990bd34f4ac3c73a2724c2dcc9ee7bf1ce6cf08e87bb25c6ad33507e318 \
- --hash=sha256:d6c0106415ff1a10c326c49bc5dd9ea8b9897a6ca0c8688eb9c30ddec49535ef \
- --hash=sha256:e48fbb64165cda451c06a0f9e4c7a16b534fcabd32546d531b3c240ce2844112
- # via -r requirements.in
-python-dateutil==2.8.2 \
- --hash=sha256:0123cacc1627ae19ddf3c27a5de5bd67ee4586fbdd6440d9748f8abb483d3e86 \
- --hash=sha256:961d03dc3453ebbc59dbdea9e4e11c5651520a876d0f4db161e8674aae935da9
- # via pandas
-pytz==2022.1 \
- --hash=sha256:1e760e2fe6a8163bc0b3d9a19c4f84342afa0a2affebfaa84b01b978a02ecaa7 \
- --hash=sha256:e68985985296d9a66a881eb3193b0906246245294a881e7c8afe623866ac6a5c
- # via pandas
-six==1.16.0 \
- --hash=sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926 \
- --hash=sha256:8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254
- # via python-dateutil
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/Dockerfile b/sandbox/pneumonia-prediction/containers/prediction-model/Dockerfile
deleted file mode 100644
index 48ca546..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/Dockerfile
+++ /dev/null
@@ -1,51 +0,0 @@
-# !! EDIT THIS: Set base Docker image to be used.
-FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04
-
-# Set env vars to be able to run apt-get commands without issues.
-ARG LC_ALL="C.UTF-8"
-ARG TZ=Etc/UTC
-ENV DEBIAN_FRONTEND=noninteractive
-
-# Install Python 3.8
-RUN --mount=type=cache,id=apt,target=/var/cache/apt \
- rm -f /etc/apt/apt.conf.d/docker-clean \
- && apt-get update \
- && apt-get install -y -q software-properties-common \
- && add-apt-repository ppa:deadsnakes/ppa \
- && apt-get update \
- && apt-get install -y -q python3.8 python3.8-dev python3.8-venv \
- && apt-get remove -y --autoremove software-properties-common \
- && rm -rf /var/lib/apt/lists/*
-
-# Set up non-root user and group.
-ARG UID=5642
-ARG GID=5642
-RUN ( getent group $GID >/dev/null || groupadd -r -g $GID localgroup ) \
- && useradd -m -l -s /bin/bash -g $GID -N -u $UID localuser
-
-# Create and "activate" venv.
-ENV VIRTUAL_ENV="/venv"
-RUN mkdir $VIRTUAL_ENV \
- && chmod g+s $VIRTUAL_ENV \
- && chown $UID:$GID $VIRTUAL_ENV \
- && python3.8 -m venv $VIRTUAL_ENV
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
-
-# Install dependencies.
-COPY requirements.txt ./
-RUN --mount=type=cache,target=/root/.cache/pip \
- python -m pip install --upgrade pip setuptools wheel \
- && pip install -r requirements.txt \
- && rm requirements.txt
-
-WORKDIR /home/localuser
-USER localuser
-
-# !! EDIT THIS: Copy the needed local files (code and otherwise) into the container work directory.
-# Note: For directories, have a separate COPY command for each top-level directory.
-COPY --chown=$UID:$GID ./config ./config
-COPY --chown=$UID:$GID ./custom ./custom
-COPY --chown=$UID:$GID ./infer.py ./infer.py
-
-ENV PYTHONPATH="/home/localuser/custom"
-ENV PYTHONUNBUFFERED=1
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/README.md b/sandbox/pneumonia-prediction/containers/prediction-model/README.md
deleted file mode 100644
index 7a0ac03..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/README.md
+++ /dev/null
@@ -1,27 +0,0 @@
-# NVIDIA FLARE Example - MIMIC CXR for NVIDIA FLARE 2.2
-
-
-### **Description**
-
-This example contains files to train a pneumonia detection model from chest X-rays using Rhino Health's Federated Computing Platform (FCP) and NVIDIA FLARE v2.2.
-
-It shows how to:
-* Use PyTorch model code adapted to NVIDIA FLARE (NVFlare) v2.2, and apply the necessary changes for it to run on FCP
-* Add an `infer.py` script to perform inference on the trained model
-* Package the code in a Docker container that can be used with FCP
-
-Please refer to the User Documentation and/or Tutorials for in-depth explanations of how to use NVFlare on FCP.
-
-
-### **Resources**
-- `config` - This is the standard NVFlare directory for config files
-  - `config_fed_client.json` - The standard NVFlare federated client config, set to 1 epoch for this example
-  - `config_fed_server.json` - The standard NVFlare federated server config, storing the output model weights in `/output/model_parameters.pt`
-- `custom` - This is the standard NVFlare directory for custom model code, containing the code for the pneumonia model (reading the input data from the `/input` folder in order to work with FCP)
-- `infer.py` - A script for running inference on the trained model
-- `Dockerfile` - This is the Dockerfile to be used for building the container image
-- `requirements.txt` - The python requirements for this project
-
-
-# Getting Help
-For additional support, please reach out to [support@rhinohealth.com](mailto:support@rhinohealth.com).
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_client.json b/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_client.json
deleted file mode 100644
index 9bd8a12..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_client.json
+++ /dev/null
@@ -1,22 +0,0 @@
-{
- "format_version": 2,
-
- "executors": [
- {
- "tasks": ["train", "submit_model"],
- "executor": {
- "path": "pneumonia_trainer.PneumoniaTrainer",
- "args": {
- "lr": 0.01,
- "epochs": 1
- }
- }
- }
- ],
- "task_result_filters": [
- ],
- "task_data_filters": [
- ],
- "components": [
- ]
-}
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_server.json b/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_server.json
deleted file mode 100644
index 9539f4e..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/config/config_fed_server.json
+++ /dev/null
@@ -1,63 +0,0 @@
-{
- "format_version": 2,
-
- "server": {
- "heart_beat_timeout": 600
- },
- "task_data_filters": [],
- "task_result_filters": [],
- "components": [
- {
- "id": "persistor",
- "name": "PTFileModelPersistor",
- "args": {
- "model": {
- "path": "network.PneumoniaModel"
- },
- "global_model_file_name": "/output/model_parameters.pt"
- }
- },
- {
- "id": "shareable_generator",
- "path": "nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator",
- "args": {}
- },
- {
- "id": "aggregator",
- "path": "nvflare.app_common.aggregators.intime_accumulate_model_aggregator.InTimeAccumulateWeightedAggregator",
- "args": {
- "expected_data_kind": "WEIGHTS"
- }
- },
- {
- "id": "model_locator",
- "path": "pt_model_locator.PTModelLocator",
- "args": {
-
- }
- },
- {
- "id": "json_generator",
- "path": "validation_json_generator.ValidationJsonGenerator",
- "args": {
- }
- }
- ],
- "workflows": [
- {
- "id": "scatter_and_gather",
- "name": "ScatterAndGather",
- "args": {
- "min_clients": 1,
- "num_rounds": 1,
- "start_round": 0,
- "wait_time_after_min_received": 10,
- "aggregator_id": "aggregator",
- "persistor_id": "persistor",
- "shareable_generator_id": "shareable_generator",
- "train_task_name": "train",
- "train_timeout": 0
- }
- }
- ]
-}
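The server config above wires a `ScatterAndGather` workflow to an `InTimeAccumulateWeightedAggregator` with `expected_data_kind: WEIGHTS`. As a rough illustration only (not NVFlare's actual implementation), such an aggregator can be thought of as averaging client weight updates, weighted by each client's reported step count (the `NUM_STEPS_CURRENT_ROUND` metadata the trainer sends back):

```python
def aggregate(client_updates):
    """Step-weighted average of client weight dicts.

    client_updates: list of (weights, num_steps), where weights maps
    parameter name -> float (tensors in the real system).
    """
    total_steps = sum(steps for _, steps in client_updates)
    keys = client_updates[0][0].keys()
    return {
        k: sum(w[k] * steps for w, steps in client_updates) / total_steps
        for k in keys
    }

# A client that ran 3 steps counts three times as much as one that ran 1:
print(aggregate([({"w": 1.0}, 1), ({"w": 3.0}, 3)]))  # {'w': 2.5}
```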
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/custom/network.py b/sandbox/pneumonia-prediction/containers/prediction-model/custom/network.py
deleted file mode 100644
index a1fbf15..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/custom/network.py
+++ /dev/null
@@ -1,47 +0,0 @@
-# Copyright (c) 2021, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import torch.nn as nn
-
-
-class PneumoniaModel(nn.Module):
- def __init__(self, num_classes=2):
- super().__init__()
-
- self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
-
- self.bn1 = nn.BatchNorm2d(num_features=12)
- self.relu1 = nn.ReLU()
- self.pool = nn.MaxPool2d(kernel_size=2)
- self.conv2 = nn.Conv2d(in_channels=12, out_channels=20, kernel_size=3, stride=1, padding=1)
- self.relu2 = nn.ReLU()
- self.conv3 = nn.Conv2d(in_channels=20, out_channels=32, kernel_size=3, stride=1, padding=1)
- self.bn3 = nn.BatchNorm2d(num_features=32)
- self.relu3 = nn.ReLU()
- self.fc = nn.Linear(in_features=32 * 112 * 112, out_features=num_classes)
-
- def forward(self, input):
- output = self.conv1(input)
- output = self.bn1(output)
- output = self.relu1(output)
- output = self.pool(output)
- output = self.conv2(output)
- output = self.relu2(output)
- output = self.conv3(output)
- output = self.bn3(output)
- output = self.relu3(output)
- output = output.view(-1, 32 * 112 * 112)
- output = self.fc(output)
-
- return output
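The hard-coded `in_features=32 * 112 * 112` in the fully connected layer follows from the input pipeline: images are center-cropped to 224x224, the three convolutions use kernel 3, stride 1, padding 1 (which preserve spatial size), and the single 2x2 max-pool halves it to 112. A quick sanity check of that arithmetic:

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    # Standard conv output-size formula: floor((W + 2P - K) / S) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 224               # after CenterCrop(size=224)
size = conv2d_out(size)  # conv1 -> 224 (size-preserving)
size = size // 2         # MaxPool2d(kernel_size=2) -> 112
size = conv2d_out(size)  # conv2 -> 112
size = conv2d_out(size)  # conv3 -> 112
print(32 * size * size)  # 401408 == 32 * 112 * 112
```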
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pneumonia_trainer.py b/sandbox/pneumonia-prediction/containers/prediction-model/custom/pneumonia_trainer.py
deleted file mode 100644
index 9f4a73a..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pneumonia_trainer.py
+++ /dev/null
@@ -1,189 +0,0 @@
-# Copyright (c) 2023, Rhino HealthTech, Inc.
-# Original file modified by Rhino Health to adapt it to the Rhino Health Federated Computing Platform.
-
-# Copyright (c) 2021, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os.path
-import os
-
-import torch
-from torch import nn
-from torch.optim import Adam
-from torch.utils.data.dataloader import DataLoader
-from torchvision.transforms import ToTensor, Normalize, Compose, Resize, RandomRotation, CenterCrop
-import torchvision
-
-from nvflare.apis.dxo import from_shareable, DXO, DataKind, MetaKey
-from nvflare.apis.executor import Executor
-from nvflare.apis.fl_constant import ReturnCode, ReservedKey
-from nvflare.apis.fl_context import FLContext
-from nvflare.apis.shareable import Shareable, make_reply
-from nvflare.apis.signal import Signal
-from nvflare.app_common.abstract.model import make_model_learnable, model_learnable_to_dxo
-from nvflare.app_common.app_constant import AppConstants
-from nvflare.app_common.pt.pt_fed_utils import PTModelPersistenceFormatManager
-from pt_constants import PTConstants
-from network import PneumoniaModel
-
-
-class PneumoniaTrainer(Executor):
-
- def __init__(self, lr=0.01, epochs=5, train_task_name=AppConstants.TASK_TRAIN,
- submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL, exclude_vars=None):
-
- """
- Args:
- lr (float, optional): Learning rate. Defaults to 0.01
- epochs (int, optional): Epochs. Defaults to 5
- train_task_name (str, optional): Task name for train task. Defaults to "train".
- submit_model_task_name (str, optional): Task name for submit model. Defaults to "submit_model".
- exclude_vars (list): List of variables to exclude during model loading.
- """
- super().__init__()
-
- self._lr = lr
- self._epochs = epochs
- self._train_task_name = train_task_name
- self._submit_model_task_name = submit_model_task_name
- self._exclude_vars = exclude_vars
-
- # Training setup
- self.model = PneumoniaModel()
- self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
- self.model.to(self.device)
- self.loss = nn.CrossEntropyLoss()
- self.optimizer = Adam(self.model.parameters(), lr=lr)
-
- # Create mimic-cxr dataset for training.
- transforms = Compose([
- Resize(size=(256, 256)),
- RandomRotation(degrees=(-20, +20)),
- CenterCrop(size=224),
- ToTensor(),
- Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
- ])
-
- cohort_uid = next(os.walk('/input/cohorts'))[1][0]
- self._train_dataset = torchvision.datasets.ImageFolder(root='/input/cohorts/'+cohort_uid+'/file_data',
- transform=transforms)
-
- self._train_loader = DataLoader(self._train_dataset, batch_size=4, shuffle=True)
-
- self._n_iterations = len(self._train_loader)
-
- # Setup the persistence manager to save PT model.
- # The default training configuration is used by persistence manager
- # in case no initial model is found.
- self._default_train_conf = {"train": {"model": type(self.model).__name__}}
- self.persistence_manager = PTModelPersistenceFormatManager(
- data=self.model.state_dict(), default_train_conf=self._default_train_conf)
-
- def local_train(self, fl_ctx, weights, abort_signal):
- # Set the model weights
- self.model.load_state_dict(state_dict=weights)
-
- # Basic training
- self.model.train()
- for epoch in range(self._epochs):
- running_loss = 0.0
- for i, batch in enumerate(self._train_loader):
- if abort_signal.triggered:
- # If abort_signal is triggered, we simply return.
- # The outside function will check it again and decide steps to take.
- return
-
- images, labels = batch[0].to(self.device), batch[1].to(self.device)
- self.optimizer.zero_grad()
-
- predictions = self.model(images)
- cost = self.loss(predictions, labels)
- cost.backward()
- self.optimizer.step()
-
- running_loss += (cost.cpu().detach().numpy()/images.size()[0])
- if i % 3000 == 0:
- self.log_info(fl_ctx, f"Epoch: {epoch}/{self._epochs}, Iteration: {i}, "
- f"Loss: {running_loss/3000}")
- running_loss = 0.0
-
- def execute(self, task_name: str, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal) -> Shareable:
- try:
- if task_name == self._train_task_name:
- # Get model weights
- try:
- dxo = from_shareable(shareable)
- except Exception:
- self.log_error(fl_ctx, "Unable to extract dxo from shareable.")
- return make_reply(ReturnCode.BAD_TASK_DATA)
-
-                # Ensure the data kind is WEIGHTS.
- if not dxo.data_kind == DataKind.WEIGHTS:
- self.log_error(fl_ctx, f"data_kind expected WEIGHTS but got {dxo.data_kind} instead.")
- return make_reply(ReturnCode.BAD_TASK_DATA)
-
- # Convert weights to tensor. Run training
- torch_weights = {k: torch.as_tensor(v) for k, v in dxo.data.items()}
- self.local_train(fl_ctx, torch_weights, abort_signal)
-
- # Check the abort_signal after training.
- # local_train returns early if abort_signal is triggered.
- if abort_signal.triggered:
- return make_reply(ReturnCode.TASK_ABORTED)
-
- # Save the local model after training.
- self.save_local_model(fl_ctx)
-
- # Get the new state dict and send as weights
- new_weights = self.model.state_dict()
- new_weights = {k: v.cpu().numpy() for k, v in new_weights.items()}
-
- outgoing_dxo = DXO(data_kind=DataKind.WEIGHTS, data=new_weights,
- meta={MetaKey.NUM_STEPS_CURRENT_ROUND: self._n_iterations})
- return outgoing_dxo.to_shareable()
- elif task_name == self._submit_model_task_name:
- # Load local model
- ml = self.load_local_model(fl_ctx)
-
- # Get the model parameters and create dxo from it
- dxo = model_learnable_to_dxo(ml)
- return dxo.to_shareable()
- else:
- return make_reply(ReturnCode.TASK_UNKNOWN)
- except Exception:
-            self.log_exception(fl_ctx, "Exception in simple trainer.")
- return make_reply(ReturnCode.EXECUTION_EXCEPTION)
-
- def save_local_model(self, fl_ctx: FLContext):
- run_dir = fl_ctx.get_engine().get_workspace().get_run_dir(fl_ctx.get_prop(ReservedKey.RUN_NUM))
- models_dir = os.path.join(run_dir, PTConstants.PTModelsDir)
- if not os.path.exists(models_dir):
- os.makedirs(models_dir)
- model_path = os.path.join(models_dir, PTConstants.PTLocalModelName)
-
- ml = make_model_learnable(self.model.state_dict(), {})
- self.persistence_manager.update(ml)
- torch.save(self.persistence_manager.to_persistence_dict(), model_path)
-
- def load_local_model(self, fl_ctx: FLContext):
- run_dir = fl_ctx.get_engine().get_workspace().get_run_dir(fl_ctx.get_prop(ReservedKey.RUN_NUM))
- models_dir = os.path.join(run_dir, PTConstants.PTModelsDir)
- if not os.path.exists(models_dir):
- return None
- model_path = os.path.join(models_dir, PTConstants.PTLocalModelName)
-
- self.persistence_manager = PTModelPersistenceFormatManager(data=torch.load(model_path),
- default_train_conf=self._default_train_conf)
- ml = self.persistence_manager.to_model_learnable(exclude_vars=self._exclude_vars)
- return ml
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_constants.py b/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_constants.py
deleted file mode 100644
index 0b1c46f..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_constants.py
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright (c) 2021, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-class PTConstants:
- PTServerName = "server"
- PTFileModelName = "model_parameters.pt"
- PTLocalModelName = "local_model.pt"
-
- PTModelsDir = "models"
- CrossValResultsJsonFilename = "cross_val_results.json"
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_model_locator.py b/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_model_locator.py
deleted file mode 100644
index 54610cc..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/custom/pt_model_locator.py
+++ /dev/null
@@ -1,69 +0,0 @@
-# Copyright (c) 2021, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-from typing import List, Union
-
-import torch.cuda
-
-from nvflare.apis.dxo import DXO
-from nvflare.apis.fl_context import FLContext
-from nvflare.app_common.abstract.model import model_learnable_to_dxo
-from nvflare.app_common.abstract.model_locator import ModelLocator
-from nvflare.app_common.pt.pt_fed_utils import PTModelPersistenceFormatManager
-from pt_constants import PTConstants
-from network import PneumoniaModel
-
-
-class PTModelLocator(ModelLocator):
-
- def __init__(self, exclude_vars=None, model=None):
- super(PTModelLocator, self).__init__()
-
- self.model = PneumoniaModel()
- self.exclude_vars = exclude_vars
-
- def get_model_names(self, fl_ctx: FLContext) -> List[str]:
- return [PTConstants.PTServerName]
-
- def locate_model(self, model_name, fl_ctx: FLContext) -> Union[DXO, None]:
- if model_name == PTConstants.PTServerName:
- try:
- server_run_dir = fl_ctx.get_engine().get_workspace().get_app_dir(fl_ctx.get_run_number())
- model_path = os.path.join(server_run_dir, PTConstants.PTFileModelName)
- if not os.path.exists(model_path):
- return None
-
- # Load the torch model
- device = "cuda" if torch.cuda.is_available() else "cpu"
- data = torch.load(model_path, map_location=device)
-
- # Setup the persistence manager.
- if self.model:
- default_train_conf = {"train": {"model": type(self.model).__name__}}
- else:
- default_train_conf = None
-
- # Use persistence manager to get learnable
- persistence_manager = PTModelPersistenceFormatManager(data, default_train_conf=default_train_conf)
- ml = persistence_manager.to_model_learnable(exclude_vars=None)
-
- # Create dxo and return
- return model_learnable_to_dxo(ml)
-            except Exception:
-                self.log_error(fl_ctx, f"Error in retrieving {model_name}.", fire_event=False)
- return None
- else:
- self.log_error(fl_ctx, f"PTModelLocator doesn't recognize name: {model_name}", fire_event=False)
- return None
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/custom/validation_json_generator.py b/sandbox/pneumonia-prediction/containers/prediction-model/custom/validation_json_generator.py
deleted file mode 100644
index cfd5aeb..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/custom/validation_json_generator.py
+++ /dev/null
@@ -1,83 +0,0 @@
-# Copyright (c) 2021, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import json
-import os.path
-
-from nvflare.apis.dxo import DataKind, from_shareable
-from nvflare.apis.event_type import EventType
-from nvflare.apis.fl_context import FLContext
-from nvflare.app_common.app_constant import AppConstants
-from nvflare.app_common.app_event_type import AppEventType
-from nvflare.widgets.widget import Widget
-
-
-class ValidationJsonGenerator(Widget):
- def __init__(self, results_dir=AppConstants.CROSS_VAL_DIR, json_file_name="cross_val_results.json"):
- """Catches VALIDATION_RESULT_RECEIVED event and generates a results.json containing accuracy of each
- validated model.
-
- Args:
- results_dir (str, optional): Name of the results directory. Defaults to cross_site_val
- json_file_name (str, optional): Name of the json file. Defaults to cross_val_results.json
- """
- super(ValidationJsonGenerator, self).__init__()
-
- self._results_dir = results_dir
- self._val_results = {}
- self._json_file_name = json_file_name
-
- def handle_event(self, event_type: str, fl_ctx: FLContext):
- if event_type == EventType.START_RUN:
- self._val_results.clear()
- elif event_type == AppEventType.VALIDATION_RESULT_RECEIVED:
- model_owner = fl_ctx.get_prop(AppConstants.MODEL_OWNER, None)
- data_client = fl_ctx.get_prop(AppConstants.DATA_CLIENT, None)
- val_results = fl_ctx.get_prop(AppConstants.VALIDATION_RESULT, None)
-
- if not model_owner:
- self.log_error(
- fl_ctx, "model_owner unknown. Validation result will not be saved to json", fire_event=False
- )
- if not data_client:
- self.log_error(
- fl_ctx, "data_client unknown. Validation result will not be saved to json", fire_event=False
- )
-
- if val_results:
- try:
- dxo = from_shareable(val_results)
- dxo.validate()
-
- if dxo.data_kind == DataKind.METRICS:
- if data_client not in self._val_results:
- self._val_results[data_client] = {}
- self._val_results[data_client][model_owner] = dxo.data
- else:
- self.log_error(
- fl_ctx, f"Expected dxo of kind METRICS but got {dxo.data_kind} instead.", fire_event=False
- )
- except Exception:
- self.log_exception(fl_ctx, "Exception in handling validation result.", fire_event=False)
- else:
- self.log_error(fl_ctx, "Validation result not found.", fire_event=False)
- elif event_type == EventType.END_RUN:
- run_dir = fl_ctx.get_engine().get_workspace().get_run_dir(fl_ctx.get_job_id())
- cross_val_res_dir = os.path.join(run_dir, self._results_dir)
- if not os.path.exists(cross_val_res_dir):
- os.makedirs(cross_val_res_dir)
-
- res_file_path = os.path.join(cross_val_res_dir, self._json_file_name)
- with open(res_file_path, "w") as f:
- json.dump(self._val_results, f)
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/infer.py b/sandbox/pneumonia-prediction/containers/prediction-model/infer.py
deleted file mode 100644
index 440e0e0..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/infer.py
+++ /dev/null
@@ -1,50 +0,0 @@
-#!/usr/bin/env python
-import sys
-
-import pandas as pd
-import torch
-import torchvision
-from torch.utils.data.dataloader import DataLoader
-from torchvision.transforms import ToTensor, Normalize, Compose, Resize, RandomRotation, CenterCrop
-
-from network import PneumoniaModel
-
-
-def infer(model_weights_file_path):
- # Setup the model
- model = PneumoniaModel()
- model.load_state_dict(torch.load(model_weights_file_path)["model"])
- model.eval()
- device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
- model.to(device)
-
- # Preparing the dataset for testing.
- transforms = Compose([
- Resize(size=(256, 256)),
- RandomRotation(degrees=(-20, +20)),
- CenterCrop(size=224),
- ToTensor(),
- Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
- ])
- tabular_data = pd.read_csv("/input/cohort_data.csv")
- dataset = torchvision.datasets.ImageFolder(root="/input/file_data", transform=transforms)
- loader = DataLoader(dataset, batch_size=4, shuffle=False)
-
- # Inference: Apply model and add scores column.
- scores = []
- with torch.no_grad():
- for i, (images, labels) in enumerate(loader):
- images = images.to(device)
- output = model(images)
- batch_scores = torch.select(output, 1, 1)
- scores.extend([score.item() for score in batch_scores])
- tabular_data['Model_Score'] = scores
-
- tabular_data.to_csv("/output/cohort_data.csv", index=False)
-
-
-if __name__ == "__main__":
- args = sys.argv[1:]
- (model_weights_file_path,) = args
- infer(model_weights_file_path)
- sys.exit(0)
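`infer.py` relies on `shuffle=False` so that the `scores` list lines up row-for-row with `cohort_data.csv` before pandas attaches the `Model_Score` column. A stdlib-only sketch of that alignment step (`add_score_column` is a hypothetical helper for illustration, not part of the container):

```python
import csv
import io

def add_score_column(csv_text, scores, column="Model_Score"):
    """Append a score column; scores must align row-for-row with the CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    if len(scores) != len(body):
        raise ValueError("scores/rows mismatch - was the loader shuffled?")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [column])
    for row, score in zip(body, scores):
        writer.writerow(row + [str(score)])
    return out.getvalue()

print(add_score_column("id\n1\n2\n", [0.12, 0.87]))
```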
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/requirements.txt b/sandbox/pneumonia-prediction/containers/prediction-model/requirements.txt
deleted file mode 100644
index f11fd31..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-nvflare==2.2.3
-pandas==1.4.1
-torch>=1.10,<1.11
-torchvision>=0.11.1,<0.12
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/README.md b/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/README.md
deleted file mode 100644
index e333f28..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/README.md
+++ /dev/null
@@ -1,38 +0,0 @@
-**How to Transform a Centralized Training Model to NVFLARE**
-
-* Begin with a centralized model, for example: https://www.kaggle.com/code/fahadmehfoooz/pneumonia-classification-using-pytorch/notebook
-* Follow these steps:
- 1. Custom folder:
-     Create your network.py and trainer.py, and copy the relevant functions into trainer.py
- 2. Config folder:
- Create config_fed_client.json and config_fed_server.json
- 3. infer.py (optional)
- Create your model validation code (to be run using Generalized Compute)
- 4. Try everything out with the docker_run.sh utility
- 5. Push your container to your ECR repo with the docker_push.sh utility
- 6. Run training and validation via the Rhino Health Platform in the UI or SDK
-* Example: Pneumonia Classification Model:
- * Source: source_centralized_model/pneumonia_classification.py
- * Converted to: custom/[network.py + pneumonia_trainer.py (+pt_constants.py)] + infer.py
-
- **Training**
-
- |Step| Source | NVFLARE|
- |---|:---:|---:|
- |data transform|lines 9-30|pneumonia_trainer.py: lines 66-73|
- |load data|lines 33-52|pneumonia_trainer.py: lines 75-82|
- |define model|lines 54-85|network.py|
- |model params|lines 91-99|pneumonia_trainer.py: lines 59-64|
- |training|lines 101-128|pneumonia_trainer.py: lines 90-116|
- |NVFLARE wrapper|___|pneumonia_trainer.py: lines 118-186|
-
- **Inference**
-
- |Step| Source | NVFLARE|
- |---|:---:|---:|
- |data transform|lines 9-30|infer.py: lines 24-30|
- |load data|lines 33-52|infer.py: lines 31-33|
- |define model|lines 54-85|network.py|
- |infer|lines 130-146|infer.py: lines 35-43|
- |write cohort|___|infer.py: lines 47-55|
-
diff --git a/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/pneumonia_classification.py b/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/pneumonia_classification.py
deleted file mode 100644
index 11b3ea9..0000000
--- a/sandbox/pneumonia-prediction/containers/prediction-model/source_centralized_model/pneumonia_classification.py
+++ /dev/null
@@ -1,149 +0,0 @@
-import torch
-import os
-from torchvision import transforms as T, datasets, models
-from torch.utils.data import DataLoader
-from torch import nn, optim
-from torch.autograd import Variable
-
-
-def data_transforms(phase=None):
- if phase == TRAIN:
-
- data_T = T.Compose([
-
- T.Resize(size=(256, 256)),
- T.RandomRotation(degrees=(-20, +20)),
- T.CenterCrop(size=224),
- T.ToTensor(),
- T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
- ])
-
- elif phase == TEST:
-
- data_T = T.Compose([
-
- T.Resize(size=(224, 224)),
- T.ToTensor(),
- T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
- ])
-
- return data_T
-
-
-data_dir = "../input/chest-xray-pneumonia/chest_xray/chest_xray"
-TEST = 'test'
-TRAIN = 'train'
-
-trainset = datasets.ImageFolder(os.path.join(data_dir, TRAIN), transform=data_transforms(TRAIN))
-testset = datasets.ImageFolder(os.path.join(data_dir, TEST), transform=data_transforms(TEST))
-
-class_names = trainset.classes
-print(class_names)
-print(trainset.class_to_idx)
-
-trainloader = DataLoader(trainset, batch_size=64, shuffle=True)
-testloader = DataLoader(testset, batch_size=64, shuffle=True)
-
-images, labels = next(iter(trainloader))
-
-for i, (images, labels) in enumerate(trainloader):
- if torch.cuda.is_available():
- images = Variable(images.cuda())
- labels = Variable(labels.cuda())
-
-class classify(nn.Module):
- def __init__(self, num_classes=2):
- super(classify, self).__init__()
-
- self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
-
- self.bn1 = nn.BatchNorm2d(num_features=12)
- self.relu1 = nn.ReLU()
- self.pool = nn.MaxPool2d(kernel_size=2)
- self.conv2 = nn.Conv2d(in_channels=12, out_channels=20, kernel_size=3, stride=1, padding=1)
- self.relu2 = nn.ReLU()
- self.conv3 = nn.Conv2d(in_channels=20, out_channels=32, kernel_size=3, stride=1, padding=1)
- self.bn3 = nn.BatchNorm2d(num_features=32)
- self.relu3 = nn.ReLU()
- self.fc = nn.Linear(in_features=32 * 112 * 112, out_features=num_classes)
-
- # Feed forward function
-
- def forward(self, input):
- output = self.conv1(input)
- output = self.bn1(output)
- output = self.relu1(output)
- output = self.pool(output)
- output = self.conv2(output)
- output = self.relu2(output)
- output = self.conv3(output)
- output = self.bn3(output)
- output = self.relu3(output)
- output = output.view(-1, 32 * 112 * 112)
- output = self.fc(output)
-
- return output
-
-
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-
-
-model = classify()
-# defining the optimizer
-optimizer = optim.Adam(model.parameters(), lr=0.01)
-# defining the loss function
-criterion = nn.CrossEntropyLoss()
-# checking if GPU is available
-if torch.cuda.is_available():
- model = model.cuda()
- criterion = criterion.cuda()
-
-Losses = []
-for i in range(4):
- running_loss = 0
- for images, labels in trainloader:
-
- # Changing images to cuda for gpu
- if torch.cuda.is_available():
- images = images.cuda()
- labels = labels.cuda()
-
- # Training pass
- # Sets the gradient to zero
- optimizer.zero_grad()
-
- output = model(images)
- loss = criterion(output, labels)
-
- # This is where the model learns by backpropagating
- # accumulates the loss for mini batch
- loss.backward()
-
- # And optimizes its weights here
- optimizer.step()
- Losses.append(loss)
-
- running_loss += loss.item()
- else:
- print("Epoch {} - Training loss: {}".format(i + 1, running_loss / len(trainloader)))
-
-correct_count, all_count = 0, 0
-for images, labels in testloader:
- for i in range(len(labels)):
- if torch.cuda.is_available():
- images = images.cuda()
- labels = labels.cuda()
- img = images[i].view(1, 3, 224, 224)
- with torch.no_grad():
- logps = model(img)
-
- ps = torch.exp(logps)
- probab = list(ps.cpu()[0])
- pred_label = probab.index(max(probab))
- true_label = labels.cpu()[i]
- if (true_label == pred_label):
- correct_count += 1
- all_count += 1
-
-print("Number Of Images Tested =", all_count)
-print("\nModel Accuracy =", (correct_count / all_count))
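The per-image evaluation loop above can be expressed more compactly in batched form. A minimal sketch, assuming a `model`, a `testloader`, and a `device` like those defined in the deleted notebook code:

```python
import torch

def evaluate(model, testloader, device):
    """Batched accuracy: argmax over logits per batch, no per-image Python loop."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```

Besides being shorter, this avoids re-sending the same batch to the GPU once per image.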
diff --git a/sandbox/pneumonia-prediction/containers/train-test-split/Dockerfile b/sandbox/pneumonia-prediction/containers/train-test-split/Dockerfile
deleted file mode 100644
index cf6f03e..0000000
--- a/sandbox/pneumonia-prediction/containers/train-test-split/Dockerfile
+++ /dev/null
@@ -1,34 +0,0 @@
-FROM python:3.10.8-bullseye as wheelbuilder
-
-COPY requirements.txt ./
-
-RUN --mount=type=cache,target=/.cache/pip \
- python -m pip install -U pip setuptools wheel \
- && pip wheel --no-deps --cache-dir=/.cache/pip --wheel-dir /wheels -r requirements.txt
-
-
-FROM python:3.10.8-slim-bullseye
-
-COPY --from=wheelbuilder /wheels /wheels
-RUN python -m venv /venv \
- && . /venv/bin/activate \
- && python -m pip install --no-cache -U pip setuptools wheel \
- && pip install --no-cache /wheels/*
-
-ARG UID=5642
-ARG GID=5642
-
-RUN ( getent group $GID >/dev/null || groupadd -r -g $GID localgroup ) \
- && useradd -m -l -s /bin/bash -g $GID -N -u $UID localuser \
- && chown -R $UID:$GID /venv
-
-WORKDIR /home/localuser
-USER localuser
-
-COPY --chown=$UID:$GID train_test_split.py run_code.sh ./
-
-# This is basically what venv/bin/activate does
-ENV PATH="/venv/bin:$PATH"
-ENV VIRTUAL_ENV="/venv"
-
-CMD ["./run_code.sh"]
diff --git a/sandbox/pneumonia-prediction/containers/train-test-split/README.md b/sandbox/pneumonia-prediction/containers/train-test-split/README.md
deleted file mode 100644
index 37888a6..0000000
--- a/sandbox/pneumonia-prediction/containers/train-test-split/README.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Generalized Compute Example - Train Test Split
-
-
-### **Description**
-
-This example provides files that can be used with the Rhino Health Generalized Compute capability to remotely split an input cohort into two output cohorts on a Rhino Client.
-
-It shows how to:
-* Process an input CSV file as a dataframe
-* Create multiple output CSV files from this input
-* Use a multi-step Dockerfile to build the container image (using a separate step for installing requirements)
-
-Please reference the User Documentation and/or Tutorials for in-depth explanations of how to use the Generalized Compute capability.
-
-
-### **Resources**
-- `Dockerfile` - The Dockerfile used to build the container image
-- `train_test_split.py` - The Python code for splitting the input cohort (using `sklearn.model_selection.train_test_split`)
-- `run_code.sh` - The entrypoint shell script for the Docker container, which runs `train_test_split.py`
-- `requirements.txt` - The Python requirements for this project
-
-
-### **Getting Help**
-For additional support, please reach out to [support@rhinohealth.com](mailto:support@rhinohealth.com).
diff --git a/sandbox/pneumonia-prediction/containers/train-test-split/requirements.txt b/sandbox/pneumonia-prediction/containers/train-test-split/requirements.txt
deleted file mode 100644
index 11586ac..0000000
--- a/sandbox/pneumonia-prediction/containers/train-test-split/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-pandas==1.3.4
-sklearn-pandas==1.8.0
-scikit-learn==1.0.2
diff --git a/sandbox/pneumonia-prediction/containers/train-test-split/run_code.sh b/sandbox/pneumonia-prediction/containers/train-test-split/run_code.sh
deleted file mode 100755
index 42d46d9..0000000
--- a/sandbox/pneumonia-prediction/containers/train-test-split/run_code.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-DIR=$(dirname "$(readlink -f "$BASH_SOURCE")")
-
-set -x
-set -e
-
-python "$DIR"/train_test_split.py
diff --git a/sandbox/pneumonia-prediction/containers/train-test-split/train_test_split.py b/sandbox/pneumonia-prediction/containers/train-test-split/train_test_split.py
deleted file mode 100644
index 3f641a1..0000000
--- a/sandbox/pneumonia-prediction/containers/train-test-split/train_test_split.py
+++ /dev/null
@@ -1,15 +0,0 @@
-import pandas as pd
-from sklearn.model_selection import train_test_split
-
-
-if __name__ == '__main__':
- # Read cohort from /input
- df_cohort = pd.read_csv('/input/0/cohort_data.csv')
-
- # Split to train and test sets
- df_train, df_test = train_test_split(df_cohort, test_size=0.33)
-
- # Write cohorts to /output
- df_train.to_csv('/output/0/cohort_data.csv', index=False)
- df_test.to_csv('/output/1/cohort_data.csv', index=False)
-
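When the cohort carries a class-imbalanced label, a stratified split keeps the class proportions equal in both outputs. A sketch of the same `train_test_split` call with stratification, using a hypothetical `pneumonia` label column (not part of the original script):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy cohort with a 2:1 class imbalance on a hypothetical label column
df = pd.DataFrame({"pneumonia": [0] * 10 + [1] * 5, "x": range(15)})

# stratify= preserves the 2:1 label ratio in both the train and test splits
df_train, df_test = train_test_split(
    df, test_size=0.33, stratify=df["pneumonia"], random_state=42
)
```

Without `stratify`, a small test split can end up with few or no positive cases purely by chance.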
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/006c7ff8-5a6afee0-79373cfb-ea4e4b32-47772737.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/006c7ff8-5a6afee0-79373cfb-ea4e4b32-47772737.dcm
new file mode 100644
index 0000000..14e2422
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/006c7ff8-5a6afee0-79373cfb-ea4e4b32-47772737.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/01da36c9-9d7122d3-2d404365-0f9ff9da-19f6fa7f.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/01da36c9-9d7122d3-2d404365-0f9ff9da-19f6fa7f.dcm
new file mode 100644
index 0000000..ff16062
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/01da36c9-9d7122d3-2d404365-0f9ff9da-19f6fa7f.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/030615a2-4d334013-2ec1103d-826ffca5-8efbf1ce.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/030615a2-4d334013-2ec1103d-826ffca5-8efbf1ce.dcm
new file mode 100644
index 0000000..36dfb3c
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/030615a2-4d334013-2ec1103d-826ffca5-8efbf1ce.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04564240-d4e9e69c-1dd70a83-14b463cd-b7614743.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04564240-d4e9e69c-1dd70a83-14b463cd-b7614743.dcm
new file mode 100644
index 0000000..5c03a06
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04564240-d4e9e69c-1dd70a83-14b463cd-b7614743.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04ba5bea-750001b4-69159aaa-982483ea-a312e632.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04ba5bea-750001b4-69159aaa-982483ea-a312e632.dcm
new file mode 100644
index 0000000..e12206d
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/04ba5bea-750001b4-69159aaa-982483ea-a312e632.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/08b13120-21c79f0c-cb17e97e-0204cff8-b9344273.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/08b13120-21c79f0c-cb17e97e-0204cff8-b9344273.dcm
new file mode 100644
index 0000000..f8e6ad0
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/08b13120-21c79f0c-cb17e97e-0204cff8-b9344273.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/09cff9e7-cc333a22-325f1f17-3bee8cdd-b77fd40e.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/09cff9e7-cc333a22-325f1f17-3bee8cdd-b77fd40e.dcm
new file mode 100644
index 0000000..c2552b9
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/09cff9e7-cc333a22-325f1f17-3bee8cdd-b77fd40e.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/0ad13f6e-a4f6fe6e-098d04e6-a918a19c-64c35b96.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/0ad13f6e-a4f6fe6e-098d04e6-a918a19c-64c35b96.dcm
new file mode 100644
index 0000000..de5fcf7
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/0ad13f6e-a4f6fe6e-098d04e6-a918a19c-64c35b96.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/106df8c1-e6911050-b5ef7b32-c04648f0-6888c5be.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/106df8c1-e6911050-b5ef7b32-c04648f0-6888c5be.dcm
new file mode 100644
index 0000000..47cbcd2
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/106df8c1-e6911050-b5ef7b32-c04648f0-6888c5be.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1642a6ae-0bbc5061-5b595e20-5f7b710e-f18a11a4.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1642a6ae-0bbc5061-5b595e20-5f7b710e-f18a11a4.dcm
new file mode 100644
index 0000000..aa2063f
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1642a6ae-0bbc5061-5b595e20-5f7b710e-f18a11a4.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1843fb57-c53b5ad7-de39ad5a-3701dd61-91114279.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1843fb57-c53b5ad7-de39ad5a-3701dd61-91114279.dcm
new file mode 100644
index 0000000..27373e7
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1843fb57-c53b5ad7-de39ad5a-3701dd61-91114279.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1b02acf3-fc15a6ba-a08f7627-5c054b0e-f7e45487.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1b02acf3-fc15a6ba-a08f7627-5c054b0e-f7e45487.dcm
new file mode 100644
index 0000000..3de9e95
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1b02acf3-fc15a6ba-a08f7627-5c054b0e-f7e45487.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1bdae719-d6737127-a8a8fc5e-1667a94b-0d17f25d.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1bdae719-d6737127-a8a8fc5e-1667a94b-0d17f25d.dcm
new file mode 100644
index 0000000..e9b6354
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1bdae719-d6737127-a8a8fc5e-1667a94b-0d17f25d.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1cf26fdd-54faeea9-6e331d75-f17c4cb9-11052ec0.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1cf26fdd-54faeea9-6e331d75-f17c4cb9-11052ec0.dcm
new file mode 100644
index 0000000..3fb53c8
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/1cf26fdd-54faeea9-6e331d75-f17c4cb9-11052ec0.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/203fe250-ee25a5c2-85dc14a6-fb899964-47c4f8c2.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/203fe250-ee25a5c2-85dc14a6-fb899964-47c4f8c2.dcm
new file mode 100644
index 0000000..ad4ee5e
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/203fe250-ee25a5c2-85dc14a6-fb899964-47c4f8c2.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2448eba2-2b809fdd-b64baf76-e2e1de81-40dca0a9.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2448eba2-2b809fdd-b64baf76-e2e1de81-40dca0a9.dcm
new file mode 100644
index 0000000..42ed183
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2448eba2-2b809fdd-b64baf76-e2e1de81-40dca0a9.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2d2323b3-99246d04-83dbd60d-9f55adc4-b4ff3aa8.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2d2323b3-99246d04-83dbd60d-9f55adc4-b4ff3aa8.dcm
new file mode 100644
index 0000000..7c2db33
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/2d2323b3-99246d04-83dbd60d-9f55adc4-b4ff3aa8.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3096f071-81fccb40-7a8baa5c-5f6458ba-c375925f.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3096f071-81fccb40-7a8baa5c-5f6458ba-c375925f.dcm
new file mode 100644
index 0000000..52d806a
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3096f071-81fccb40-7a8baa5c-5f6458ba-c375925f.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/31c3ccee-c2e2ad38-6219ad42-d9ceef35-1e9ae1bb.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/31c3ccee-c2e2ad38-6219ad42-d9ceef35-1e9ae1bb.dcm
new file mode 100644
index 0000000..1cbf361
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/31c3ccee-c2e2ad38-6219ad42-d9ceef35-1e9ae1bb.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3cd2aa9b-f3565a4c-82852ff6-fa6223d7-979bb93e.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3cd2aa9b-f3565a4c-82852ff6-fa6223d7-979bb93e.dcm
new file mode 100644
index 0000000..d8ac14d
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3cd2aa9b-f3565a4c-82852ff6-fa6223d7-979bb93e.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3ce28cae-60d6d1ad-b0cca183-3a56cb8f-2726819b.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3ce28cae-60d6d1ad-b0cca183-3a56cb8f-2726819b.dcm
new file mode 100644
index 0000000..daa9986
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/3ce28cae-60d6d1ad-b0cca183-3a56cb8f-2726819b.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/408e6c52-bea6ce5b-e304cfb3-719c0e0e-27b9f7fb.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/408e6c52-bea6ce5b-e304cfb3-719c0e0e-27b9f7fb.dcm
new file mode 100644
index 0000000..18d39f3
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/408e6c52-bea6ce5b-e304cfb3-719c0e0e-27b9f7fb.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/42456058-f50555f0-bbf439ea-b8fdb913-13f82f7b.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/42456058-f50555f0-bbf439ea-b8fdb913-13f82f7b.dcm
new file mode 100644
index 0000000..7c67352
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/42456058-f50555f0-bbf439ea-b8fdb913-13f82f7b.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/45265824-13e1dbef-eeb6d296-a9568545-911023b4.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/45265824-13e1dbef-eeb6d296-a9568545-911023b4.dcm
new file mode 100644
index 0000000..f6966a7
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/45265824-13e1dbef-eeb6d296-a9568545-911023b4.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4f432654-17b34263-de12b2e1-e8950ac4-1043d1c9.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4f432654-17b34263-de12b2e1-e8950ac4-1043d1c9.dcm
new file mode 100644
index 0000000..63203af
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4f432654-17b34263-de12b2e1-e8950ac4-1043d1c9.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4fcf3870-caa557ee-8fde3972-53fd8a97-3d06687b.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4fcf3870-caa557ee-8fde3972-53fd8a97-3d06687b.dcm
new file mode 100644
index 0000000..a3b6b69
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/4fcf3870-caa557ee-8fde3972-53fd8a97-3d06687b.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/531b876f-bfdad4ce-a62e7bd4-f37c78de-583293a4.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/531b876f-bfdad4ce-a62e7bd4-f37c78de-583293a4.dcm
new file mode 100644
index 0000000..7a616df
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/531b876f-bfdad4ce-a62e7bd4-f37c78de-583293a4.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/59667117-db106cce-79b7bb90-90199a11-b1dc9569.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/59667117-db106cce-79b7bb90-90199a11-b1dc9569.dcm
new file mode 100644
index 0000000..0b79e5f
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/59667117-db106cce-79b7bb90-90199a11-b1dc9569.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/5fb4d3e2-f069bd04-6bcc2f3a-ee763f99-0a9d6bfc.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/5fb4d3e2-f069bd04-6bcc2f3a-ee763f99-0a9d6bfc.dcm
new file mode 100644
index 0000000..645b76e
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/5fb4d3e2-f069bd04-6bcc2f3a-ee763f99-0a9d6bfc.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6009aadc-cc264ca6-e9a46956-d703950f-1a36637c.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6009aadc-cc264ca6-e9a46956-d703950f-1a36637c.dcm
new file mode 100644
index 0000000..e6e3ff9
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6009aadc-cc264ca6-e9a46956-d703950f-1a36637c.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/631df780-79a2d19c-3e08a8fd-64a6f728-ca9c22e2.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/631df780-79a2d19c-3e08a8fd-64a6f728-ca9c22e2.dcm
new file mode 100644
index 0000000..e9ece12
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/631df780-79a2d19c-3e08a8fd-64a6f728-ca9c22e2.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6412732b-aeaa0a7f-d654eeae-1afea936-78961127.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6412732b-aeaa0a7f-d654eeae-1afea936-78961127.dcm
new file mode 100644
index 0000000..002620f
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6412732b-aeaa0a7f-d654eeae-1afea936-78961127.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6a33112b-f1ce52e2-7541de60-0e84491d-e87c94b8.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6a33112b-f1ce52e2-7541de60-0e84491d-e87c94b8.dcm
new file mode 100644
index 0000000..8a3252c
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6a33112b-f1ce52e2-7541de60-0e84491d-e87c94b8.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6b4c25b7-24889824-420290e2-6f924649-29b8c865.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6b4c25b7-24889824-420290e2-6f924649-29b8c865.dcm
new file mode 100644
index 0000000..534f244
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6b4c25b7-24889824-420290e2-6f924649-29b8c865.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6dbe851f-a3e2a472-a52f91a5-f94e28ea-76421b89.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6dbe851f-a3e2a472-a52f91a5-f94e28ea-76421b89.dcm
new file mode 100644
index 0000000..e3f8ff1
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/6dbe851f-a3e2a472-a52f91a5-f94e28ea-76421b89.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/759c753a-b9528a65-5a112776-d52388a2-17b64faa.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/759c753a-b9528a65-5a112776-d52388a2-17b64faa.dcm
new file mode 100644
index 0000000..82c2b4a
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/759c753a-b9528a65-5a112776-d52388a2-17b64faa.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7603ae4a-98a45a49-9ebb2981-e4da1d6d-a88e3726.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7603ae4a-98a45a49-9ebb2981-e4da1d6d-a88e3726.dcm
new file mode 100644
index 0000000..737bace
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7603ae4a-98a45a49-9ebb2981-e4da1d6d-a88e3726.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7b719ab6-2d7246c5-0b904604-1386c3bb-d70f96e4.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7b719ab6-2d7246c5-0b904604-1386c3bb-d70f96e4.dcm
new file mode 100644
index 0000000..40c18ef
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7b719ab6-2d7246c5-0b904604-1386c3bb-d70f96e4.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7dc9384a-07e536b4-37a8ba67-5648512e-c265d93a.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7dc9384a-07e536b4-37a8ba67-5648512e-c265d93a.dcm
new file mode 100644
index 0000000..1175eb5
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/7dc9384a-07e536b4-37a8ba67-5648512e-c265d93a.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/84f0ad73-7bf47610-aaef953b-1cee947a-39b63176.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/84f0ad73-7bf47610-aaef953b-1cee947a-39b63176.dcm
new file mode 100644
index 0000000..4ce1e7a
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/84f0ad73-7bf47610-aaef953b-1cee947a-39b63176.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/872c4801-c415fb88-8e18b278-2d9e98bd-5cc0b647.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/872c4801-c415fb88-8e18b278-2d9e98bd-5cc0b647.dcm
new file mode 100644
index 0000000..1dc0c5e
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/872c4801-c415fb88-8e18b278-2d9e98bd-5cc0b647.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/87315bb4-420ed0a0-960115e4-b3ff1682-d292d5d6.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/87315bb4-420ed0a0-960115e4-b3ff1682-d292d5d6.dcm
new file mode 100644
index 0000000..afe4cb5
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/87315bb4-420ed0a0-960115e4-b3ff1682-d292d5d6.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/8950c5f8-2b21ec75-bd4ab7ce-9d9a88c7-a0ec83cc.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/8950c5f8-2b21ec75-bd4ab7ce-9d9a88c7-a0ec83cc.dcm
new file mode 100644
index 0000000..32d5dff
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/8950c5f8-2b21ec75-bd4ab7ce-9d9a88c7-a0ec83cc.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/904f9048-019d0aaf-d4c81957-ef2e2b36-97ffd731.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/904f9048-019d0aaf-d4c81957-ef2e2b36-97ffd731.dcm
new file mode 100644
index 0000000..aa6147e
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/904f9048-019d0aaf-d4c81957-ef2e2b36-97ffd731.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/937f4f8f-5101e609-1936e8db-a3fa7358-0ed5cc0d.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/937f4f8f-5101e609-1936e8db-a3fa7358-0ed5cc0d.dcm
new file mode 100644
index 0000000..4333356
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/937f4f8f-5101e609-1936e8db-a3fa7358-0ed5cc0d.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/94c5f631-e1e61da6-d972f176-45999116-5a34af51.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/94c5f631-e1e61da6-d972f176-45999116-5a34af51.dcm
new file mode 100644
index 0000000..9222527
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/94c5f631-e1e61da6-d972f176-45999116-5a34af51.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/957aabff-db3b7660-cfd1ffb0-1eee294e-2c83fa42.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/957aabff-db3b7660-cfd1ffb0-1eee294e-2c83fa42.dcm
new file mode 100644
index 0000000..1aa5df4
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/957aabff-db3b7660-cfd1ffb0-1eee294e-2c83fa42.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/95ae0367-ba76f73d-1b8ab927-60a1fdf4-a2c799da.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/95ae0367-ba76f73d-1b8ab927-60a1fdf4-a2c799da.dcm
new file mode 100644
index 0000000..1198858
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/95ae0367-ba76f73d-1b8ab927-60a1fdf4-a2c799da.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/97c55590-5dfd53bb-77ba51c4-c07b2042-41d92b52.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/97c55590-5dfd53bb-77ba51c4-c07b2042-41d92b52.dcm
new file mode 100644
index 0000000..c19d111
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/97c55590-5dfd53bb-77ba51c4-c07b2042-41d92b52.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/99dc491b-414ae1b8-07ddd23a-b70313c7-b08dab6b.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/99dc491b-414ae1b8-07ddd23a-b70313c7-b08dab6b.dcm
new file mode 100644
index 0000000..b12906d
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/99dc491b-414ae1b8-07ddd23a-b70313c7-b08dab6b.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/9d4bcd5a-5befb88b-903ca64c-ea88961e-b4f29f88.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/9d4bcd5a-5befb88b-903ca64c-ea88961e-b4f29f88.dcm
new file mode 100644
index 0000000..f9b5011
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/9d4bcd5a-5befb88b-903ca64c-ea88961e-b4f29f88.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a8887bca-c812618f-97d7a957-201e79b8-928a22ae.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a8887bca-c812618f-97d7a957-201e79b8-928a22ae.dcm
new file mode 100644
index 0000000..b985bd4
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a8887bca-c812618f-97d7a957-201e79b8-928a22ae.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a999d784-df7cdd23-51d58158-11decc52-b67508a5.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a999d784-df7cdd23-51d58158-11decc52-b67508a5.dcm
new file mode 100644
index 0000000..1bf91f7
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/a999d784-df7cdd23-51d58158-11decc52-b67508a5.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/ac127c61-8fb5d594-5d731e1d-0a5e9b09-fc1410d6.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/ac127c61-8fb5d594-5d731e1d-0a5e9b09-fc1410d6.dcm
new file mode 100644
index 0000000..83f8dbe
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/ac127c61-8fb5d594-5d731e1d-0a5e9b09-fc1410d6.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b5364d93-80eeec2d-c2e76ef3-5693cdee-f3647040.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b5364d93-80eeec2d-c2e76ef3-5693cdee-f3647040.dcm
new file mode 100644
index 0000000..ac2f200
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b5364d93-80eeec2d-c2e76ef3-5693cdee-f3647040.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b7bf9e34-e525dda7-e6ee2ccc-cad8436a-cbf76500.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b7bf9e34-e525dda7-e6ee2ccc-cad8436a-cbf76500.dcm
new file mode 100644
index 0000000..5e5105c
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/b7bf9e34-e525dda7-e6ee2ccc-cad8436a-cbf76500.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bb8c7f8e-a7df7989-753a709b-2128e935-8e954d19.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bb8c7f8e-a7df7989-753a709b-2128e935-8e954d19.dcm
new file mode 100644
index 0000000..969c0dc
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bb8c7f8e-a7df7989-753a709b-2128e935-8e954d19.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/be217967-d8cdc329-e0dad570-2ef99227-829e47f0.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/be217967-d8cdc329-e0dad570-2ef99227-829e47f0.dcm
new file mode 100644
index 0000000..e4a8235
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/be217967-d8cdc329-e0dad570-2ef99227-829e47f0.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bebd281a-2c30427b-4c14fdcc-7c60fb71-97aa5377.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bebd281a-2c30427b-4c14fdcc-7c60fb71-97aa5377.dcm
new file mode 100644
index 0000000..e850428
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/bebd281a-2c30427b-4c14fdcc-7c60fb71-97aa5377.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c1a0ae6f-57bfac01-37322297-ae1ef131-0ee82e7a.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c1a0ae6f-57bfac01-37322297-ae1ef131-0ee82e7a.dcm
new file mode 100644
index 0000000..17c4e7b
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c1a0ae6f-57bfac01-37322297-ae1ef131-0ee82e7a.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c28563c4-c2ded22f-ce5be44f-8d5d408a-0f10517f.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c28563c4-c2ded22f-ce5be44f-8d5d408a-0f10517f.dcm
new file mode 100644
index 0000000..9641af5
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c28563c4-c2ded22f-ce5be44f-8d5d408a-0f10517f.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c3685f00-3be07b0e-39bcf036-c805590a-a7536e4e.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c3685f00-3be07b0e-39bcf036-c805590a-a7536e4e.dcm
new file mode 100644
index 0000000..2961d6b
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c3685f00-3be07b0e-39bcf036-c805590a-a7536e4e.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c66c4c23-ab1dc847-6fe016e2-405f790b-ce1ae1f4.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c66c4c23-ab1dc847-6fe016e2-405f790b-ce1ae1f4.dcm
new file mode 100644
index 0000000..6799a1e
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c66c4c23-ab1dc847-6fe016e2-405f790b-ce1ae1f4.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c85a7978-cd584e7d-4e759518-dbe90dd9-00767da8.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c85a7978-cd584e7d-4e759518-dbe90dd9-00767da8.dcm
new file mode 100644
index 0000000..ce610ff
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/c85a7978-cd584e7d-4e759518-dbe90dd9-00767da8.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/cdf3a1fb-56d16ef0-902c1eb9-54dde3f3-b04c4fde.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/cdf3a1fb-56d16ef0-902c1eb9-54dde3f3-b04c4fde.dcm
new file mode 100644
index 0000000..d136702
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/cdf3a1fb-56d16ef0-902c1eb9-54dde3f3-b04c4fde.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d3d592ff-36ace805-ac67d1eb-c2ce4ca9-800443d2.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d3d592ff-36ace805-ac67d1eb-c2ce4ca9-800443d2.dcm
new file mode 100644
index 0000000..ddd87a0
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d3d592ff-36ace805-ac67d1eb-c2ce4ca9-800443d2.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d59eeaca-c11e54ff-a55ab5a8-18b8ca4e-c07dbd49.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d59eeaca-c11e54ff-a55ab5a8-18b8ca4e-c07dbd49.dcm
new file mode 100644
index 0000000..34cdd17
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d59eeaca-c11e54ff-a55ab5a8-18b8ca4e-c07dbd49.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d7b52c01-8aa5ca3f-5d72702c-b89ff0b6-e9d7cb5e.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d7b52c01-8aa5ca3f-5d72702c-b89ff0b6-e9d7cb5e.dcm
new file mode 100644
index 0000000..bfb9ba9
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/d7b52c01-8aa5ca3f-5d72702c-b89ff0b6-e9d7cb5e.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/db1f785e-ccee474a-fdf3dc10-df0645c6-aef59f0c.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/db1f785e-ccee474a-fdf3dc10-df0645c6-aef59f0c.dcm
new file mode 100644
index 0000000..f132102
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/db1f785e-ccee474a-fdf3dc10-df0645c6-aef59f0c.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e44f96d3-e5a7319c-3bf19ba7-8b2b0bd4-9203eef2.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e44f96d3-e5a7319c-3bf19ba7-8b2b0bd4-9203eef2.dcm
new file mode 100644
index 0000000..1acc465
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e44f96d3-e5a7319c-3bf19ba7-8b2b0bd4-9203eef2.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e766f948-ff7a00a3-0eaf8995-183d54c0-92df5d48.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e766f948-ff7a00a3-0eaf8995-183d54c0-92df5d48.dcm
new file mode 100644
index 0000000..3f6ce4d
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/e766f948-ff7a00a3-0eaf8995-183d54c0-92df5d48.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f0d4b86f-aface71f-579cb776-b40850e6-948c4b8f.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f0d4b86f-aface71f-579cb776-b40850e6-948c4b8f.dcm
new file mode 100644
index 0000000..63692c7
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f0d4b86f-aface71f-579cb776-b40850e6-948c4b8f.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f1b69ada-169271ac-8746bb87-2d8828fa-09e96d65.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f1b69ada-169271ac-8746bb87-2d8828fa-09e96d65.dcm
new file mode 100644
index 0000000..f5d5d20
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f1b69ada-169271ac-8746bb87-2d8828fa-09e96d65.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f87b48a9-3bf0211c-e7900377-bf1fad7c-7dec2159.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f87b48a9-3bf0211c-e7900377-bf1fad7c-7dec2159.dcm
new file mode 100644
index 0000000..e3ea19a
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/f87b48a9-3bf0211c-e7900377-bf1fad7c-7dec2159.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb150544-608bcd6c-667384bf-eb09fe4b-c4e64b1d.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb150544-608bcd6c-667384bf-eb09fe4b-c4e64b1d.dcm
new file mode 100644
index 0000000..7122ec3
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb150544-608bcd6c-667384bf-eb09fe4b-c4e64b1d.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb1adb69-fb915626-8bcd8bcc-31f3d165-5bcb529a.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb1adb69-fb915626-8bcd8bcc-31f3d165-5bcb529a.dcm
new file mode 100644
index 0000000..1462699
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fb1adb69-fb915626-8bcd8bcc-31f3d165-5bcb529a.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fbd01520-3c6543f3-fa1809d3-0e9664cb-6ec3e330.dcm b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fbd01520-3c6543f3-fa1809d3-0e9664cb-6ec3e330.dcm
new file mode 100644
index 0000000..1002d4c
Binary files /dev/null and b/sandbox/pneumonia-prediction/hco-data/hco-data/dicom/fbd01520-3c6543f3-fa1809d3-0e9664cb-6ec3e330.dcm differ
diff --git a/sandbox/pneumonia-prediction/hco-data/heath_dataset.csv b/sandbox/pneumonia-prediction/hco-data/heath_dataset.csv
new file mode 100644
index 0000000..dcc5f4f
--- /dev/null
+++ b/sandbox/pneumonia-prediction/hco-data/heath_dataset.csv
@@ -0,0 +1,78 @@
+"study_id","subject_id","seriesUID","Pneumonia"
+55199984,12000091,"2.25.222482885061394715030907992922233549837",1
+58487107,12000091,"2.25.131263413420037993130081253144913346147",1
+50127595,12000487,"2.25.219272181576475088951357994643132149233",1
+53092856,12001941,"2.25.157141664055701578989692779549165976869",1
+56675999,12018137,"2.25.203141725680102554897629942843718267631",1
+50331333,12065508,"2.25.91415822189368338327812872635768482263",1
+58713162,12065508,"2.25.299051378156564750161317826597379994642",1
+54624197,12086958,"2.25.217062126936511179769852013944227775033",1
+55037150,13000142,"2.25.327661546193365610191606995132687772201",1
+56783987,13004288,"2.25.319148546328754361898795344024443906261",1
+57734186,13004288,"2.25.304903916383842975905556660261697849168",1
+51764355,13024713,"2.25.13485916001359607510075692289073062386",1
+55223142,13025152,"2.25.299161635753561021744150625076459187935",1
+58846671,13025152,"2.25.25632161120913730613023567414593408288",0
+51274834,13042394,"2.25.57581815652386192778025149756757104509",0
+57047258,13042394,"2.25.230309655331662622415532671763030888447",0
+56875381,13068653,"2.25.94840418762657090139469576863929152162",0
+51658914,14000003,"2.25.279196810136411621127626952763639896935",0
+51800155,14000340,"2.25.92498165664083899790204299676082667061",0
+51750028,14000746,"2.25.161499554457577881732723435153677973619",0
+53717084,14000746,"2.25.248808892164438342358105413437738594293",0
+59019496,14001131,"2.25.69423412223567993343822806070846587637",0
+59536212,14001131,"2.25.38119352908082066628395082589757945975",0
+50393027,14020184,"2.25.231042301474310391755744619062859718375",0
+55239920,14038901,"2.25.117693597701186806694410162227403053017",1
+55263578,14042268,"2.25.107500065309603565367614470639160107721",0
+56074305,15000485,"2.25.201013018833655393189937789332045925964",0
+50077246,15001136,"2.25.161937228177715805130413516814494398015",0
+52592881,15001136,"2.25.16939625201192749302591445490170206945",0
+53301121,15001233,"2.25.327188925801428010085584637603473117900",0
+54924087,15001233,"2.25.331172185214085911100222288965192529583",0
+55068499,15001233,"2.25.69597356727282805486627003063383038083",0
+54726934,15001393,"2.25.13511749903603860999568775271127753137",0
+51737379,15001474,"2.25.101999750184617061798669561272074133100",0
+58221226,15001474,"2.25.249228423622968208685398798440381446537",0
+55616331,15001501,"2.25.201045720320933397559081420359688292311",0
+53180166,15025695,"2.25.177954832769876869522007244684877338685",1
+59376223,15025695,"2.25.21936876733474069898624071935858325592",0
+53396044,15065163,"2.25.18163239632933462482675968815061133234",0
+58664976,15065163,"2.25.165654852892626355705139893338812152259",0
+53106744,16000871,"2.25.59954866549967717682218420491141458273",0
+56018087,16000871,"2.25.327703905145800258593085793094611656116",0
+59978743,16000871,"2.25.182383950564955917726326538734996984087",0
+54837632,16062940,"2.25.196453450734844015037341108232107609266",0
+58547191,16067651,"2.25.247638891726831448982498898680573187091",0
+51509541,18071110,"2.25.113691122749048431874795015373063155656",0
+58831216,18071110,"2.25.270233553229443539392175113358630988837",0
+59141448,18073159,"2.25.204782873422703885727697644099227930633",0
+55658939,13007829,"2.25.290983598087138658191119014471659117272",1
+54588572,13010130,"2.25.244462752119941689295469352426983927268",1
+57483156,13010130,"2.25.119494668487507368837488417720363373893",1
+53317659,13020008,"2.25.219236618232897413253824323160767810898",1
+56888594,13020008,"2.25.211886767967316806330499880394507457476",1
+53438164,13090760,"2.25.254652901250577441769226516531779736143",0
+50703768,14007520,"2.25.241446898571066238499758213251107439608",1
+59292343,14007520,"2.25.132329309150748763461738580090542012780",1
+50896309,15012521,"2.25.39134913339824097161663170419748196180",0
+51977596,16011891,"2.25.198685686212950863451130138138980196519",0
+55388853,18000291,"2.25.59487989059514866471043306309461362900",0
+51014962,18000570,"2.25.242837542587900371888537501955707845489",0
+50785186,18000735,"2.25.16692367221901226474915206404089754104",0
+59355587,18000818,"2.25.210753993880658711626547480318807284849",0
+57119564,18001157,"2.25.215323556369628052107935102937754346371",0
+59771012,18001157,"2.25.66914684623248520025887430019769542358",0
+53184881,18001649,"2.25.30743657209371621531480552926179913933",0
+56463743,18006780,"2.25.229022332264847206733867889399288788544",0
+51268111,18030470,"2.25.103131655363298953772076895355182121471",1
+52269885,18030470,"2.25.74807549199705743589652006146996660361",0
+53127212,18030470,"2.25.199261312360190201106859992178904529568",0
+53528101,18030470,"2.25.241386347912889713721737760311762510876",0
+55376584,18030470,"2.25.151490899596408466026947263668144251511",0
+56064916,18030470,"2.25.152286424309277351245482707110089147751",0
+56626758,18030470,"2.25.280139315949176942915501908552855747203",0
+57244947,18030470,"2.25.252850977359361439614961301203758906428",1
+58152399,18030470,"2.25.174925257680637975578588807274939785879",0
+58562238,18030470,"2.25.238485503058005946176319138800858874659",0
+56867934,18080123,"2.25.285465219117782172989459557894516627943",0
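Reviewer note on the dataset added above: `heath_dataset.csv` maps each imaging series (`seriesUID`) to a binary `Pneumonia` label, keyed by `study_id`/`subject_id`. A minimal, standard-library-only sketch of how downstream code can consume it (the parsing approach is ours, not taken from the repo; the inline rows are copied from the file above):

```python
import csv
import io

# Header plus two data rows copied verbatim from heath_dataset.csv above.
sample = '''"study_id","subject_id","seriesUID","Pneumonia"
55199984,12000091,"2.25.222482885061394715030907992922233549837",1
58846671,13025152,"2.25.25632161120913730613023567414593408288",0
'''

rows = list(csv.DictReader(io.StringIO(sample)))

# Index labels by series UID, mirroring how the data-engineering notebook
# matches each DICOM's SeriesInstanceUID back to its ground-truth label.
label_by_series = {r["seriesUID"]: int(r["Pneumonia"]) for r in rows}
print(label_by_series["2.25.222482885061394715030907992922233549837"])  # -> 1
```

The DICOM-to-JPEG code in `2_data_engineering.ipynb` performs the equivalent lookup against the series UID read from each DICOM header.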
diff --git a/sandbox/pneumonia-prediction/img/cohort_data_path.png b/sandbox/pneumonia-prediction/img/cohort_data_path.png
new file mode 100644
index 0000000..043f0a1
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/cohort_data_path.png differ
diff --git a/sandbox/pneumonia-prediction/img/copy_uid.png b/sandbox/pneumonia-prediction/img/copy_uid.png
new file mode 100644
index 0000000..7f57eb0
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/copy_uid.png differ
diff --git a/sandbox/pneumonia-prediction/img/dataset_container_path.png b/sandbox/pneumonia-prediction/img/dataset_container_path.png
new file mode 100644
index 0000000..e990880
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/dataset_container_path.png differ
diff --git a/sandbox/pneumonia-prediction/img/mimc_cohorts.png b/sandbox/pneumonia-prediction/img/mimc_cohorts.png
new file mode 100644
index 0000000..17733c5
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/mimc_cohorts.png differ
diff --git a/sandbox/pneumonia-prediction/img/mimic_datasets.png b/sandbox/pneumonia-prediction/img/mimic_datasets.png
new file mode 100644
index 0000000..ee0814d
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/mimic_datasets.png differ
diff --git a/sandbox/pneumonia-prediction/img/sandbox_overview.jpg b/sandbox/pneumonia-prediction/img/sandbox_overview.jpg
new file mode 100644
index 0000000..ae92758
Binary files /dev/null and b/sandbox/pneumonia-prediction/img/sandbox_overview.jpg differ
diff --git a/sandbox/pneumonia-prediction/notebooks/data_engineering/2_data_engineering.ipynb b/sandbox/pneumonia-prediction/notebooks/data_engineering/2_data_engineering.ipynb
new file mode 100644
index 0000000..8b42c4e
--- /dev/null
+++ b/sandbox/pneumonia-prediction/notebooks/data_engineering/2_data_engineering.ipynb
@@ -0,0 +1,385 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "c69d61a4",
+ "metadata": {},
+ "source": [
+ "# Notebook #2: Data Engineering\n",
+ "### Transforming data across multiple nodes\n",
+ "In this notebook, we'll convert chest X-rays from DICOM (the standard format for medical images in clinical information systems) to JPEG files. This is a critical step in most machine learning pipelines that involve image classification, segmentation, or other computer vision tasks. Because in this example the data is distributed across multiple health systems and *will remain on the edge*, we'll send the DICOM-to-JPEG conversion code to each health system's server.\n",
+ "\n",
+ "#### Import the Rhino Health Python library\n",
+ "We'll again import any necessary functions from the `rhino_health` library and authenticate to the Rhino Cloud. Please refer to Notebook #1 for an explanation of the `session` interface for interacting with various endpoints in the Rhino Health ecosystem. In addition, you can always find more information about the Rhino SDK on our Official SDK Documentation and on our PyPI Repository Page."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "78fb9765-c937-4bcd-b970-311bae49e21a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: pip in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (24.0)\n",
+ "Requirement already satisfied: rhino_health in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (1.0.3)\n",
+ "Requirement already satisfied: arrow<2,>=1.2.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (1.2.3)\n",
+ "Requirement already satisfied: backoff<2.3,>=2.1.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.2.1)\n",
+ "Requirement already satisfied: funcy<3,>=1.16 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (1.18)\n",
+ "Requirement already satisfied: pydantic<2.6,>=2.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.5.3)\n",
+ "Requirement already satisfied: ratelimit<2.3,>=2.2.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.2.1)\n",
+ "Requirement already satisfied: requests<2.32,>=2.28.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.31.0)\n",
+ "Requirement already satisfied: typing-extensions>=4.8.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (4.10.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from arrow<2,>=1.2.1->rhino_health) (2.8.2)\n",
+ "Requirement already satisfied: annotated-types>=0.4.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pydantic<2.6,>=2.5->rhino_health) (0.6.0)\n",
+ "Requirement already satisfied: pydantic-core==2.14.6 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pydantic<2.6,>=2.5->rhino_health) (2.14.6)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (3.2.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (3.4)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (1.26.16)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (2023.7.22)\n",
+ "Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from python-dateutil>=2.7.0->arrow<2,>=1.2.1->rhino_health) (1.16.0)\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install --upgrade pip rhino_health"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "id": "1114292d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "import rhino_health as rh\n",
+ "from rhino_health.lib.endpoints.code_object.code_object_dataclass import (\n",
+ " CodeObject,\n",
+ " CodeObjectCreateInput,\n",
+ " CodeTypes,\n",
+ " CodeObjectRunInput\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "id": "19491fd8-d145-41a9-96c1-d040409d29f0",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ " ········\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Logged In\n"
+ ]
+ }
+ ],
+ "source": [
+ "my_username = \"adrish+1@rhinohealth.com\" # Replace this with the email you use to log into Rhino Health\n",
+ "session = rh.login(username=my_username, password=getpass.getpass(), rhino_api_url='https://dev.rhinohealth.com/api/') ## change the URL to match the Rhino instance\n",
+ "print(\"Logged In\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c22aa633",
+ "metadata": {},
+ "source": [
+ "#### Retrieve Project and Relevant Datasets\n",
+ "In the previous notebook we interfaced with the `Project` dataclass by first retrieving the project's unique identifier from the Rhino web platform (via copy & paste). In contrast, in this notebook we'll accomplish this by using the `get_project_by_name()` function (but either way is fine!). \n",
+ "\n",
+ "Each instance of the `Project` class is associated with several helpful parameters, including `description` and `permissions`, that can be accessed easily through the SDK. In this example, we'll use the `collaborating_workgroups` property to retrieve and encode our workgroups, which we'll use later when we perform the DICOM to JPEG transformation on 'the edge'."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "id": "00cffc9f-2a49-464a-8380-02a6548df34d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "942654f6-d292-4e53-bde1-406df31de2fd\n"
+ ]
+ }
+ ],
+ "source": [
+ "project = session.project.get_project_by_name(\"Federated Modeling\") # Replace with your project name\n",
+ "project_uid = project.uid\n",
+ "print(project_uid)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "id": "22db6132-3bdd-415b-b9d3-9828878f42f1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cxr_schema = session.project.get_data_schema_by_name(\"mimic_cxr_dev schema\", project_uid=project_uid)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "id": "53617c2d-5069-4d37-8efd-949f2e3df198",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "309c9b54-3d41-435e-bc3c-1de8c794463a\n"
+ ]
+ }
+ ],
+ "source": [
+ "cxr_schema_uid = cxr_schema.uid\n",
+ "print(cxr_schema_uid)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "303b52e4-ca0c-4542-859e-3d1f3159b2bc",
+ "metadata": {},
+ "source": [
+ "#### Retrieve chest X-ray data from both participating sites\n",
+ "Now that we've identified both of the collaborating workgroups involved in our project, we can retrieve the identifiers for the datasets that each workgroup uploaded to their respective Rhino clients. In a later step, we'll use the dataset identifiers to execute the DICOM to JPEG transformation code on each respective dataset. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "id": "f3cf0ee9-cc66-48a9-a6c3-d38d461cede9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loaded CXR datasets 'b3321c41-85f1-492b-b56f-a6fa99c5c79e', 'a34d3b8a-bdb8-48a4-8858-fdc79dcd65a6'\n"
+ ]
+ }
+ ],
+ "source": [
+ "hco_cxr_dataset = project.get_dataset_by_name(\"mimic_cxr_hco\")\n",
+ "aidev_cxr_dataset = project.get_dataset_by_name(\"mimic_cxr_dev\")\n",
+ "hco_cxr_dataset_uid = hco_cxr_dataset.uid\n",
+ "aidev_cxr_dataset_uid = aidev_cxr_dataset.uid\n",
+ "print(f\"Loaded CXR datasets '{hco_cxr_dataset.uid}', '{aidev_cxr_dataset.uid}'\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "62b431d5-e830-4ac4-8c4b-201499e11d88",
+ "metadata": {},
+ "source": [
+ "#### Create a Code Object to transform x-rays from DICOM to JPEG\n",
+ "In this step, we'll use a container to convert the DICOM files to JPEG images. This functionality, referred to in the Rhino-verse as **Generalized Compute (GC)**, represents a versatile and powerful way to execute pre-built container images within the FCP environment. This Code Object type enables you to run custom code, computations, or processes that are encapsulated within container images. With GC Code Objects, you can harness the full potential of distributed computing while tailoring your computations to suit your specific needs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 94,
+ "id": "7a0dd5e9-424e-4cda-83b5-542e442e1ad7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "python_code = \"\"\"\n",
+ "import pandas as pd\n",
+ "import os\n",
+ "import pydicom\n",
+ "import numpy as np\n",
+ "from PIL import Image\n",
+ "from sklearn.impute import SimpleImputer\n",
+ "import glob\n",
+ "\n",
+ "\n",
+ "def convert_dcm_image_to_jpg(name):\n",
+ "\tdcm = pydicom.dcmread(name)\n",
+ "\timg = dcm.pixel_array.astype(float)\n",
+ "\trescaled_image = (np.maximum(img, 0) / img.max()) * 255 # float pixels\n",
+ "\tfinal_image = np.uint8(rescaled_image) # integers pixels\n",
+ "\tfinal_image = Image.fromarray(final_image)\n",
+ "\treturn final_image\n",
+ "\n",
+ "\n",
+ "def dataset_dcm_to_jpg(dataset_df):\n",
+ "\tinput_dir = '/input/dicom_data/'\n",
+ "\toutput_dir = '/output/file_data/'\n",
+ "\tdcm_list = glob.glob(input_dir + '/*/*.dcm')\n",
+ "\n",
+ "\tdataset_df['JPG_file'] = np.nan\n",
+ "\tfor dcm_file in dcm_list:\n",
+ "\t\timage = convert_dcm_image_to_jpg(dcm_file)\n",
+ "\t\tjpg_file_name = dcm_file.split('/')[-1].split('.dcm')[0] + '.jpg'\n",
+ "\t\tds = pydicom.dcmread(dcm_file)\n",
+ "\t\tidx = dataset_df['Pneumonia'][dataset_df.SeriesUID == ds.SeriesInstanceUID].index[0]\n",
+ "\t\tground_truth = '1' if dataset_df.loc[idx, 'Pneumonia'] else '0'\n",
+ "\t\tclass_folder = output_dir + ground_truth\n",
+ "\t\tif not os.path.exists(class_folder):\n",
+ "\t\t\tos.makedirs(class_folder)\n",
+ "\t\timage.save('/'.join([class_folder, jpg_file_name]))\n",
+ "\t\tdataset_df.loc[idx, 'JPG_file'] = '/'.join([ground_truth, jpg_file_name])\n",
+ "\n",
+ "\treturn dataset_df\n",
+ "\n",
+ "\n",
+ "if __name__ == '__main__':\n",
+ "\t# Read dataset from /input\n",
+ "\tdataset = pd.read_csv('/input/dataset.csv')\n",
+ "\n",
+ "\t# Convert DICOM to JPG\n",
+ "\tdataset = dataset_dcm_to_jpg(dataset)\n",
+ "\n",
+ "\t# Write dataset to /output\n",
+ "\tdataset.to_csv('/output/dataset.csv', index=False)\n",
+ " \"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 106,
+ "id": "32a37efb-4f5f-4043-8b2d-ab976e6bc08a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Got Code Object 'DICOM to JPG Transformation Code' with uid 8c5828ac-422a-404f-8763-a131bf259bea\n"
+ ]
+ }
+ ],
+ "source": [
+ "code_object_params = CodeObjectCreateInput(\n",
+ " name=\"DICOM to JPG Transformation Code\",\n",
+ " description=\"CXR JPG transformation for the AI dev and Health System datasets\",\n",
+ " input_data_schema_uids = [cxr_schema_uid],\n",
+ " output_data_schema_uids = [None], # a schema will be automatically generated\n",
+ " project_uid = project_uid,\n",
+ " code_type = CodeTypes.PYTHON_CODE,\n",
+ " code_execution_mode = 'AUTO_CONTAINER_SNIPPET',\n",
+ " requirements_mode = 'PYTHON_PIP',\n",
+ " config = {\n",
+ "\t\t \"python_code\": python_code,\n",
+ " \"requirements\" : [\"pandas == 1.3.4\", \"numpy == 1.21.3\",\"sklearn==0.0\", \"sklearn-pandas==1.8.0\", \"scikit-learn==1.0.2\",\"pydicom==2.2.0\",\"Pillow==8.4.0\"],\n",
+ " }\n",
+ ")\n",
+ "\n",
+ "data_code_object = session.code_object.create_code_object(code_object_params)\n",
+ "print(f\"Got Code Object '{data_code_object.name}' with uid {data_code_object.uid}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "id": "0b77b18d-d607-4bc8-b9b3-26e410210d62",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Retrieve the code object, in case you already created it previously\n",
+ "code_object = session.code_object.get_code_object_by_name(\"DICOM to JPG Transformation Code\", project_uid=project_uid)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4f36033-e597-4c9b-916b-89235af73ebd",
+ "metadata": {},
+ "source": [
+ "#### Run the Code Object\n",
+ "In this step, we'll execute the code object that we just defined, passing the dataset identifiers for both the AI developer's data and the health system's data. 'Under the hood', the container image is transmitted to both sites and executed on the respective DICOM files. As defined in the Python code within the container, the newly generated JPEG files will be saved as a new dataset (with the `_conv` suffix defined in the function argument below)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 105,
+ "id": "6626dd0a-228b-4c62-adff-12852c7a9276",
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for code run to complete (0 hours 0 minutes and 2 seconds)\n",
+ "Done\n",
+ "Finished running DICOM to JPG Transformation Code4\n"
+ ]
+ }
+ ],
+ "source": [
+ "code_object_params = CodeObjectRunInput(\n",
+ " code_object_uid = code_object.uid,\n",
+ " input_dataset_uids = [[aidev_cxr_dataset_uid],[hco_cxr_dataset_uid]], \n",
+ " output_dataset_names_suffix = \"_conv\",\n",
+ " timeout_seconds = 600\n",
+ ")\n",
+ "code_run = session.code_object.run_code_object(code_object_params)\n",
+ "run_result = code_run.wait_for_completion()\n",
+ "print(f\"Finished running {code_object.name}\")\n",
+ "print(f\"Result status is '{run_result.status.value}'\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0deef416-59c0-4a50-8631-e36351a80ecb",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
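Reviewer note on the notebook added above: the core of the Generalized Compute snippet is the pixel rescale in `convert_dcm_image_to_jpg` (clamp negatives to 0, normalize by the maximum, scale to 0-255, cast to uint8). That step can be exercised in isolation, without pydicom or the FCP; the sample array below is illustrative, not taken from the dataset:

```python
import numpy as np

def rescale_to_uint8(img):
    """Mirror the rescale in convert_dcm_image_to_jpg: clamp negative
    pixel values to 0, normalize by the maximum, and scale to 0-255."""
    img = np.asarray(img, dtype=float)
    rescaled = (np.maximum(img, 0) / img.max()) * 255
    return np.uint8(rescaled)  # the uint8 cast truncates toward zero

# Illustrative pixel values, including a negative raw value.
pixels = np.array([[-5.0, 0.0], [50.0, 100.0]])
print(rescale_to_uint8(pixels))
```

`PIL.Image.fromarray` then accepts the resulting uint8 array directly, as in the notebook.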
diff --git a/sandbox/pneumonia-prediction/1_data_extraction.ipynb b/sandbox/pneumonia-prediction/notebooks/data_extraction/1_data_extraction_aidev.ipynb
similarity index 67%
rename from sandbox/pneumonia-prediction/1_data_extraction.ipynb
rename to sandbox/pneumonia-prediction/notebooks/data_extraction/1_data_extraction_aidev.ipynb
index 9a80a77..a1e2e23 100644
--- a/sandbox/pneumonia-prediction/1_data_extraction.ipynb
+++ b/sandbox/pneumonia-prediction/notebooks/data_extraction/1_data_extraction_aidev.ipynb
@@ -6,11 +6,11 @@
"metadata": {},
"source": [
"# Notebook #1: Data Extraction\n",
- "## Importing tabular data onto Rhino with SQL queries\n",
+ "### Importing tabular data onto Rhino with SQL queries\n",
"In this notebook, you'll use SQL to query from an external database (such as a health system's clinical data warehouse) and import the results of those queries onto the Rhino Federated Computing Platform.\n",
"\n",
- "### Import the Rhino Health Python library\n",
- "The code below imports various classes and functions from the `rhino_health` library, which is a custom library designed to interact with the Rhino Federated Computing Platform. More information about the SDK can be found on our [Official SDK Documentation](https://rhinohealth.github.io/rhino_sdk_docs/html/autoapi/index.html) and on our [PyPI Repository Page](https://pypi.org/project/rhino-health/) "
+ "#### Import the Rhino Health Python library\n",
+ "The code below imports various classes and functions from the `rhino_health` library, which is a custom library designed to interact with the Rhino Federated Computing Platform. More information about the SDK can be found on our Official SDK Documentation and on our PyPI Repository Page."
]
},
{
@@ -25,7 +25,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"id": "f9e3e349",
"metadata": {},
"outputs": [],
@@ -33,11 +33,13 @@
"import getpass\n",
"from pprint import pprint\n",
"import rhino_health as rh\n",
- "from rhino_health.lib.endpoints.cohort.cohort_dataclass import CohortCreateInput\n",
- "from rhino_health.lib.endpoints.data_schema.data_schema_dataclass import DataschemaCreateInput\n",
- "from rhino_health.lib.endpoints.project.project_dataclass import ProjectCreateInput\n",
- "from rhino_health.lib.endpoints.sql_query.sql_query_dataclass import (SQLQueryImportInput,SQLQueryInput,SQLServerTypes,ConnectionDetails)\n",
- "from rhino_health.lib.endpoints.aimodel.aimodel_dataclass import (AIModel,AIModelCreateInput,AIModelRunInput,ModelTypes)"
+ "from rhino_health.lib.endpoints.dataset.dataset_dataclass import DatasetCreateInput\n",
+ "from rhino_health.lib.endpoints.sql_query.sql_query_dataclass import (\n",
+ " SQLQueryImportInput,\n",
+ " SQLQueryInput,\n",
+ " SQLServerTypes,\n",
+ " ConnectionDetails,\n",
+ ")"
]
},
{
@@ -46,18 +48,34 @@
"metadata": {},
"source": [
"### Authenticate to the Rhino FCP\n",
- "The `RhinoSession` class in the `rhino_health` library is a comprehensive interface for interacting with various endpoints in the Rhino Health ecosystem. It offers direct access to multiple specialized endpoints, including AI models, cohorts, data schemas, model results, projects, and workgroups, facilitating a wide range of operations in healthcare data management and analysis. The class also supports features like two-factor authentication and user switching, enhancing security and flexibility in handling different user sessions and workflows within the Rhino Health platform."
+ "The `RhinoSession` class in the `rhino_health` library is a comprehensive interface for interacting with various endpoints in the Rhino Health ecosystem. It offers direct access to multiple specialized endpoints, including Code Objects, Datasets, Data Schemas, Code Runs, Projects, and Workgroups, facilitating a wide range of operations in healthcare data management and analysis. The class also supports features like two-factor authentication and user switching, enhancing security and flexibility in handling different user sessions and workflows within the Rhino Health platform."
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"id": "3600a62c-8df7-4f1a-aa28-bf7528caa3a4",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ " ········\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Logged In\n"
+ ]
+ }
+ ],
"source": [
- "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
- "session = rh.login(username=my_username, password=getpass.getpass())"
+ "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
+ "session = rh.login(username=my_username, password=getpass.getpass(), rhino_api_url='https://dev.rhinohealth.com/api/') ## Change the URL to match your Rhino instance\n",
+ "print(\"Logged In\")"
]
},
{
@@ -68,40 +86,60 @@
"### Identify the desired project in the Rhino UI.\n",
"Before completing this step using the Python SDK, create a project on the Rhino web platform. Once the project has been created, copy the UID from the project you just created in the UI by navigating to the homepage, pressing on the three-vertical dot button in your project's square, and then selecting the button Copy UID.\n",
"\n",
- "\n",
- ""
+ ""
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"id": "ae904603-854b-457b-9203-294a1abbb1d9",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "33cf1db0-de14-472a-8dcb-8d83de22d946\n"
+ ]
+ }
+ ],
"source": [
- "project_uid = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX' # Replace with your Project's UID\n",
- "workgroup_uid = session.project.get_collaborating_workgroups(project_uid)[0].uid"
+ "project_uid = '99c775aa-b93d-4b5a-8ce3-b4bcf0250b37' # Replace with your Project's UID\n",
+ "\n",
+ "workgroup_uid = session.current_user.primary_workgroup.uid\n",
+ "print(workgroup_uid)"
]
},
{
"cell_type": "markdown",
"id": "7324ef97",
- "metadata": {},
+ "metadata": {
+ "jp-MarkdownHeadingCollapsed": true
+ },
"source": [
"### Connection Setup\n",
- "The `rhino_health.lib.endpoints.sql_query.sql_query_dataclass` module in the Rhino Health library provides classes to handle SQL queries against external databases and import data into the Rhino Federated Computing Platform. It includes `SQLQueryInput` for specifying parameters of a SQL query, `SQLQueryImportInput` for importing a cohort from an external SQL database query, and `SQLQuery`, a class representing an executed SQL query. Additional classes like `QueryResultStatus` and `SQLServerTypes` define the status of query results and supported SQL server types, respectively, while the `ConnectionDetails` class specifies connection details for an external SQL database.\n",
+ "The `rhino_health.lib.endpoints.sql_query.sql_query_dataclass` module in the Rhino Health library provides classes to handle SQL queries against external databases and import data into the Rhino Federated Computing Platform. It includes `SQLQueryInput` for specifying parameters of a SQL query, `SQLQueryImportInput` for importing a Dataset from an external SQL database query, and `SQLQuery`, a class representing an executed SQL query. Additional classes like `QueryResultStatus` and `SQLServerTypes` define the status of query results and supported SQL server types, respectively, while the `ConnectionDetails` class specifies connection details for an external SQL database.\n",
"\n",
- "More information about Rhino's SQL classes can be found by reviewing our SDK documentation [here](https://rhinohealth.github.io/rhino_sdk_docs/html/autoapi/rhino_health/lib/endpoints/sql_query/index.html).\n",
+ "More information about Rhino's SQL classes can be found in our SDK documentation.\n",
"\n",
+ "##### A note about `server_type`:\n",
"When specifying the connection details, ensure that you provide the server_type using the approved SQLServerTypes enum. This step ensures that your server is supported and compatible with the querying process."
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
"id": "0da2aa6e",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ " ········\n"
+ ]
+ }
+ ],
"source": [
"sql_db_user = \"rhino\" # Replace this with your DB username (make sure the user has read-only permissions to the DB).\n",
"external_server_url = \"ext-hospital-data.covi47dnmpiy.us-east-1.rds.amazonaws.com:5432\" # Replace this with url + port of the SQL DB you want to query (ie \"{url}:{port}\").\n",
@@ -122,10 +160,10 @@
"metadata": {},
"source": [
"### Writing SQL queries against the DB\n",
- "Using the `SQLQueryImportInput` function will allow us to query an external relational database and import the results of the query as a cohort. A Cohort is a central concept on the Rhino platform; to learn more, please navigate to this [link](https://docs.rhinohealth.com/hc/en-us/articles/12384748397213-What-is-a-Cohort-)\n",
+ "Using the `SQLQueryImportInput` class will allow us to query an external relational database and import the results of the query as a Dataset. A Dataset is a central concept on the Rhino platform; to learn more, please refer to our documentation.\n",
"\n",
"Executing the `SQLQueryImportInput` function requires a few arguments:\n",
- "- cohort_name (str): Name for the cohort you are creating.\n",
+ "- dataset_name (str): Name for the Dataset you are creating.\n",
"- is_data_deidentified (bool): Indicates if the data in the query is deidentified for privacy reasons.\n",
"- connection_details (ConnectionDetails): Details like URL, user, and password to connect to the SQL server.\n",
"- data_schema_uid (Optional[str]): The unique identifier for the data schema in the context of the query.\n",
@@ -140,10 +178,21 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 5,
"id": "66339119-56f3-41ba-b022-2e4380076f61",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for SQL query to complete (0 hours 0 minutes and a second)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 12 seconds)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 23 seconds)\n",
+ "Run finished successfully\n"
+ ]
+ }
+ ],
"source": [
"query_demo=\"\"\"\n",
"SELECT distinct\n",
@@ -151,15 +200,11 @@
" , adm.hadm_id\n",
" , pat.anchor_age + (EXTRACT(YEAR FROM adm.admittime) - pat.anchor_year) AS age\n",
" , pat.gender\n",
- " , adm.insurance\n",
- " , adm.admission_type\n",
- " ,adm.admission_location\n",
- " ,adm.discharge_location\n",
- " ,adm.language\n",
- " ,adm.marital_status\n",
" , adm.race\n",
" , icd.icd_code as diagnosis_code\n",
+ " ,icd.long_title as diagnosis_desc\n",
" ,proc.icd_code as procedure_code\n",
+ " ,proc.long_title as procedure_desc\n",
"FROM mimiciv_hosp.admissions adm\n",
"LEFT JOIN mimiciv_hosp.patients pat\n",
"ON pat.subject_id = adm.subject_id\n",
@@ -182,14 +227,14 @@
" project = project_uid, # The project/workgroup will be used to validate permissions (including any k_anonymization value)\n",
" workgroup = workgroup_uid,\n",
" connection_details = connection_details,\n",
- " cohort_name = 'mimic_ehr_demo_dev',\n",
- " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Cohort\n",
+ " dataset_name = 'mimic_ehr_demo_dev',\n",
+ " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Dataset\n",
" timeout_seconds = 1200,\n",
" is_data_deidentified = True,\n",
" sql_query = query_demo\n",
")\n",
"\n",
- "response = session.sql_query.import_cohort_from_sql_query(import_run_params)"
+ "response = session.sql_query.import_dataset_from_sql_query(import_run_params)"
]
},
{
@@ -203,10 +248,20 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 6,
"id": "21d3f4eb-3ef9-4302-8bf3-34c5f7ece97a",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for SQL query to complete (0 hours 0 minutes and a second)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 12 seconds)\n",
+ "Run finished successfully\n"
+ ]
+ }
+ ],
"source": [
"query_obs = \"\"\"\n",
"SELECT\n",
@@ -226,14 +281,14 @@
" project = project_uid, # The project/workgroup will be used to validate permissions (including any k_anonymization value)\n",
" workgroup = workgroup_uid,\n",
" connection_details = connection_details,\n",
- " cohort_name = 'mimic_ehr_obs_dev',\n",
- " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Cohort\n",
+ " dataset_name = 'mimic_ehr_obs_dev',\n",
+ " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Dataset\n",
" timeout_seconds = 1200,\n",
" is_data_deidentified = True,\n",
" sql_query = query_obs\n",
")\n",
"\n",
- "response = session.sql_query.import_cohort_from_sql_query(import_run_params)"
+ "response = session.sql_query.import_dataset_from_sql_query(import_run_params)"
]
},
{
@@ -241,37 +296,46 @@
"id": "98e754e6",
"metadata": {},
"source": [
- "### Importing chest x-rays from a PACS system into my Rhino client\n",
- "Next, we'll import chest x-rays onto my Rhino client so that we can conduct a computer vision experiment in the following steps:.\n",
+ "### Images: Importing chest x-rays from a PACS system into your Rhino client\n",
+ "Next, we'll import chest x-rays into our project so that we can conduct a computer vision experiment.\n",
"\n",
- "**To enable a friction-less guided sandbox experience, Rhino staff have uploaded DICOM data into the project for you.** If you are interested in learning more about how data can be imported from your local computing environment into the Rhino Federated Computing Platform, please refer to this section of our documentation [here](https://docs.rhinohealth.com/hc/en-us/articles/12385912890653-Adding-Data-to-your-Rhino-Federated-Computing-Platform-Client).\n",
+ "**To enable a frictionless guided sandbox experience, Rhino staff have uploaded DICOM data into the project for you.** If you are interested in learning more about how data can be imported from your local computing environment into the Rhino Federated Computing Platform, please refer to our documentation.\n",
"\n",
- "The data has been loaded in the `/rhino_data/image/dicom` path in the Rhino client. In addition, a file that provides metadata to associate the DICOM studies with the EHR data has been imported ('/rhino_data/image/metadata/aidev_cohort.csv')."
+ "The data has been loaded in the `/rhino_data/dicom` path in the Rhino client. In addition, a file that provides metadata to associate the DICOM studies with the EHR data has been imported (`/rhino_data/aidev_dataset.csv`)."
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 8,
"id": "5f652fdc",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Created new Dataset 'mimic_cxr_dev' with uid '0cd4dd2b-71a7-4854-bd33-4dabb64d1bc1'\n"
+ ]
+ }
+ ],
"source": [
- "dicom_path = \"/rhino_data/image/dicom\"\n",
- "metadata_file = \"/rhino_data/image/metadata/aidev_cohort.csv\"\n",
+ "dicom_path = \"/rhino_data/dicom\" # Replace with your DICOM file location\n",
+ "metadata_file = \"/rhino_data/aidev_dataset.csv\" # Replace with your CSV file location\n",
"\n",
- "cohort_creation_params = CohortCreateInput(\n",
+ "dataset_creation_params = DatasetCreateInput(\n",
" name=\"mimic_cxr_dev\",\n",
" description=\"mimic_cxr_dev\",\n",
" project_uid=project_uid, \n",
" workgroup_uid=workgroup_uid,\n",
+ " data_schema_uid = None,\n",
" image_filesystem_location=dicom_path,\n",
" csv_filesystem_location = metadata_file,\n",
" is_data_deidentified=True,\n",
" method=\"filesystem\",\n",
")\n",
"\n",
- "ai_developer_image_cohort = session.cohort.add_cohort(cohort_creation_params)\n",
- "print(f\"Created new cohort '{ai_developer_image_cohort.name}' with uid '{ai_developer_image_cohort.uid}'\")"
+ "ai_developer_image_dataset = session.dataset.add_dataset(dataset_creation_params)\n",
+ "print(f\"Created new Dataset '{ai_developer_image_dataset.name}' with uid '{ai_developer_image_dataset.uid}'\")"
]
},
{
@@ -280,8 +344,8 @@
"metadata": {},
"source": [
"### What you'll see in the Rhino UI:\n",
- "Once all three queries have been executed, you should see three cohorts in the user interface:\n",
- ""
+ "Once all three imports have completed, you should see three Datasets in the user interface:\n",
+ ""
]
},
{
@@ -290,10 +354,10 @@
"metadata": {},
"source": [
"### Where is my data in the Rhino client? \n",
- "Once data is uploaded, it'll reside in your designated Rhino client. While the Rhino Federated Computing Platform eliminates the need for the user to know the path of the data (enabling users just to refer to 'cohorts' it'll reside in the `/rhino_data/image/dicom` folder. \n",
- "\n",
+ "Once data is uploaded, it'll reside in your designated Rhino client. While the Rhino Federated Computing Platform eliminates the need for the user to know the path of the data (enabling users simply to refer to 'Datasets'), the data itself resides in the `/rhino_data/dicom` folder.\n",
+ "\n",
"\n",
- "To learn more about working with DICOM data on the Rhino Federated Computing Platform, please refer to our documentation [here](https://docs.rhinohealth.com/hc/en-us/articles/13136536913693-Example-1-Defining-a-Cohort-with-DICOM-Data)."
+ "To learn more about working with DICOM data on the Rhino Federated Computing Platform, please refer to our documentation."
]
}
],
@@ -312,7 +376,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
}
},
"nbformat": 4,
diff --git a/sandbox/pneumonia-prediction/notebooks/data_extraction/data_extraction_hco.ipynb b/sandbox/pneumonia-prediction/notebooks/data_extraction/data_extraction_hco.ipynb
new file mode 100644
index 0000000..61ea082
--- /dev/null
+++ b/sandbox/pneumonia-prediction/notebooks/data_extraction/data_extraction_hco.ipynb
@@ -0,0 +1,427 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "8cec8749",
+ "metadata": {},
+ "source": [
+ "# Notebook: Health System Data Extraction\n",
+ "### Importing tabular data onto Rhino with SQL queries\n",
+ "In this notebook, you'll use SQL to query an external database (such as a health system's clinical data warehouse) and import the results of those queries into the Rhino Federated Computing Platform.\n",
+ "\n",
+ "#### Import the Rhino Health Python library\n",
+ "The code below imports various classes and functions from the `rhino_health` library, which is designed to interact with the Rhino Federated Computing Platform. More information about the SDK can be found in our Official SDK Documentation and on our PyPI Repository Page."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "bb085bd3-3cb2-49d6-b628-7f357fe1a1c7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: rhino_health in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (1.0.3)\n",
+ "Collecting rhino_health\n",
+ " Downloading rhino_health-1.0.5-py3-none-any.whl.metadata (4.7 kB)\n",
+ "Requirement already satisfied: arrow<2,>=1.2.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (1.2.3)\n",
+ "Requirement already satisfied: backoff<2.3,>=2.1.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.2.1)\n",
+ "Requirement already satisfied: funcy<3,>=1.16 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (1.18)\n",
+ "Requirement already satisfied: pydantic<2.6,>=2.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.5.3)\n",
+ "Requirement already satisfied: ratelimit<2.3,>=2.2.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.2.1)\n",
+ "Requirement already satisfied: requests<2.32,>=2.28.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (2.31.0)\n",
+ "Requirement already satisfied: typing-extensions>=4.8.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from rhino_health) (4.10.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from arrow<2,>=1.2.1->rhino_health) (2.8.2)\n",
+ "Requirement already satisfied: annotated-types>=0.4.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pydantic<2.6,>=2.5->rhino_health) (0.6.0)\n",
+ "Requirement already satisfied: pydantic-core==2.14.6 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pydantic<2.6,>=2.5->rhino_health) (2.14.6)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (3.2.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (3.4)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (1.26.16)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from requests<2.32,>=2.28.0->rhino_health) (2023.7.22)\n",
+ "Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from python-dateutil>=2.7.0->arrow<2,>=1.2.1->rhino_health) (1.16.0)\n",
+ "Downloading rhino_health-1.0.5-py3-none-any.whl (93 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m93.4/93.4 kB\u001b[0m \u001b[31m1.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
+ "\u001b[?25hInstalling collected packages: rhino_health\n",
+ " Attempting uninstall: rhino_health\n",
+ " Found existing installation: rhino_health 1.0.3\n",
+ " Uninstalling rhino_health-1.0.3:\n",
+ " Successfully uninstalled rhino_health-1.0.3\n",
+ "Successfully installed rhino_health-1.0.5\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "pip install --upgrade rhino_health"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "f9e3e349",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "from pprint import pprint\n",
+ "import rhino_health as rh\n",
+ "from rhino_health.lib.endpoints.dataset.dataset_dataclass import DatasetCreateInput\n",
+ "from rhino_health.lib.endpoints.sql_query.sql_query_dataclass import (\n",
+ " SQLQueryImportInput,\n",
+ " SQLQueryInput,\n",
+ " SQLServerTypes,\n",
+ " ConnectionDetails,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3e04b454",
+ "metadata": {},
+ "source": [
+ "### Authenticate to the Rhino FCP\n",
+ "The `RhinoSession` class in the `rhino_health` library is a comprehensive interface for interacting with various endpoints in the Rhino Health ecosystem. It offers direct access to multiple specialized endpoints, including Code Objects, Datasets, Data Schemas, Code Runs, Projects, and Workgroups, facilitating a wide range of operations in healthcare data management and analysis. The class also supports features like two-factor authentication and user switching, enhancing security and flexibility in handling different user sessions and workflows within the Rhino Health platform."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "3600a62c-8df7-4f1a-aa28-bf7528caa3a4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ " ········\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Logged In\n"
+ ]
+ }
+ ],
+ "source": [
+ "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
+ "session = rh.login(username=my_username, password=getpass.getpass(), rhino_api_url='https://dev.rhinohealth.com/api/') ## Change the URL to match your Rhino instance\n",
+ "print(\"Logged In\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "101d31d9-b79f-4fa9-a9dd-1c8fe31a4cff",
+ "metadata": {},
+ "source": [
+ "### Identify the desired project in the Rhino UI.\n",
+ "Before completing this step using the Python SDK, create a project on the Rhino web platform. Once the project has been created, copy its UID by navigating to the homepage, clicking the three-vertical-dot button on your project's card, and selecting Copy UID.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "ae904603-854b-457b-9203-294a1abbb1d9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "48cb366f-b05f-4ca2-8e1d-6dfc336cd344\n"
+ ]
+ }
+ ],
+ "source": [
+ "project_uid = '99c775aa-b93d-4b5a-8ce3-b4bcf0250b37' # Replace with your Project's UID\n",
+ "\n",
+ "workgroup_uid = session.current_user.primary_workgroup.uid\n",
+ "print(workgroup_uid)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7324ef97",
+ "metadata": {},
+ "source": [
+ "### Connection Setup\n",
+ "The `rhino_health.lib.endpoints.sql_query.sql_query_dataclass` module in the Rhino Health library provides classes to handle SQL queries against external databases and import data into the Rhino Federated Computing Platform. It includes `SQLQueryInput` for specifying parameters of a SQL query, `SQLQueryImportInput` for importing a Dataset from an external SQL database query, and `SQLQuery`, a class representing an executed SQL query. Additional classes like `QueryResultStatus` and `SQLServerTypes` define the status of query results and supported SQL server types, respectively, while the `ConnectionDetails` class specifies connection details for an external SQL database.\n",
+ "\n",
+ "More information about Rhino's SQL classes can be found in our SDK documentation.\n",
+ "\n",
+ "##### A note about `server_type`:\n",
+ "When specifying the connection details, ensure that you provide the server_type using the approved SQLServerTypes enum. This step ensures that your server is supported and compatible with the querying process."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "0da2aa6e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ " ········\n"
+ ]
+ }
+ ],
+ "source": [
+ "sql_db_user = \"rhino\" # Replace this with your DB username (make sure the user has read-only permissions to the DB).\n",
+ "external_server_url = \"ext-hospital-data.covi47dnmpiy.us-east-1.rds.amazonaws.com:5432\" # Replace this with url + port of the SQL DB you want to query (ie \"{url}:{port}\").\n",
+ "db_name = \"hospital_data\" # Replace this with your DB name.\n",
+ "\n",
+ "connection_details = ConnectionDetails(\n",
+ " server_user=sql_db_user,\n",
+ " password=getpass.getpass(), \n",
+ " server_type=SQLServerTypes.POSTGRESQL, # Replace POSTGRESQL with the relevant type of your sql server (See docs for all supported types).\n",
+ " server_url=external_server_url,\n",
+ " db_name=db_name\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1136895c",
+ "metadata": {},
+ "source": [
+ "### Writing SQL queries against the DB\n",
+ "Using the `SQLQueryImportInput` class will allow us to query an external relational database and import the results of the query as a Dataset. A Dataset is a central concept on the Rhino platform; to learn more, please refer to our documentation.\n",
+ "\n",
+ "Executing the `SQLQueryImportInput` function requires a few arguments:\n",
+ "- dataset_name (str): Name for the Dataset you are creating.\n",
+ "- is_data_deidentified (bool): Indicates if the data in the query is deidentified for privacy reasons.\n",
+ "- connection_details (ConnectionDetails): Details like URL, user, and password to connect to the SQL server.\n",
+ "- data_schema_uid (Optional[str]): The unique identifier for the data schema in the context of the query.\n",
+ "- timeout_seconds (int): Time limit in seconds for the query execution.\n",
+ "- project_uid (str): Unique identifier for the project context of the query.\n",
+ "- workgroup_uid (str): Unique identifier for the workgroup context of the query.\n",
+ "- sql_query (str): The actual SQL query to be run.\n",
+ "\n",
+ "#### Table 1: Patient Admission Data\n",
+ "Our first query will retrieve patient demographics and associated clinical codes from inpatient admissions for patients with chest x-rays (see the WHERE clause, where we identify a selection of chest x-rays in the MIMIC v4 database)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "66339119-56f3-41ba-b022-2e4380076f61",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for SQL query to complete (0 hours 0 minutes and a second)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 12 seconds)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 23 seconds)\n",
+ "Run finished successfully\n"
+ ]
+ }
+ ],
+ "source": [
+ "query_demo=\"\"\"\n",
+ "SELECT distinct\n",
+ " pat.subject_id\n",
+ " , adm.hadm_id\n",
+ " , pat.anchor_age + (EXTRACT(YEAR FROM adm.admittime) - pat.anchor_year) AS age\n",
+ " , pat.gender\n",
+ " , adm.insurance\n",
+ " , adm.admission_type\n",
+ " ,adm.admission_location\n",
+ " ,adm.discharge_location\n",
+ " ,adm.language\n",
+ " ,adm.marital_status\n",
+ " , adm.race\n",
+ " , icd.icd_code as diagnosis_code\n",
+ " ,proc.icd_code as procedure_code\n",
+ "FROM mimiciv_hosp.admissions adm\n",
+ "LEFT JOIN mimiciv_hosp.patients pat\n",
+ "ON pat.subject_id = adm.subject_id\n",
+ "LEFT JOIN mimiciv_hosp.diagnoses_icd icd\n",
+ "ON adm.subject_id = icd.subject_id\n",
+ "AND adm.hadm_id = icd.hadm_id\n",
+ "LEFT JOIN mimiciv_hosp.procedures_icd proc\n",
+ "ON adm.subject_id = proc.subject_id\n",
+ "AND adm.hadm_id = proc.hadm_id\n",
+ "LEFT JOIN mimiciv_cxr.study_list study\n",
+ "ON adm.subject_id =study.subject_id\n",
+ "WHERE study.study_id in(55199984,58487107,50127595,53092856,56675999,50331333,58713162,54624197,55037150,56783987,57734186,51764355,55223142,58846671,\n",
+ "51274834,57047258,56875381,51658914,51800155,51750028,53717084,59019496,59536212,50393027,55239920,55263578,56074305,50077246,52592881,53301121,54924087,\n",
+ "55068499,54726934,51737379,58221226,55616331,53180166,59376223,53396044,58664976,53106744,56018087,59978743,54837632,58547191,51509541,58831216,59141448,\n",
+ "55658939,54588572,57483156,53317659,56888594,53438164,50703768,59292343,50896309,51977596,55388853,51014962,50785186,59355587,57119564,59771012,53184881,\n",
+ "56463743,51268111,52269885,53127212,53528101,55376584,56064916,56626758,57244947,58152399,58562238,56867934)\n",
+ "\"\"\"\n",
+ "import_run_params = SQLQueryImportInput(\n",
+ " session = session,\n",
+ " project = project_uid, # The project/workgroup will be used to validate permissions (including any k_anonymization value)\n",
+ " workgroup = workgroup_uid,\n",
+ " connection_details = connection_details,\n",
+ " dataset_name = 'mimic_ehr_demo_hco',\n",
+ " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Dataset\n",
+ " timeout_seconds = 1200,\n",
+ " is_data_deidentified = True,\n",
+ " sql_query = query_demo\n",
+ ")\n",
+ "\n",
+ "response = session.sql_query.import_dataset_from_sql_query(import_run_params)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d80ae68c",
+ "metadata": {},
+ "source": [
+ "#### Table 2: EHR Observations\n",
+ "Our second query will retrieve observations from our clinical information system, including patient BMI, height, weight, and diastolic and systolic blood pressure."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "21d3f4eb-3ef9-4302-8bf3-34c5f7ece97a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for SQL query to complete (0 hours 0 minutes and a second)\n",
+ "Waiting for SQL query to complete (0 hours 0 minutes and 12 seconds)\n",
+ "Run finished successfully\n"
+ ]
+ }
+ ],
+ "source": [
+ "query_obs = \"\"\"\n",
+ "SELECT\n",
+ " omr.subject_id,\n",
+ " omr.chartdate,\n",
+ " omr.result_name,\n",
+ " max(omr.result_value) as result\n",
+ "FROM mimiciv_hosp.omr omr\n",
+ "LEFT JOIN mimiciv_cxr.study_list study\n",
+ "ON omr.subject_id =study.subject_id\n",
+ "WHERE study.study_id in (55199984,58487107,50127595,53092856,56675999,50331333,58713162,54624197,55037150,56783987,57734186,51764355,55223142,58846671,\n",
+ "51274834,57047258,56875381,51658914,51800155,51750028,53717084,59019496,59536212,50393027,55239920,55263578,56074305,50077246,52592881,53301121,54924087,\n",
+ "55068499,54726934,51737379,58221226,55616331,53180166,59376223,53396044,58664976,53106744,56018087,59978743,54837632,58547191,51509541,58831216,59141448,\n",
+ "55658939,54588572,57483156,53317659,56888594,53438164,50703768,59292343,50896309,51977596,55388853,51014962,50785186,59355587,57119564,59771012,53184881,\n",
+ "56463743,51268111,52269885,53127212,53528101,55376584,56064916,56626758,57244947,58152399,58562238,56867934)\n",
+ "GROUP BY omr.subject_id, omr.chartdate, omr.result_name\n",
+ "\"\"\"\n",
+ "\n",
+ "import_run_params = SQLQueryImportInput(\n",
+ " session = session,\n",
+ " project = project_uid, # The project/workgroup will be used to validate permissions (including any k_anonymization value)\n",
+ " workgroup = workgroup_uid,\n",
+ " connection_details = connection_details,\n",
+ " dataset_name = 'mimic_ehr_obs_hco',\n",
+ " data_schema_uid = None, # Auto-Generating the Output Data Schema for the Dataset\n",
+ " timeout_seconds = 1200,\n",
+ " is_data_deidentified = True,\n",
+ " sql_query = query_obs\n",
+ ")\n",
+ "\n",
+ "response = session.sql_query.import_dataset_from_sql_query(import_run_params)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "98e754e6",
+ "metadata": {},
+ "source": [
+ "### Images: Importing chest x-rays from a PACS system into your Rhino client\n",
+ "Next, we'll import chest x-rays into our project so that we can conduct a computer vision experiment.\n",
+ "\n",
+ "**To enable a frictionless guided sandbox experience, Rhino staff have uploaded DICOM data into the project for you.** If you are interested in learning more about how data can be imported from your local computing environment into the Rhino Federated Computing Platform, please refer to our documentation.\n",
+ "\n",
+ "The data has been loaded in the `/rhino_data/dicom` path in the Rhino client. In addition, a file that provides metadata to associate the DICOM studies with the EHR data has been imported (`/rhino_data/hco_dataset.csv`)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "5f652fdc",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Created new Dataset 'mimic_cxr_hco' with uid '5a0230a5-43e0-47c0-9495-8423c3c757fd'\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Replace with file locations if needed\n",
+ "\n",
+ "dicom_path = \"/rhino_data/dicom\"\n",
+ "metadata_file = \"/rhino_data/hco_dataset.csv\"\n",
+ "\n",
+ "dataset_creation_params = DatasetCreateInput(\n",
+ " name=\"mimic_cxr_hco\",\n",
+ " description=\"mimic_cxr_hco\",\n",
+ " project_uid=project_uid, \n",
+ " workgroup_uid=workgroup_uid,\n",
+ " data_schema_uid = None,\n",
+ " image_filesystem_location=dicom_path,\n",
+ " csv_filesystem_location = metadata_file,\n",
+ " is_data_deidentified=True,\n",
+ " method=\"filesystem\",\n",
+ ")\n",
+ "\n",
+ "hco_image_dataset = session.dataset.add_dataset(dataset_creation_params)\n",
+ "print(f\"Created new Dataset '{hco_image_dataset.name}' with uid '{hco_image_dataset.uid}'\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1a47f554",
+ "metadata": {},
+ "source": [
+ "### What you'll see in the Rhino UI:\n",
+ "Once all three queries have been executed, you should see three Datasets in the user interface:\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ce69a362",
+ "metadata": {},
+ "source": [
+ "### Where is my data in the Rhino client? \n",
+    "Once data is uploaded, it'll reside in your designated Rhino client. While the Rhino Federated Computing Platform eliminates the need for the user to know the path of the data (enabling users to simply refer to Datasets), the files themselves reside in the `/rhino_data/image/dicom` folder.\n",
+ "\n",
+ "\n",
+    "To learn more about working with DICOM data on the Rhino Federated Computing Platform, please refer to our documentation."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
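The notebook above associates DICOM studies with EHR rows via the metadata CSV (`/rhino_data/hco_dataset.csv`). As a hedged, standalone sketch of that association step — the `StudyInstanceUID` column name and the helper itself are illustrative assumptions, not part of the Rhino SDK:

```python
def link_metadata_to_studies(metadata_rows, study_paths):
    """Attach an on-disk DICOM study path to each EHR metadata row.

    metadata_rows: list of dicts read from the metadata CSV; the
        'StudyInstanceUID' column name is assumed for illustration.
    study_paths: dict mapping study UID -> folder under /rhino_data/dicom.
    Rows without a matching study folder are skipped.
    """
    linked = []
    for row in metadata_rows:
        uid = row.get("StudyInstanceUID")
        if uid in study_paths:
            # merge the row with the resolved path so downstream code
            # can load pixels and labels together
            linked.append({**row, "dicom_path": study_paths[uid]})
    return linked
```

On the platform itself this linkage is handled for you by `DatasetCreateInput` (the `image_filesystem_location` and `csv_filesystem_location` arguments); the sketch only illustrates what that association means.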
diff --git a/sandbox/pneumonia-prediction/notebooks/federated learning/4_model_training.ipynb b/sandbox/pneumonia-prediction/notebooks/federated learning/4_model_training.ipynb
new file mode 100644
index 0000000..e5303be
--- /dev/null
+++ b/sandbox/pneumonia-prediction/notebooks/federated learning/4_model_training.ipynb
@@ -0,0 +1,227 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7721728c-f316-4e68-a719-7cd825426fac",
+ "metadata": {},
+ "source": [
+ "# Notebook #4: Model Training on Federated Data\n",
+ "\n",
+ "#### Import the Rhino Health Python library & Authenticate to the Rhino Cloud\n",
+    "We'll again import any necessary functions from the `rhino_health` library and authenticate to the Rhino Cloud. Please refer to Notebook #1 for an explanation of the `session` interface for interacting with various endpoints in the Rhino Health ecosystem. In addition, you can always find more information about the Rhino SDK on our Official SDK Documentation and on our PyPI Repository Page."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5f3890be-ff13-472e-901a-594fa99e9ddb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import getpass\n",
+ "import rhino_health as rh\n",
+ "from rhino_health.lib.endpoints.code_object.code_object_dataclass import (\n",
+ " CodeObjectCreateInput,\n",
+ " CodeObjectTypes,\n",
+ " CodeObjectRunInput,\n",
+ " CodeRunMultiDatasetInput,\n",
+ " ModelTrainInput \n",
+ ")\n",
+ "\n",
+ "my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
+ "session = rh.login(username=my_username, password=getpass.getpass())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f1b96218-22e0-40c0-98c9-b1a9b0e3a84a",
+ "metadata": {},
+ "source": [
+ "#### Retrieve Project and Dataset Information\n",
+ "As you've surely noticed by this point, we'll start by instantiating a `Project` object. We'll continue specifying the same project name that we've been using throughout this guided sandbox experience. In addition, we'll retrieve the identifiers for the JPEG datasets that were produced in notebook #2 so that we can use them to train our AI model. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7fe89328-be4c-4b6d-8a20-d166d98b828c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "project = session.project.get_project_by_name(\"YOUR_PROJECT_NAME\") # Replace with your project name\n",
+ "\n",
+ "datasets = project.datasets\n",
+    "hco_cxr_dataset = project.get_dataset_by_name(\"mimic_cxr_hco_conv\")\n",
+    "aidev_cxr_dataset = project.get_dataset_by_name(\"mimic_cxr_dev_conv\")\n",
+ "cxr_datasets = [aidev_cxr_dataset.uid, hco_cxr_dataset.uid]\n",
+ "print(f\"Loaded CXR Datasets '{hco_cxr_dataset.uid}', '{aidev_cxr_dataset.uid}'\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b01cdbbf-ef6a-4260-80ea-2cf93eedccc6",
+ "metadata": {},
+ "source": [
+ "#### Create a Code Object to generate distinct training and testing datasets\n",
+ "When training any machine learning algorithm in a supervised fashion, we need to 'hold out' a segment of the data so that we can then use that held-out segment to generate an unbiased estimate of model performance. We can accomplish this using another container image that executes Python code to generate both a training set and a testing set. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dd94c56e-a079-42a4-aff2-29617489539c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# get the container image that'll create the test-train split\n",
+ "train_split_image_uri = \"MY CONTAINER IMAGE URI\"\n",
+ "\n",
+ "# get the schema that was created after JPG conversion\n",
+ "cxr_schema = project.get_data_schema_by_name('mimic_cxr_hco_conv', project_uid=project.uid)\n",
+    "cxr_schema_uid = cxr_schema.uid\n",
+ "\n",
+ "# create a code object using the container image\n",
+ "test_train_split = CodeObjectCreateInput(\n",
+ " name=\"Train Test Split\",\n",
+ " description=\"Splitting data into train and test datasets per site\",\n",
+ " input_data_schema_uids=[cxr_schema_uid],\n",
+ " output_data_schema_uids=[None], # Auto-Generating the Output Data Schema for the Code Object\n",
+    "    code_type=CodeObjectTypes.GENERALIZED_COMPUTE,\n",
+ " project_uid = project.uid,\n",
+ " config={\"container_image_uri\": train_split_image_uri}\n",
+ ")\n",
+ "test_train_compute = session.code_object.create_code_object(test_train_split)\n",
+ "print(f\"Got Code Object named '{test_train_compute.name}' with uid {test_train_compute.uid}\")\n",
+ "\n",
+ "# run the code object to create new datasets at each site\n",
+ "run_params = CodeRunMultiDatasetInput(\n",
+ " code_object_uid= test_train_compute.uid,\n",
+ " input_dataset_uids=[aidev_cxr_dataset.uid, hco_cxr_dataset.uid],\n",
+ " output_dataset_naming_templates= ['{{ input_dataset_names.0 }} - Train', '{{ input_dataset_names.0 }} - Test'],\n",
+ " timeout_seconds=600,\n",
+ " sync=False,\n",
+ ")\n",
+ "\n",
+ "print(f\"Starting to run {test_train_compute.name}\")\n",
+    "code_run = session.code_object.run_code_object(run_params)\n",
+ "run_result = code_run.wait_for_completion()\n",
+ "print(f\"Finished running {test_train_compute.name}\")\n",
+ "print(f\"Result status is '{run_result.status.value}', errors={run_result.result_info.get('errors') if run_result.result_info else None}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6252401a-b01a-4a31-aa9c-23a835f25dc7",
+ "metadata": {},
+ "source": [
+ "#### Use NVIDIA's FLARE framework to federate model training \n",
+ "Rhino's platform includes a seamless integration of NVIDIA's Federated Learning framework (NVFlare), enabling you to train machine learning models collaboratively across distributed health data sources. This framework offers a few key advantages:\n",
+ "\n",
+ "1. **Secure Distributed Training**: NVFlare empowers users to conduct Federated Training across a network of healthcare institutions, each contributing their data insights without sharing raw data. This distributed approach ensures that sensitive patient information remains secure behind institutional firewalls.\n",
+    "2. **NVIDIA GPU Acceleration**: NVFlare takes advantage of NVIDIA GPUs to speed up model training and optimization, significantly reducing training time on large healthcare datasets.\n",
+    "3. **Versatility Across ML Frameworks**: NVFlare is compatible with major machine learning frameworks such as PyTorch and TensorFlow, so you can adapt your existing machine learning code for seamless integration into the Federated Learning ecosystem.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8cbc18df-73a6-4009-837b-217b5a7f4955",
+ "metadata": {},
+ "source": [
+ "#### Create a code object for model training\n",
+    "In the function call below, we pass only the input and output data schemas. This is because we are only *defining* the code object in the below cell; the actual datasets aren't passed until we execute the code object. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b1efeda7-cb4c-446f-ac95-19c5de6ed5fc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# path for container image\n",
+ "model_train_image_uri = \"YOUR CONTAINER URI\"\n",
+ "\n",
+ "# create code object to train the model using our container image\n",
+ "flare_model = CodeObjectCreateInput(\n",
+ " name=\"Pneumonia Prediction Model Training\",\n",
+ " description=\"Pneumonia Prediction Model Training\",\n",
+ " input_data_schema_uids=[cxr_schema_uid],\n",
+ " output_data_schema_uids=[None], # Auto-Generating the Output Data Schema for the Code Object\n",
+ " project_uid= project.uid,\n",
+    "    code_type=CodeObjectTypes.NVIDIA_FLARE_V2_2,\n",
+ " config={\"container_image_uri\": model_train_image_uri}\n",
+ ")\n",
+ "\n",
+ "flare_model = session.code_object.create_code_object(flare_model)\n",
+ "print(f\"Got FLARE model '{flare_model.name}' with uid {flare_model.uid}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d1b79e0-84bb-42b5-9729-1adc3d124ea8",
+ "metadata": {},
+ "source": [
+ "#### Run the model training code object\n",
+    "When it comes time to actually execute our model training process, we can pass the code object's unique identifier to the function that executes the container image. We'll pass both the training and testing data to the function. Note that the `config_*` and `secrets_*` arguments can be left blank because, in this example, we don't need to supply a custom federated server or client configuration for NVFlare. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c92c2b1d-e98d-4b48-8c2c-07852b5e1c6d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# retrieve training Dataset\n",
+ "input_training_datasets = session.dataset.search_for_datasets_by_name('Train')\n",
+ "print(['Training Datasets: ' + x.name for x in input_training_datasets])\n",
+ "\n",
+ "# retrieve testing Dataset\n",
+ "input_validation_datasets = session.dataset.search_for_datasets_by_name('Test')\n",
+ "print(['Testing Datasets: ' + x.name for x in input_validation_datasets])\n",
+ "\n",
+ "run_params = ModelTrainInput(\n",
+ " code_object_uid=flare_model.uid,\n",
+ " input_dataset_uids=[x.uid for x in input_training_datasets], \n",
+    "    simulate_federated_learning=True,\n",
+ " validation_dataset_uids=[x.uid for x in input_validation_datasets], \n",
+ " validation_datasets_inference_suffix=\" - Pneumonia training results\",\n",
+ " timeout_seconds=600,\n",
+ " config_fed_server=\"\",\n",
+ " config_fed_client=\"\",\n",
+ " secrets_fed_client=\"\",\n",
+ " secrets_fed_server=\"\",\n",
+ " sync=False,\n",
+ ")\n",
+ "\n",
+ "print(f\"Starting to run federated training of {flare_model.name}\")\n",
+ "model_train = session.code_object.train_model(run_params)\n",
+ "train_result = model_train.wait_for_completion()\n",
+ "print(f\"Finished running {flare_model.name}\")\n",
+ "print(f\"Result status is '{train_result.status.value}', errors={train_result.result_info.get('errors') if train_result.result_info else None}\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
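The "Train Test Split" code object in this notebook runs a container whose internals aren't shown. As a hedged sketch of the kind of per-site, deterministic split such a container might perform (this is an illustrative stand-in, not Rhino's actual container image):

```python
import random

def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Deterministically shuffle rows and split into (train, test) lists.

    A fixed seed makes the split reproducible across runs, which matters
    when the same container executes independently at each site.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

On the platform, the two output lists would become the `{{ input_dataset_names.0 }} - Train` and `... - Test` Datasets named by `output_dataset_naming_templates`.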
diff --git a/sandbox/pneumonia-prediction/5_model_evaluation.ipynb b/sandbox/pneumonia-prediction/notebooks/federated learning/5_model_evaluation.ipynb
similarity index 54%
rename from sandbox/pneumonia-prediction/5_model_evaluation.ipynb
rename to sandbox/pneumonia-prediction/notebooks/federated learning/5_model_evaluation.ipynb
index f15f17f..68aea5c 100644
--- a/sandbox/pneumonia-prediction/5_model_evaluation.ipynb
+++ b/sandbox/pneumonia-prediction/notebooks/federated learning/5_model_evaluation.ipynb
@@ -1,19 +1,14 @@
{
"cells": [
- {
- "cell_type": "markdown",
- "id": "041a98c7-316d-4a31-9641-14fff4aaf1de",
- "metadata": {},
- "source": [
- "# Notebook #5: Federated Evaluations"
- ]
- },
{
"cell_type": "markdown",
"id": "e1711135-c0c6-47cd-9511-76ee8bdf35f2",
"metadata": {},
"source": [
- "### Install the Rhino Health Python SDK, Load All Necessary Libraries and Login to the Rhino FCP"
+    "# Notebook #5: Federated Evaluations\n",
+ "\n",
+ "#### Import the Rhino Health Python library & Authenticate to the Rhino Cloud\n",
+    "We'll again import any necessary functions from the `rhino_health` library and authenticate to the Rhino Cloud. Please refer to Notebook #1 for an explanation of the `session` interface for interacting with various endpoints in the Rhino Health ecosystem. In addition, you can always find more information about the Rhino SDK on our Official SDK Documentation and on our PyPI Repository Page."
]
},
{
@@ -35,18 +30,9 @@
"import json\n",
"import io\n",
"import base64\n",
- "\n",
"import rhino_health as rh\n",
- "from rhino_health.lib.metrics import RocAuc, RocAucWithCI"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "0b015539-3224-40dc-9b22-2d7d8caa2fd2",
- "metadata": {},
- "outputs": [],
- "source": [
+ "from rhino_health.lib.metrics import RocAuc, RocAucWithCI\n",
+ "\n",
"my_username = \"FCP_LOGIN_EMAIL\" # Replace this with the email you use to log into Rhino Health\n",
"session = rh.login(username=my_username, password=getpass.getpass())"
]
@@ -56,7 +42,8 @@
"id": "f455e007-81e4-4421-ae3d-126d147fca75",
"metadata": {},
"source": [
- "### Load the Results Cohorts from the Pneumonia Training & Validation Process"
+ "#### Load the Evaluation Results Generated in Notebook #4\n",
+    "In the previous notebook, we passed a string to the `validation_datasets_inference_suffix` argument. This had the effect of assigning a name to the dataset that contains the results of our model. We'll retrieve that dataset now so that we can use the data to examine the results of our model validation. "
]
},
{
@@ -67,8 +54,17 @@
"outputs": [],
"source": [
"project = session.project.get_project_by_name(\"YOUR_PROJECT_NAME\") # Replace with your project name\n",
- "results_cohorts = session.cohort.search_for_cohorts_by_name('COHORT_SUFFIX') # Change it with your suffix\n",
- "[cohort.name for cohort in results_cohorts]"
+    "results_datasets = session.dataset.search_for_datasets_by_name('DATASET_SUFFIX') # Replace with your suffix\n",
+    "[dataset.name for dataset in results_datasets]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31e2242f-4cbd-4361-a676-d2278246c839",
+ "metadata": {},
+ "source": [
+    "A **Code Run** object encapsulates the key information about a specific model run within the Rhino Health FCP: the run configuration, runtime details, logs, and reporting capabilities. Use it to inspect results, troubleshoot failed runs, and share outcomes with collaborators.\n",
+ "\n"
]
},
{
@@ -76,7 +72,8 @@
"id": "c9d92da8-5bec-4b4d-97bf-75596e71940e",
"metadata": {},
"source": [
- "### Calculate ROC (Underlying Results Data Stays On-prem)"
+ "#### Generate a Receiver Operating Characteristic (ROC) curve\n",
+    "An **ROC curve** (receiver operating characteristic curve) is a graph showing the performance of a classification model across classification thresholds. It plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds; lowering the classification threshold classifies more items as positive, increasing both false positives and true positives. "
]
},
{
@@ -88,7 +85,7 @@
"source": [
"# function to plot ROC\n",
"\n",
- "def plot_roc(results,cohorts):\n",
+ "def plot_roc(results,datasets):\n",
" colors = plt.rcParams['axes.prop_cycle'].by_key()['color']\n",
" linestyle_cycle = ['-', '--']\n",
" fig, ax = plt.subplots(figsize=[6, 4], dpi=200)\n",
@@ -98,7 +95,7 @@
" roc_metrics = result.output\n",
" color = colors[0]\n",
" ax.plot(roc_metrics['fpr'], roc_metrics['tpr'], color=colors[i], \n",
- " linestyle=linestyle, label=cohorts[i])\n",
+ " linestyle=linestyle, label=datasets[i])\n",
" ax.legend(loc='lower right')\n",
"\n",
" ax.title.set_text('ROC per Site')\n",
@@ -119,17 +116,17 @@
"outputs": [],
"source": [
"results = []\n",
- "cohorts = []\n",
+ "datasets = []\n",
"report_data = []\n",
"report_data.append({\"type\": \"Title\", \"data\": \"ROC Analysis\"})\n",
"\n",
- "for result in results_cohorts:\n",
- " cohort = session.cohort.get_cohort(result.uid)\n",
- " cohorts.append(cohort.name.split('-')[0])\n",
+ "for result in results_datasets:\n",
+ " dataset = session.dataset.get_dataset(result.uid)\n",
+ " datasets.append(dataset.name.split('-')[0])\n",
" metric_configuration = RocAuc(y_true_variable=\"Pneumonia\",\n",
" y_pred_variable=\"Model_Score\")\n",
- " results.append(cohort.get_metric(metric_configuration))\n",
- "fig = plot_roc(results, cohorts)\n",
+ " results.append(dataset.get_metric(metric_configuration))\n",
+ "fig = plot_roc(results, datasets)\n",
"image_to_store = Image.frombytes('RGB', fig.canvas.get_width_height(), fig.canvas.tostring_rgb())\n",
"image_to_store.save(\"ROC_per_site.png\", format='png', optimize=True, quality=100)\n",
"\n",
@@ -150,19 +147,14 @@
{
"cell_type": "markdown",
"id": "b45b0e2e-7e55-4c53-b471-8b9c16a9d5d7",
- "metadata": {},
- "source": [
- "### Upload the visualizations to the Rhino Health Platform"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7a6889e5-2c44-454d-9b43-7facb8508189",
- "metadata": {},
- "outputs": [],
+ "metadata": {
+ "jp-MarkdownHeadingCollapsed": true
+ },
"source": [
- "model_result_uid = \"XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX\" # Paste the UID of the model results object for your NVF"
+ "### Upload the visualizations to the Rhino Health Platform\n",
+ "Users have the flexibility to generate reports related to the code run and make them accessible via the Code Run object. This feature aids in sharing insights and outcomes with collaborators or stakeholders.\n",
+ "\n",
+ "In the below code block we'll upload our ROC curve visualization to the cloud so that it can be viewed by our collaborators. "
]
},
{
@@ -172,11 +164,8 @@
"metadata": {},
"outputs": [],
"source": [
- "print(\"Sending visualizations to the Cloud\")\n",
- "\n",
- "result = session.post(f\"federatedmodelactions/{model_result_uid}/set_report/\", \n",
- " data={\"report_data\": json.dumps(report_data)})\n",
- "print('Done')"
+ "code_run_uid = \"XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX\" # Paste the UID of the Code Run object for your NVF\n",
+ "result = session.post(f\"code_runs/{code_run_uid}/set_report/\", data={\"report_data\": json.dumps(report_data)})"
]
}
],
@@ -195,7 +184,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
}
},
"nbformat": 4,
diff --git a/sandbox/pneumonia-prediction/3_statistical_analyses.ipynb b/sandbox/pneumonia-prediction/notebooks/statistics/3_statistical_analyses.ipynb
similarity index 82%
rename from sandbox/pneumonia-prediction/3_statistical_analyses.ipynb
rename to sandbox/pneumonia-prediction/notebooks/statistics/3_statistical_analyses.ipynb
index 0edf8d7..7c82acd 100644
--- a/sandbox/pneumonia-prediction/3_statistical_analyses.ipynb
+++ b/sandbox/pneumonia-prediction/notebooks/statistics/3_statistical_analyses.ipynb
@@ -54,8 +54,8 @@
"id": "f11ba5c0",
"metadata": {},
"source": [
- "### Load the Cohorts\n",
- "We'll use our SDK to identify the relevant cohorts that we'd like to perform exploratory analyses on. It is **critical to understand that the cohorts must have the same data schema in order to generate statistics on multiple cohorts simultaneously.**\n",
+ "### Load the Datasets\n",
+ "We'll use our SDK to identify the relevant Datasets that we'd like to perform exploratory analyses on. It is **critical to understand that the Datasets must have the same Data Schema in order to generate statistics on multiple Datasets simultaneously.**\n",
"\n",
""
]
@@ -67,12 +67,12 @@
"metadata": {},
"outputs": [],
"source": [
- "#Replace with your project and cohort names. Raw data and harmonized data\n",
+ "#Replace with your Project and Dataset names. Raw data and harmonized data\n",
"project = session.project.get_project_by_name(\"YOUR_PROJECT_NAME\")\n",
"\n",
- "cxr_cohorts = [\n",
- " project.get_cohort_by_name(\"mimic_cxr_dev\"), # Replace Me\n",
- " project.get_cohort_by_name(\"mimic_cxr_hco\"), # Replace Me\n",
+ "cxr_datasets = [\n",
+ " project.get_dataset_by_name(\"mimic_cxr_dev\"), # Replace Me\n",
+ " project.get_dataset_by_name(\"mimic_cxr_hco\"), # Replace Me\n",
"] "
]
},
@@ -81,23 +81,11 @@
"id": "35a679bf",
"metadata": {},
"source": [
- "### An Introduction to Federated Metrics\n",
+    "### An Introduction to Federated Metrics in the Rhino SDK\n",
"\n",
- "The Rhino Federated Computing Platform allows you to quickly and securely calculate metrics using federated computing across multiple sites. Each metric on the Rhino platform has two components:\n",
- "\n",
- "#### The Metric Object\n",
- "\n",
- "The metric configuration object is a crucial component, serving as a blueprint for metric retrieval. It allows you to specify the metric variables, grouping preferences, and data filters. For example, let's define two metrics:\n",
- "\n",
- "1. Count of total cases across both cohorts\n",
- "2. Count of positive pneumonia cases across both cohorts\n",
- "\n",
- "#### The Response Object\n",
- "\n",
- "When retrieving a metric, *all results are returned in a MetricResponse object*. The MetricResponse object is a Python class that includes the specific outcome values in the 'output' attribute, such as statistical measures, and details about the metric configuration ('metric_configuration_dict').\n",
- "\n",
- "The metric results will always be under the output attribute, under the metric name key (in this case, \"chi_square\"). The metric response values are then stored under the value name (e.g., \"p_value\" in the example above). The initial metric configuration used to generate this output can be found under the \"metric_configuration_dict\" attribute.\n",
+    "The Rhino Federated Computing Platform allows you to quickly and securely calculate metrics using federated computing across multiple sites. When creating a **Metric** object, you specify the statistic to compute, such as a count, t-test, or chi-square test. Additionally, you can **configure** your Metric object with options like grouping preferences and data filters.\n",
"\n",
+ "The Rhino SDK returns the results of an executed Federated Metric query in a **MetricResponse** object. The MetricResponse object is a Python class that includes the specific outcome values in the 'output' attribute, such as statistical measures, and details about the metric configuration ('metric_configuration_dict').\n",
"\n",
"### Exploratory Data Analysis\n",
"\n",
@@ -105,7 +93,7 @@
"\n",
"### Defining a simple metric without a filter:\n",
"\n",
- "We'll define the simplest metric possible - a simple count of the number of rows across both of our cohorts: "
+ "We'll define the simplest metric possible - a simple count of the number of rows across both of our Datasets: "
]
},
{
@@ -116,8 +104,8 @@
"outputs": [],
"source": [
"# Count the number of entries in the dataset\n",
- "pneumonia_count_response = session.project.aggregate_cohort_metric( \n",
- " cohort_uids=[str(cohort.uid) for cohort in cxr_cohorts], # list containing relevant cohorts\n",
+ "pneumonia_count_response = session.project.aggregate_dataset_metric( \n",
+ " dataset_uids=[str(dataset.uid) for dataset in cxr_datasets], # list containing relevant Datasets\n",
" metric_configuration=Count(variable=\"Pneumonia\") # Metric configuration\n",
") \n",
"\n",
@@ -146,8 +134,8 @@
" \"filter_column\": \"Pneumonia\", \n",
" \"filter_value\": 1})\n",
"\n",
- "pneumonia_count_response = session.project.aggregate_cohort_metric(\n",
- " cohort_uids=[str(cohort.uid) for cohort in cxr_cohorts],\n",
+ "pneumonia_count_response = session.project.aggregate_dataset_metric(\n",
+ " dataset_uids=[str(dataset.uid) for dataset in cxr_datasets],\n",
" metric_configuration=pneumonia_count_configuration)\n",
" \n",
"pneumonia_count = pneumonia_count_response.output\n",
@@ -161,7 +149,7 @@
"metadata": {},
"source": [
"#### Adding a grouping mechanism to our metric\n",
- "In addition to the `data_filter parameter`, we can also add a `group_by` parameter allows you to organize metrics based on specific categorical variables. In this example, we'll calculate the mean age across our two cohorts using the gender column in our data."
+    "In addition to the `data_filter` parameter, we can also add a `group_by` parameter, which allows you to organize metrics based on specific categorical variables. In this example, we'll calculate the median age across our two Datasets, grouped by the gender column in our data."
]
},
{
@@ -171,9 +159,9 @@
"metadata": {},
"outputs": [],
"source": [
- "# Get median age of the aggregated cohort\n",
- "median_age_response = session.project.aggregate_cohort_metric(\n",
- " cohort_uids=[str(cohort.uid) for cohort in cohorts],\n",
+ "# Get median age of the aggregated Datasets\n",
+ "median_age_response = session.project.aggregate_dataset_metric(\n",
+ " dataset_uids=[str(dataset.uid) for dataset in cxr_datasets],\n",
" metric_configuration=Median(variable=\"age\",\n",
" group_by=\"gender\"),\n",
")\n",
@@ -213,8 +201,7 @@
")\n",
"\n",
"\n",
- "table_result = project.aggregate_cohort_metric([str(cohort.uid) for cohort in cohorts], # cohort uids\n",
- " tbtt).output # metric configuration\n",
+ "table_result = project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], tbtt).output\n",
"pd.DataFrame(table_result.as_table())"
]
},
@@ -223,7 +210,7 @@
"id": "7bbc7a1a-59be-44d4-be99-3c0c1e2b91cc",
"metadata": {},
"source": [
- "Interestingly, we can see that our cohort is extremely skewed towards women with pneumonia. \n",
+ "Interestingly, we can see that our Dataset is extremely skewed towards women with pneumonia. \n",
"\n",
"#### Odds Ratio:\n",
"We can configure an odds ratio metric using the same configuration and execution pattern that we defined above for the median statistic. The Odds metric calculates the odds of an event occurring rather than not occuring, and can be generated like so: "
@@ -242,8 +229,7 @@
" detected_column_name=\"pneumonia\",\n",
")\n",
"\n",
- "session.project.aggregate_cohort_metric([str(cohort.uid) for cohort in cohorts], # cohort_uids\n",
- " odds_ratio_config).output # metric configuration"
+ "session.project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], odds_ratio_config).output"
]
},
{
@@ -270,7 +256,7 @@
" detected_column_name=\"Pneumonia\", \n",
" start_time=\"2023-02-02T07:07:48Z\", \n",
" end_time=\"2023-06-10T11:24:43Z\", ) \n",
- "prevalence_results = session.project.aggregate_cohort_metric(cohort_uids, prevalence_config)\n",
+    "prevalence_results = session.project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], prevalence_config)\n",
"\n",
"\n",
"incidence_config = Incidence( variable=\"id\", \n",
@@ -278,7 +264,7 @@
" detected_column_name=\"Pneumonia\", \n",
" start_time=\"2023-02-02T07:07:48Z\", \n",
" end_time=\"2023-06-10T11:24:43Z\", ) \n",
- "incidence_results = session.project.aggregate_cohort_metric(cohort_uids, incidence_config)"
+    "incidence_results = session.project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], incidence_config)"
]
},
{
@@ -290,7 +276,7 @@
"\n",
"#### Chi-Square Test\n",
"\n",
- "The Chi-Square test is employed to assess the independence between two categorical variables. In this example, we examine the association between the occurrence of pneumonia and gender across different cohorts. The result includes the Chi-Square statistic, p-value, and degrees of freedom."
+ "The Chi-Square test is employed to assess the independence between two categorical variables. In this example, we examine the association between the occurrence of pneumonia and gender across different Datasets. The result includes the Chi-Square statistic, p-value, and degrees of freedom."
]
},
{
@@ -304,7 +290,7 @@
"\n",
"chi_square_config = ChiSquare(variable=\"id\", variable_1=\"Pneumonia\", variable_2=\"Gender\")\n",
"\n",
- "result = project.aggregate_cohort_metric(cohort_uids, chi_square_config)"
+    "result = project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], chi_square_config)"
]
},
{
@@ -328,7 +314,7 @@
"\n",
"t_test_config = TTest(numeric_variable=\"Height\", categorical_variable=\"Gender\")\n",
"\n",
- "t_test_result = project.aggregate_cohort_metric(cohort_uids, t_test_config)"
+    "t_test_result = project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], t_test_config)"
]
},
{
@@ -352,7 +338,7 @@
"\n",
"anova_config = OneWayANOVA(variable=\"id\", numeric_variable=\"Height\", categorical_variable=\"Inflammation Level\")\n",
"\n",
- "anova_result = project.aggregate_cohort_metric(cohort_uids, anova_config)"
+    "anova_result = project.aggregate_dataset_metric([str(dataset.uid) for dataset in cxr_datasets], anova_config)"
]
},
{
@@ -382,8 +368,8 @@
"# Create a KaplanMeier instance\n",
"metric_configuration = KaplanMeier(time_variable=time_variable, event_variable=event_variable)\n",
"\n",
- "# Retrieve results for your project and cohorts\n",
- "results = project.aggregate_cohort_metric(cohort_uids=[str(cohort.uid) for cohort in km_cohorts], metric_configuration=metric_configuration)"
+ "# Retrieve results for your Project and Datasets\n",
+ "results = project.aggregate_dataset_metric(dataset_uids=[str(dataset.uid) for dataset in cxr_datasets], metric_configuration=metric_configuration)"
]
},
{
@@ -396,7 +382,7 @@
"\n",
"The Kaplan-Meier Metric in the Rhino Health Platform provides results that allow you to analyze time-to-event data, create survival models, and visualize Kaplan-Meier curves. \n",
"\n",
- "The results of the Kaplan-Meier Metric are stored in a KaplanMeierModelResults object with an \"output\" attribute that contains time and event vectors. Access these vectors as follows:"
+ "The results of the Kaplan-Meier Metric are stored in a KaplanMeierMetricResponse object with an \"output\" attribute that contains time and event vectors. Access these vectors as follows:"
]
},
{
@@ -532,7 +518,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
}
},
"nbformat": 4,
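The Kaplan-Meier section of the statistics notebook notes that results arrive as time and event vectors. As a hedged sketch of what those vectors represent — this is a plain illustrative estimator, not the Rhino SDK's implementation — the survival curve can be computed from them like so:

```python
def kaplan_meier_curve(times, events):
    """Kaplan-Meier survival estimate from paired time/event vectors.

    times: observation times; events: 1 = event occurred, 0 = censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # all observations recorded at time t (events and censorings)
        at_t = [e for tt, e in pairs if tt == t]
        deaths = sum(at_t)
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= len(at_t)
        i += len(at_t)
    return curve
```

The platform computes this federatedly so raw times never leave each site; the sketch only shows how the returned vectors relate to the plotted curve.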