TCIA API
The TCIA module provides a Python API for programmatically downloading imaging manifest files and imaging data from The Cancer Imaging Archive (TCIA) for TCGA cohorts.
Cohorts are configured in YAML files. The module downloads .tcia manifest files and can optionally run the nbia-data-retriever tool to fetch the imaging data that those manifests reference.
src/oncolearn/api/tcia/
├── builder.py # Builder pattern for creating cohorts from YAML
├── tcia_dataset.py # TCIA dataset class for manifests and imaging data
└── download.py # Download utilities
data/tcia/configs/ # YAML configuration files
├── acc.yaml
├── blca.yaml
├── brca.yaml
└── ... (all TCGA cohorts with imaging data)
Each cohort is defined in a YAML file with the following structure:
cohort:
  code: BRCA
  name: TCGA-BRCA
  description: TCGA Breast Invasive Carcinoma cohort with imaging data
datasets:
  - name: BRCA Imaging Manifest
    description: TCGA breast cancer imaging manifest
    category: manifest
    url: https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia
  # ... more datasets

from oncolearn.api.tcia import TCIACohortBuilder
# Create a builder
builder = TCIACohortBuilder()
# Build and download a cohort's manifest files
brca_cohort = builder.build_cohort("BRCA")
brca_cohort.download() # Downloads BRCA manifest files
# Download to a specific directory
brca_cohort.download(output_dir="my_data/tcia/brca")

from oncolearn.api.tcia import TCIACohortBuilder
builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")
# Download manifest files and imaging data
brca_cohort.download(
    output_dir="data/tcia/BRCA",
    download_images=True,  # Runs nbia-data-retriever
)

from oncolearn.api.tcia import TCIACohortBuilder
builder = TCIACohortBuilder()
cohorts = builder.list_available_cohorts()
print(cohorts)  # ['BRCA', 'LUAD', ...]

from oncolearn.api.tcia import TCIACohortBuilder
builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")
# List all datasets
dataset_names = brca_cohort.list_datasets()
print(dataset_names)
# Download a specific dataset
manifest = brca_cohort.get_dataset("BRCA Imaging Manifest")
manifest.download("my_data/tcia/brca")

from oncolearn.api.tcia import TCIACohortBuilder
from oncolearn.api.dataset import DataCategory
builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")
# Get all manifest datasets
manifests = brca_cohort.get_datasets_by_category(DataCategory.MANIFEST)
# Get all imaging datasets
images = brca_cohort.get_datasets_by_category(DataCategory.IMAGE)

Available data categories for TCIA data:
- manifest: TCIA manifest files (.tcia format)
  - These files contain metadata and references to imaging studies
  - Used with nbia-data-retriever to download actual DICOM images
- image: Imaging data (DICOM format)
  - Medical imaging data (CT, MRI, PET, etc.)
  - Downloaded using nbia-data-retriever with manifest files
- clinical: Clinical/phenotype data associated with imaging studies
- multimodal: Combined data types
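
The categories above correspond to the `DataCategory` members used in the filtering example. The following is a minimal stand-in to show the idea, not the library's implementation: the `Dataset` dataclass and the `filter_by_category` helper are hypothetical, and only the `MANIFEST` and `IMAGE` member names appear in the usage above, so `CLINICAL` and `MULTIMODAL` are assumed by analogy.

```python
from dataclasses import dataclass
from enum import Enum


class DataCategory(Enum):
    """Hypothetical stand-in for oncolearn.api.dataset.DataCategory."""
    MANIFEST = "manifest"
    IMAGE = "image"
    CLINICAL = "clinical"      # assumed member name
    MULTIMODAL = "multimodal"  # assumed member name


@dataclass
class Dataset:
    """Hypothetical minimal dataset record: a name plus its category."""
    name: str
    category: DataCategory


def filter_by_category(datasets, category):
    """Return the datasets whose category matches, preserving order."""
    return [d for d in datasets if d.category is category]


datasets = [
    # DataCategory("manifest") looks a member up by its YAML string value.
    Dataset("BRCA Imaging Manifest", DataCategory("manifest")),
    Dataset("BRCA Images", DataCategory("image")),  # illustrative name
]
manifests = filter_by_category(datasets, DataCategory.MANIFEST)
print([d.name for d in manifests])
```

Looking members up by value, as in `DataCategory("manifest")`, is one way the `category` string from a YAML entry could be turned into an enum member; an unrecognized string raises `ValueError`.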
To add a new dataset to an existing cohort:
- Open the cohort's YAML file (e.g., data/tcia/configs/brca.yaml)
- Add a new entry to the datasets list:
  - name: BRCA New Manifest
    description: Description of the new manifest
    category: manifest
    url: https://download.url/manifest.tcia
    filename: manifest.tcia
    default_subdir: TCGA-BRCA
    file_type: manifest
- Save the file - no Python code changes needed!
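
Because a new entry takes effect without any code change, a quick sanity check before saving can catch typos. The sketch below works on an already-parsed entry (a plain dict) and uses only the standard library. The `validate_dataset_entry` helper is hypothetical, and which fields are treated as required is an assumption, not something the module is known to enforce.

```python
# Assumed minimum set of fields; the real loader may require more or fewer.
REQUIRED_FIELDS = ("name", "category", "url")
KNOWN_CATEGORIES = {"manifest", "image", "clinical", "multimodal"}


def validate_dataset_entry(entry):
    """Return a list of problems found in one dataset entry (empty if none)."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not entry.get(field)
    ]
    category = entry.get("category")
    if category and category not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {category}")
    url = entry.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        problems.append(f"url is not http(s): {url}")
    return problems


# The entry from the steps above, as it would look after YAML parsing.
new_entry = {
    "name": "BRCA New Manifest",
    "description": "Description of the new manifest",
    "category": "manifest",
    "url": "https://download.url/manifest.tcia",
    "filename": "manifest.tcia",
    "default_subdir": "TCGA-BRCA",
    "file_type": "manifest",
}
print(validate_dataset_entry(new_entry))
```

Returning a list of messages rather than raising on the first problem lets every issue in an entry be reported at once.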
To add a completely new cohort:
- Create a new YAML file in data/tcia/configs/ (e.g., newcohort.yaml)
- Follow the YAML structure shown above
- The cohort will automatically be available via the builder
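
One way such automatic discovery could work is to derive cohort codes from the YAML filenames in the config directory. The sketch below illustrates that idea under stated assumptions: `list_cohort_codes` is a hypothetical helper, and the actual lookup inside `TCIACohortBuilder` may differ (for instance, it may read the `code` field from each file instead).

```python
import tempfile
from pathlib import Path


def list_cohort_codes(config_dir):
    """Return sorted upper-case cohort codes, one per *.yaml file in config_dir."""
    return sorted(path.stem.upper() for path in Path(config_dir).glob("*.yaml"))


# Use a throwaway directory here rather than the real data/tcia/configs/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("acc.yaml", "brca.yaml", "newcohort.yaml", "notes.txt"):
        (Path(tmp) / name).touch()
    codes = list_cohort_codes(tmp)

print(codes)  # ['ACC', 'BRCA', 'NEWCOHORT']
```

Non-YAML files such as `notes.txt` are ignored by the glob pattern, so dropping a new `.yaml` file into the directory is enough for it to be listed.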
OncoLearn | A comprehensive toolkit for cancer genomics analysis and biomarker discovery.
Built with ❤️ for cancer research