TCIA API

andrewscouten edited this page Jan 15, 2026 · 1 revision

TCIA API (The Cancer Imaging Archive)

The TCIA module provides a Python API for downloading imaging manifest files, and optionally the imaging data they reference, from The Cancer Imaging Archive (TCIA) for TCGA cohorts.

Overview

Each cohort is configured in a YAML file, so datasets can be added or changed without editing Python code. The module downloads .tcia manifest files and can optionally run the nbia-data-retriever tool to download the imaging data those manifests point to.

Module Structure

src/oncolearn/api/tcia/
├── builder.py               # Builder pattern for creating cohorts from YAML
├── tcia_dataset.py          # TCIA dataset class for manifests and imaging data
└── download.py              # Download utilities

data/tcia/configs/           # YAML configuration files
├── acc.yaml
├── blca.yaml
├── brca.yaml
└── ... (all TCGA cohorts with imaging data)

YAML Configuration Format

Each cohort is defined in a YAML file with the following structure:

cohort:
  code: BRCA
  name: TCGA-BRCA
  description: TCGA Breast Invasive Carcinoma cohort with imaging data

datasets:
  - name: BRCA Imaging Manifest
    description: TCGA breast cancer imaging manifest
    category: manifest
    url: https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia
  
  # ... more datasets
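
To make the structure above concrete, here is a minimal sketch of turning an already-parsed config into typed objects. The `CohortSpec` and `DatasetSpec` classes and the `parse_cohort_config` function are hypothetical names invented for illustration and are not part of the module. The input dict is what a YAML loader would return for the BRCA example, and treating all four dataset fields as required is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    """One entry from the `datasets` list (hypothetical container)."""
    name: str
    description: str
    category: str
    url: str


@dataclass(frozen=True)
class CohortSpec:
    """The `cohort` block plus its datasets (hypothetical container)."""
    code: str
    name: str
    description: str
    datasets: tuple


def parse_cohort_config(config: dict) -> CohortSpec:
    """Turn an already-parsed YAML mapping into typed objects.

    Raises KeyError if a field shown in the example above is missing.
    """
    cohort = config["cohort"]
    datasets = tuple(
        DatasetSpec(
            name=entry["name"],
            description=entry["description"],
            category=entry["category"],
            url=entry["url"],
        )
        for entry in config.get("datasets", [])
    )
    return CohortSpec(
        code=cohort["code"],
        name=cohort["name"],
        description=cohort["description"],
        datasets=datasets,
    )


# Same content as the BRCA example, as a YAML loader would return it.
example = {
    "cohort": {
        "code": "BRCA",
        "name": "TCGA-BRCA",
        "description": "TCGA Breast Invasive Carcinoma cohort with imaging data",
    },
    "datasets": [
        {
            "name": "BRCA Imaging Manifest",
            "description": "TCGA breast cancer imaging manifest",
            "category": "manifest",
            "url": "https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia",
        }
    ],
}

spec = parse_cohort_config(example)
print(spec.code, [d.name for d in spec.datasets])
```

Failing with a `KeyError` on a missing field surfaces a malformed config file at load time, before any download starts.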

API Usage

Basic Usage

from oncolearn.api.tcia import TCIACohortBuilder

# Create a builder
builder = TCIACohortBuilder()

# Build and download a cohort's manifest files
brca_cohort = builder.build_cohort("BRCA")
brca_cohort.download()  # Downloads BRCA manifest files

# Download to a specific directory
brca_cohort.download(output_dir="my_data/tcia/brca")

Download Manifest Files and Imaging Data

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# Download manifest files and imaging data
brca_cohort.download(
    output_dir="data/tcia/BRCA",
    download_images=True  # Runs nbia-data-retriever
)
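
Because `download_images=True` runs an external tool, it can help to check that the tool is installed before starting a long download. The sketch below uses only the standard library. It assumes the executable is on `PATH` under the name `nbia-data-retriever`. How the module itself locates the tool is not documented here, and the helper name `retriever_available` is hypothetical.

```python
import shutil


def retriever_available(executable: str = "nbia-data-retriever") -> bool:
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


# Only request imaging data when the external tool can be found.
download_images = retriever_available()
print(f"download_images={download_images}")
```

The resulting flag could then be passed as the `download_images` argument, so a machine without the tool still downloads the manifest files.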

List Available Cohorts

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
cohorts = builder.list_available_cohorts()
print(cohorts)  # e.g. ['BRCA', 'LUAD', ...]

Access Individual Datasets

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# List all datasets
dataset_names = brca_cohort.list_datasets()
print(dataset_names)

# Download a specific dataset
manifest = brca_cohort.get_dataset("BRCA Imaging Manifest")
manifest.download("my_data/tcia/brca")

Filter Datasets by Category

from oncolearn.api.tcia import TCIACohortBuilder
from oncolearn.api.dataset import DataCategory

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# Get all manifest datasets
manifests = brca_cohort.get_datasets_by_category(DataCategory.MANIFEST)

# Get all imaging datasets
images = brca_cohort.get_datasets_by_category(DataCategory.IMAGE)

Data Categories

Available data categories for TCIA data:

  • manifest: TCIA manifest files (.tcia format)
    • These files contain metadata and references to imaging studies
    • Used with nbia-data-retriever to download actual DICOM images
  • image: Imaging data (DICOM format)
    • Medical imaging data (CT, MRI, PET, etc.)
    • Downloaded using nbia-data-retriever with manifest files
  • clinical: Clinical/phenotype data associated with imaging studies

Adding New Datasets

To add a new dataset to an existing cohort:

  1. Open the cohort's YAML file (e.g., data/tcia/configs/brca.yaml)
  2. Add a new entry to the datasets list:
  - name: BRCA New Manifest
    description: Description of the new manifest
    category: manifest
    url: https://download.url/manifest.tcia
    filename: manifest.tcia
    default_subdir: TCGA-BRCA
    file_type: manifest
  3. Save the file. No Python code changes are needed.
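
Before saving, a new entry can be sanity-checked with a few lines of standard-library Python. This is a sketch under stated assumptions: the set of required keys, the fallback from `filename` to the last URL path segment, and the helper names `check_dataset_entry` and `default_filename` are all illustrative and not confirmed behaviour of the module.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

KNOWN_CATEGORIES = {"manifest", "image", "clinical", "multimodal"}
REQUIRED_KEYS = ("name", "description", "category", "url")  # assumed required


def check_dataset_entry(entry: dict) -> list:
    """Return a list of readable problems; an empty list means no issue found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    category = entry.get("category")
    if category is not None and category not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {category}")
    url = entry.get("url", "")
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append(f"url is not http(s): {url}")
    return problems


def default_filename(entry: dict) -> str:
    """Use `filename` if given, otherwise the last path segment of the URL."""
    return entry.get("filename") or PurePosixPath(urlparse(entry["url"]).path).name


new_entry = {
    "name": "BRCA New Manifest",
    "description": "Description of the new manifest",
    "category": "manifest",
    "url": "https://download.url/manifest.tcia",
    "filename": "manifest.tcia",
}
print(check_dataset_entry(new_entry))
print(default_filename(new_entry))
```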

Adding New Cohorts

To add a completely new cohort:

  1. Create a new YAML file in data/tcia/configs/ (e.g., newcohort.yaml)
  2. Follow the YAML structure shown above
  3. The cohort will automatically be available via the builder
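
One way automatic discovery like this can work is to scan the config directory for YAML files. The sketch below assumes the cohort code is derived from the file name. The real builder may instead read `cohort.code` from each file, and `discover_cohorts` is a hypothetical helper, not the module's API.

```python
import tempfile
from pathlib import Path


def discover_cohorts(config_dir: Path) -> list:
    """Return upper-cased cohort codes, one per *.yaml file, sorted."""
    return sorted(path.stem.upper() for path in config_dir.glob("*.yaml"))


# Demonstrate with a throwaway directory instead of data/tcia/configs/.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    for stem in ("brca", "acc", "newcohort"):
        (config_dir / f"{stem}.yaml").write_text("cohort:\n  code: placeholder\n")
    found = discover_cohorts(config_dir)

print(found)
```

Under this scheme, dropping `newcohort.yaml` into the directory is enough for the new code to appear in the listing, which matches the workflow described above.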