TCIA API

TCIA API (The Cancer Imaging Archive)

The TCIA module provides a Python API for programmatically downloading imaging manifest files and imaging data for TCGA cohorts from The Cancer Imaging Archive (TCIA).

Overview

Cohorts are configured through YAML files, one per cohort. The module downloads .tcia manifest files and can optionally run the nbia-data-retriever tool to download the imaging data that those manifests reference.

Module Structure

src/oncolearn/api/tcia/
├── builder.py               # Builder pattern for creating cohorts from YAML
├── tcia_dataset.py          # TCIA dataset class for manifests and imaging data
└── download.py              # Download utilities

data/tcia/configs/           # YAML configuration files
├── acc.yaml
├── blca.yaml
├── brca.yaml
└── ... (all TCGA cohorts with imaging data)

YAML Configuration Format

Each cohort is defined in a YAML file with the following structure:

cohort:
  code: BRCA
  name: TCGA-BRCA
  description: TCGA Breast Invasive Carcinoma cohort with imaging data

datasets:
  - name: BRCA Imaging Manifest
    description: TCGA breast cancer imaging manifest
    category: manifest
    url: https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia
  
  # ... more datasets
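
Once such a file has been loaded (for example with a YAML parser's safe-load function), it is a nested mapping with a cohort section and a datasets list. The sketch below shows one way to check that a loaded config has the fields used in the example above. The helper validate_cohort_config and the choice of required keys are assumptions made for illustration; they are not part of the oncolearn API.

```python
from typing import Any

# Keys taken from the BRCA example; treating all of them as required is an assumption.
REQUIRED_COHORT_KEYS = ("code", "name", "description")
REQUIRED_DATASET_KEYS = ("name", "description", "category", "url")


def validate_cohort_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config looks well-formed."""
    problems: list[str] = []

    cohort = config.get("cohort")
    if not isinstance(cohort, dict):
        problems.append("missing 'cohort' mapping")
    else:
        for key in REQUIRED_COHORT_KEYS:
            if key not in cohort:
                problems.append(f"cohort is missing '{key}'")

    datasets = config.get("datasets")
    if not isinstance(datasets, list) or not datasets:
        problems.append("'datasets' must be a non-empty list")
    else:
        for index, dataset in enumerate(datasets):
            if not isinstance(dataset, dict):
                problems.append(f"datasets[{index}] is not a mapping")
                continue
            for key in REQUIRED_DATASET_KEYS:
                if key not in dataset:
                    problems.append(f"datasets[{index}] is missing '{key}'")

    return problems


# This mapping mirrors what a YAML loader would return for the BRCA example.
brca_config = {
    "cohort": {
        "code": "BRCA",
        "name": "TCGA-BRCA",
        "description": "TCGA Breast Invasive Carcinoma cohort with imaging data",
    },
    "datasets": [
        {
            "name": "BRCA Imaging Manifest",
            "description": "TCGA breast cancer imaging manifest",
            "category": "manifest",
            "url": "https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia",
        }
    ],
}

print(validate_cohort_config(brca_config))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to fix a hand-edited config in a single pass.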

API Usage

Basic Usage

from oncolearn.api.tcia import TCIACohortBuilder

# Create a builder
builder = TCIACohortBuilder()

# Build and download a cohort's manifest files
brca_cohort = builder.build_cohort("BRCA")
brca_cohort.download()  # Downloads BRCA manifest files

# Download to a specific directory
brca_cohort.download(output_dir="my_data/tcia/brca")

Download Manifest Files and Imaging Data

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# Download manifest files and imaging data
brca_cohort.download(
    output_dir="data/tcia/BRCA",
    download_images=True  # Runs nbia-data-retriever
)
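
How the module invokes nbia-data-retriever is not described here. As a rough sketch of what that step might look like, the snippet below assembles a command line without running it. The helper names are hypothetical, and the "--cli" and "-d" flags are an assumption based on the retriever's command-line mode; check them against the documentation for your installed version before relying on them.

```python
import shutil
from pathlib import Path


def build_retriever_command(
    manifest: Path,
    output_dir: Path,
    executable: str = "nbia-data-retriever",
) -> list[str]:
    """Assemble (but do not run) a command line for the NBIA Data Retriever."""
    # ASSUMPTION: "--cli <manifest>" selects command-line mode and
    # "-d <directory>" sets the download directory.
    return [executable, "--cli", str(manifest), "-d", str(output_dir)]


def retriever_is_installed(executable: str = "nbia-data-retriever") -> bool:
    """Return True if the retriever executable can be found on PATH."""
    return shutil.which(executable) is not None


command = build_retriever_command(
    Path("data/tcia/BRCA/TCIA_TCGA-BRCA_09-16-2015.tcia"),
    Path("data/tcia/BRCA"),
)
print(command)
print("retriever found on PATH:", retriever_is_installed())
```

If the executable is present, the list can be passed to subprocess.run(command, check=True). Building the command as a list avoids shell quoting problems with paths that contain spaces.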

List Available Cohorts

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
cohorts = builder.list_available_cohorts()
print(cohorts)  # ['BRCA', 'LUAD', ...]

Access Individual Datasets

from oncolearn.api.tcia import TCIACohortBuilder

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# List all datasets
dataset_names = brca_cohort.list_datasets()
print(dataset_names)

# Download a specific dataset
manifest = brca_cohort.get_dataset("BRCA Imaging Manifest")
manifest.download("my_data/tcia/brca")
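
When a dataset entry gives only a url, the downloaded file needs a local name. One plausible approach, sketched below, is to reuse the last path segment of the URL. The helper manifest_target_path is hypothetical and only illustrates the idea; the library may name files differently (for example through a filename field in the YAML entry).

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def manifest_target_path(url: str, output_dir: str) -> Path:
    """Derive a local destination path from a download URL."""
    # URL paths always use forward slashes, so parse them as POSIX paths.
    filename = PurePosixPath(urlparse(url).path).name
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return Path(output_dir) / filename


target = manifest_target_path(
    "https://www.cancerimagingarchive.net/wp-content/uploads/TCIA_TCGA-BRCA_09-16-2015.tcia",
    "my_data/tcia/brca",
)
print(target.name)  # TCIA_TCGA-BRCA_09-16-2015.tcia
```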

Filter Datasets by Category

from oncolearn.api.tcia import TCIACohortBuilder
from oncolearn.api.dataset import DataCategory

builder = TCIACohortBuilder()
brca_cohort = builder.build_cohort("BRCA")

# Get all manifest datasets
manifests = brca_cohort.get_datasets_by_category(DataCategory.MANIFEST)

# Get all imaging datasets
images = brca_cohort.get_datasets_by_category(DataCategory.IMAGE)
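
To make the filtering behaviour concrete, here is a self-contained sketch of category-based filtering. DataCategorySketch and DatasetEntry are hypothetical stand-ins for the library's own classes, and the dataset named "BRCA Images" is invented for the example; only the MANIFEST and IMAGE member names come from the usage above.

```python
from dataclasses import dataclass
from enum import Enum


class DataCategorySketch(Enum):
    # String values are assumed to match the `category` field in the YAML configs.
    MANIFEST = "manifest"
    IMAGE = "image"


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical stand-in for a dataset object held by a cohort."""
    name: str
    category: DataCategorySketch


def datasets_by_category(
    datasets: list[DatasetEntry], category: DataCategorySketch
) -> list[DatasetEntry]:
    """Keep only the datasets whose category matches, preserving order."""
    return [dataset for dataset in datasets if dataset.category is category]


datasets = [
    DatasetEntry("BRCA Imaging Manifest", DataCategorySketch("manifest")),
    DatasetEntry("BRCA Images", DataCategorySketch("image")),  # invented example entry
]

manifests = datasets_by_category(datasets, DataCategorySketch.MANIFEST)
print([dataset.name for dataset in manifests])  # ['BRCA Imaging Manifest']
```

Looking the enum member up by value, as in DataCategorySketch("manifest"), raises ValueError for an unknown category string, which surfaces typos in a config early.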

Data Categories

Available data categories for TCIA data:

- manifest: TCIA manifest files (.tcia format)
  - These files contain metadata and references to imaging studies
  - Used with nbia-data-retriever to download the actual DICOM images
- image: Imaging data (DICOM format)
  - Medical imaging data (CT, MRI, PET, etc.)
  - Downloaded using nbia-data-retriever with manifest files
- clinical: Clinical/phenotype data associated with imaging studies
- multimodal: Combined data types
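
For the manifest category, it can help to inspect a .tcia file before starting a long download. The sketch below assumes the commonly seen manifest layout: key=value header lines, then a ListOfSeriesToDownload= marker followed by one series identifier per line. Both the layout and the sample content (placeholder URL and identifiers) are assumptions for illustration, and parse_tcia_manifest is a hypothetical helper, not part of the module.

```python
def parse_tcia_manifest(text: str) -> tuple[dict[str, str], list[str]]:
    """Split manifest text into header fields and the list of series identifiers."""
    header: dict[str, str] = {}
    series_uids: list[str] = []
    in_series_list = False

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if in_series_list:
            series_uids.append(line)
        elif line.startswith("ListOfSeriesToDownload="):
            # ASSUMPTION: everything after this marker is a series identifier.
            in_series_list = True
        else:
            key, _, value = line.partition("=")
            header[key] = value

    return header, series_uids


# Placeholder content, not taken from a real manifest.
sample_manifest = """\
downloadServerUrl=https://example.org/nbia-download/servlet/DownloadServlet
includeAnnotation=true
manifestVersion=3.0
ListOfSeriesToDownload=
1.2.3.4.5
1.2.3.4.6
"""

header, series_uids = parse_tcia_manifest(sample_manifest)
print(header["manifestVersion"], len(series_uids))  # 3.0 2
```

Counting the series identifiers gives a quick estimate of how much work an imaging download will involve.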

Adding New Datasets

To add a new dataset to an existing cohort:

1. Open the cohort's YAML file (e.g., data/tcia/configs/brca.yaml).
2. Add a new entry to the datasets list:

  - name: BRCA New Manifest
    description: Description of the new manifest
    category: manifest
    url: https://download.url/manifest.tcia
    filename: manifest.tcia
    default_subdir: TCGA-BRCA
    file_type: manifest

3. Save the file. No Python code changes are needed.

Adding New Cohorts

To add a completely new cohort:

1. Create a new YAML file in data/tcia/configs/ (e.g., newcohort.yaml).
2. Follow the YAML structure shown above.
3. The cohort is then automatically available via the builder.
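
One way a builder can pick up new cohorts without code changes is to scan the config directory for YAML files and treat each file stem as a cohort code. The sketch below illustrates that idea with a temporary directory. Whether TCIACohortBuilder works exactly this way is an assumption, and discover_cohort_codes is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def discover_cohort_codes(config_dir: Path) -> list[str]:
    """Return upper-cased cohort codes, one per *.yaml file in config_dir."""
    return sorted(path.stem.upper() for path in config_dir.glob("*.yaml"))


# Demonstrate with a throwaway directory instead of the real data/tcia/configs/.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    for name in ("acc.yaml", "blca.yaml", "brca.yaml", "notes.txt"):
        (config_dir / name).write_text("", encoding="utf-8")
    codes = discover_cohort_codes(config_dir)

print(codes)  # ['ACC', 'BLCA', 'BRCA']
```

Under this scheme, dropping newcohort.yaml into the directory would be enough for the code NEWCOHORT to appear in the list.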
