Releases: ankilab/HANCOCK_MultimodalDataset
Primary release
Release Overview
The v1.0 release organizes the code into clear, logical folders:
Environment setup: A Conda environment.yml defines all dependencies for smooth installation and reproducibility.
Data loaders & explorers: Jupyter notebooks guide users through loading the HANCOCK dataset (demographics, pathology, blood, surgical reports, and whole-slide images, WSIs) and performing exploratory analyses using pandas and matplotlib.
Preprocessing & feature extraction: Scripts for cleaning clinical text, normalizing lab values, and extracting histopathological features from whole‐slide images via OpenSlide and custom pipelines.
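As a rough illustration of what normalizing lab values can involve, the sketch below z-scores each measurement within its analyte group using only the Python standard library. It is not taken from the release: the function name `zscore_by_analyte`, the record keys (`patient_id`, `analyte`, `value`), and the sample numbers are all hypothetical, and the actual scripts may use a different scheme (for example, scaling against clinical reference ranges).

```python
from statistics import mean, pstdev


def zscore_by_analyte(rows):
    """Return copies of `rows` with an added "z" key.

    `rows` is a list of dicts with the hypothetical keys
    "patient_id", "analyte", and "value". Each value is standardized
    against the mean and population standard deviation of its analyte.
    """
    # Group raw values by analyte so each test is scaled on its own.
    values_by_analyte = {}
    for row in rows:
        values_by_analyte.setdefault(row["analyte"], []).append(row["value"])

    stats = {
        name: (mean(values), pstdev(values))
        for name, values in values_by_analyte.items()
    }

    normalized = []
    for row in rows:
        mu, sigma = stats[row["analyte"]]
        # A constant (or single-sample) analyte has sigma == 0; map it to 0.0.
        z = 0.0 if sigma == 0 else (row["value"] - mu) / sigma
        normalized.append({**row, "z": z})
    return normalized


# Made-up example records, for illustration only.
rows = [
    {"patient_id": "001", "analyte": "hemoglobin", "value": 12.0},
    {"patient_id": "002", "analyte": "hemoglobin", "value": 14.0},
    {"patient_id": "003", "analyte": "hemoglobin", "value": 16.0},
    {"patient_id": "001", "analyte": "leukocytes", "value": 7.5},
]
result = zscore_by_analyte(rows)
```

Standardizing per analyte keeps measurements with very different units and ranges comparable before they are combined with other modalities.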
Core Modules and Workflows
To facilitate rigorous machine-learning experimentation, the release includes:
Train/Test split generation using a genetic‐algorithm approach to ensure balanced cohorts across modalities.
Multimodal fusion pipelines that integrate tabular, imaging, and textual features into unified PyTorch datasets and DataLoaders.
Model training & evaluation notebooks showcasing baseline classifiers (e.g., random forests, XGBoost) and deep‐learning architectures, complete with hyperparameter tuning and performance metrics (AUC, calibration curves).
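The release notes do not describe how the genetic-algorithm split works, so the following is only a minimal sketch of the general idea under stated assumptions. Each candidate split is a 0/1 vector (1 = test), and the fitness penalizes differences in the rate of each binary attribute between train and test plus any deviation from a target test fraction. The names `imbalance` and `evolve_split`, the fitness definition, the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), and all parameter values are hypothetical choices, not the authors' implementation.

```python
import random


def imbalance(assignment, attributes, test_fraction):
    """Score a split; lower is better.

    `assignment` is a list of 0 (train) / 1 (test) per patient.
    `attributes` maps a hypothetical attribute name to a list of 0/1 values.
    """
    test = [i for i, a in enumerate(assignment) if a == 1]
    train = [i for i, a in enumerate(assignment) if a == 0]
    if not test or not train:
        return float("inf")  # degenerate split, never preferred
    score = abs(len(test) / len(assignment) - test_fraction)
    for values in attributes.values():
        rate_test = sum(values[i] for i in test) / len(test)
        rate_train = sum(values[i] for i in train) / len(train)
        score += abs(rate_test - rate_train)
    return score


def evolve_split(attributes, n, test_fraction=0.2, pop_size=40,
                 generations=60, mutation_rate=0.02, seed=0):
    """Search for a balanced split with a small genetic algorithm."""
    rng = random.Random(seed)  # seeded for reproducible splits

    def fitness(individual):
        return imbalance(individual, attributes, test_fraction)

    population = [
        [1 if rng.random() < test_fraction else 0 for _ in range(n)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        # Elitism: carry the two best splits over unchanged.
        next_gen = [ind[:] for ind in population[:2]]
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random candidates.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


# Made-up binary attributes for 60 patients, for illustration only.
n_patients = 60
attributes = {
    "sex": [i % 2 for i in range(n_patients)],
    "advanced_stage": [1 if i % 3 == 0 else 0 for i in range(n_patients)],
}
best = evolve_split(attributes, n_patients)
best_score = imbalance(best, attributes, 0.2)
```

Because elitism preserves the best candidates in every generation, the returned split is never worse than the best split in the initial random population. A real multimodal setup would likely extend the fitness to cover categorical and continuous variables from each modality.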
Documentation & Citation
Usage instructions and example workflows are documented in README.md, and the code itself is commented. The README also links to the public dataset portal (www.hancock.research.fau.eu).