A computational pathology toolkit for working with pathology foundation models. Supports:
- Slide preprocessing: Segment whole-slide images (WSIs) and extract tiles
- Tile embedding: Generate tile embeddings using built-in support for popular foundation models, or plug in your own model through the simple extension system.
- Text embedding: Generate language embeddings for textual prompts and visual "zeroshot embeddings" for tiles that are aligned with language embeddings for zero-shot workflows.
- Exploratory and quantitative analyses: Perform clustering and dimensionality reduction on tile embeddings from one or more slides. Run zero-shot analysis by computing similarity scores between tile vision embeddings and text embeddings of user-provided prompts using supported VLMs. Analyses expose built-in plotting utilities that generate useful figures using a single line of code.
- Command line use: Use the `pfmt` command to run the built-in slide processing and tile embedding pipelines. Natively supports and coordinates running multiple subprocesses in parallel and utilizing multiple GPUs concurrently with a centralized command line progress tracker. Commands persist per-WSI `h5` files that can be reused by the Python interface.
- Use as a Python package: Import the `pathfmtools` package in Python to perform slide preprocessing and tile embedding, run analyses, and generate visualizations.
Leveraging Foundation Models for Histological Grading in Cutaneous Squamous Cell Carcinoma using PathFMTools
Proceedings of the 5th Machine Learning for Health (ML4H) Symposium (2025)
Abdul Rahman Diab, Emily E. Karn, Renchin Wu, Emily S. Ruiz, William Lotter
arXiv link
The following models have built-in support in the repo.
| Model | Supports Text | Huggingface Repo |
|---|---|---|
| conch | ✅ | link |
| h-optimus-0 | ❌ | link |
| h-optimus-1 | ❌ | link |
| hibou-b | ❌ | link |
| hibou-l | ❌ | link |
| midnight-12k | ❌ | link |
| musk | ✅ | link |
| phikon | ❌ | link |
| phikon2 | ❌ | link |
| uni2 | ❌ | link |
| virchow | ❌ | link |
| virchow2 | ❌ | link |
To use any of these models, you must:
- Have access to it through your HuggingFace account
- Have a HuggingFace access token associated with your account
- Be locally authenticated with the HuggingFace CLI
This allows pathfmtools to pull the necessary model weights.
Authenticate with Hugging Face via CLI or environment variable:

```bash
pip install huggingface_hub      # if the CLI is not present
huggingface-cli login            # paste your token when prompted
# Alternatively (non-interactive): export HUGGINGFACE_HUB_TOKEN=...
```

You can easily register your own local models using the built-in extension system. Once registered, custom models can be used exactly like built-in models.
- Python 3.12+
- CUDA-capable GPU for embedding
- PyTorch with matching CUDA build installed by the provided scripts
- OpenSlide libraries available on the system
To set up the environment:
- Make sure you have uv installed.
- Run `bash create_env.sh`. This will install the environment and activate it.
- Run `source activate_env.sh` to re-activate the environment when needed.
- (Optional) Run `bash download_demo_slides.sh` to download the 4 TCGA slides that are used in the notebooks under `demo/`.
To generate embeddings, your system must have a CUDA-compatible GPU.
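To sanity-check these requirements from Python, the snippet below (a generic convenience check, not part of pathfmtools) verifies that PyTorch sees a CUDA device and that the OpenSlide bindings can load the system libraries:

```python
# Generic environment check: CUDA visibility for PyTorch and OpenSlide library availability.
import torch
import openslide  # raises if the system OpenSlide libraries are missing

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("OpenSlide version:", openslide.__library_version__)
```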
Minimal CLI workflow:
```bash
export DATA_DIR=/data/wsis
export STORE_DIR=/stores

# Inspect available backbones and segmenters
pfmt list-tile-models
pfmt list-segmenters

# (Optional) Pre-fetch weights
pfmt download-model-weights --model conch

# Segment, tile (if needed), and embed
pfmt embed-tiles \
  --slide-path "$DATA_DIR/*.tiff" \
  --output-dir "$STORE_DIR" \
  --gpu 0 \
  --model conch \
  --batch-size 256 \
  --n-tile-workers 8 \
  --n-data-workers 2
```

Every dispatch run also emits `slide_status.csv` in the chosen `--output-dir`, capturing a success/error flag and any exception message for each slide so you can spot issues without scraping logs.
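Failed slides can then be pulled out of that CSV with a few lines of Python. The exact column names are not documented here, so the names used below (slide, status, error) are assumptions to adapt to the actual header:

```python
# Hedged sketch: list slides whose dispatch run failed, based on slide_status.csv.
# Column names ("slide", "status", "error") are assumptions; check the CSV header first.
import csv
from pathlib import Path

status_path = Path("/stores") / "slide_status.csv"  # the chosen --output-dir
with status_path.open(newline="") as f:
    for row in csv.DictReader(f):
        if row.get("status") != "success":
            print(row.get("slide"), "->", row.get("error"))
```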
Minimal Python workflow:
```python
from pathlib import Path

from pathfmtools.image.slide import Slide
from pathfmtools.tile_models.model_pool import ModelPool

# store_root_dir is the path to the output directory where the h5 stores will be saved
slide = Slide(slide_path=Path("/data/slide1.tiff"), store_root_dir=Path("/stores"))
pool = ModelPool(model_names=["conch"], devices=["cuda:0"])  # Single GPU
embed_results = slide.embed_tiles(model_pool=pool, batch_size=256)
for result in embed_results:
    features = result.feature_embedding_matrix
```

- `demo/analysis/clustering.ipynb` demonstrates how embeddings from one or more slides can be clustered to capture patterns.
- `demo/analysis/dimensionality_reduction.ipynb` demonstrates how embeddings can be reduced and visualized in conjunction with their associated patches.
- `demo/analysis/zeroshot_classification.ipynb` demonstrates a minimal zero-shot classification workflow with VLMs using generated embeddings.
- `demo/analysis/abmil.ipynb` demonstrates a simple weakly-supervised ABMIL setup for performing slide-level classification using patch-level embeddings.
CLI commands and the Python APIs persist all intermediate artifacts into a slide-specific HDF5
store (<store_root_dir>/<slide_id>.h5), where <slide_id> is the stem of the WSI file path (e.g. the store for slide_name.svs is slide_name.h5). Each store keeps:
- segmentation masks
- per-tile RGB pixels keyed by tile size
- feature and zero-shot embeddings keyed by model + scaled tile size
- manifest metadata (microns-per-pixel, slide dimensions)
Tile coordinates are stored as top-left (x, y) pixel offsets. Downstream analysis relies on these
coordinates, so never reorder tiles manually.
Slide reading is backed by OpenSlide; common formats include SVS, TIFF, NDPI, SCN. Availability varies by platform.
Example HDF5 store layout:

```
<slide_id>.h5
├─ metadata                        # Serialized JSON
├─ segmentation/
│  ├─ mask
│  ├─ method
│  └─ prop_foreground
├─ tiles/<tile_size_px>/
│  ├─ pixels                       # (N, H, W, 3)
│  └─ coords                       # (N, 2)  x, y (top-left)
└─ embeddings/<model_name>/<model_tile_size_px>/
   ├─ feature                      # (N, D_f)
   ├─ feature_top_left_coords      # (N, 2)
   ├─ zeroshot                     # (N, D_z)
   └─ zeroshot_top_left_coords     # (N, 2)
```
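The store can also be inspected directly with h5py. Below is a minimal sketch that walks the layout above and reads the cached tile coordinates (the `tiles/448` group only exists if 448-px tiles were extracted):

```python
# Minimal sketch: walk a per-slide h5 store and print every dataset's shape and dtype,
# following the layout documented above.
import h5py

def show(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File("/stores/slide1.h5", "r") as store:
    store.visititems(show)
    # Top-left (x, y) tile offsets for 448-px tiles (present only if that size was extracted)
    coords = store["tiles/448/coords"][:]
```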
The CLI is designed for coarse-grained processing where you supply WSI paths or glob patterns. It is well suited to large runs, as it offers built-in support for multiprocessing and multi-GPU workflows. Run `pfmt --help` or `pfmt <command> --help` for argument details.
- `pfmt list-tile-models` — prints available embedding backbones and their capabilities, including whether required weights are already cached.
- `pfmt list-segmenters` — enumerates tissue/background segmentation backends.
Also see Extensions for registering custom models and segmenters.
`pfmt download-model-weights --model conch --model musk` — fetches weights eagerly, which is useful when machines lack network access during runs. Missing weights can also be downloaded on demand.
`pfmt segment-and-tile --slide-path "/data/*.tiff" --output-dir /stores --model conch`

- When `--model` is provided, the tool infers tile sizes that satisfy each model's expected microns-per-pixel. Alternatively, specify explicit `--tile-size` values.
- Use `--segmenter` to pick a custom method; defaults to `otsu`.
- Control concurrency with `--n-parallel` (slides) and `--n-tile-workers` (per slide).
`pfmt embed-tiles --slide-path /data/slide1.tiff --output-dir /stores --gpu 0 --model conch`

- Reuses cached tiles when present; otherwise triggers segmentation/tiling automatically.
- `--tile-size` overrides inferred tile sizes (rarely needed).
- Use `--skip-feature-embeddings` or `--skip-zeroshot-embeddings` to limit outputs.
- Enable `--allow-mpp-interpolation` if the slide resolution is coarser than the model expects.
- `--delete-tiles` / `--no-delete-tiles` toggles whether cached RGB tiles remain in the store after embedding (keeping them speeds up later pixel-dependent workflows at the cost of disk space).
- Adjust `--n-tile-workers` and `--n-data-workers` for IO vs. dataloader concurrency.
Example multi-GPU batch (see `demo/embed_tiles.sh`):

```bash
pfmt embed-tiles \
  --slide-path "tests/data/*.tiff" \
  --output-dir tests/data \
  --gpu 0 --gpu 1 \
  --batch-size 512 \
  --model conch --model musk \
  --n-tile-workers 8 \
  --n-data-workers 2
```

`pfmt delete-tiles --store-path /stores/slide1.h5 --yes` removes cached tile pixels while keeping metadata and embeddings intact. This substantially reduces the amount of disk space occupied by the generated h5 files, but can be slow, as the h5 files must be reconstructed to reclaim disk space.
`pfmt embed-text --model conch --prompt "tumor" --prompt "stroma" --gpu cuda:0`

- Prompts are cached in `pathfmtools/data/text_store.h5`.
The Python interfaces provide finer control for custom preprocessing loops and interactive analysis.
Slide is one of three objects that the user is responsible for initializing (the other two being ModelPool and TileAnalysis). It represents a WSI along with all of its processed data, including tiles, segmentation, and embeddings. It also exposes methods that run supported workflows, such as slide preprocessing (segmentation + tiling) and tile embedding. The Slide object also exposes basic visualization utilities.
```python
from pathlib import Path

from pathfmtools.image.slide import Slide

# Initialize a Slide from a WSI. This is the only initialization mode when processing a
# slide for the first time.
slide = Slide(
    slide_path=Path("/data/slide1.tiff"),
    store_root_dir=Path("/stores"),
    microns_per_pixel=None,  # Can be specified to override (potentially missing) slide metadata
)

# Alternatively, initialize a Slide from a pre-existing h5 store. Per-slide h5 stores are
# created by the pre-processing and embedding steps. h5 stores keep a reference to the original
# WSI file path, and contain cached tiles, segmentation results, and embedding results. This is
# the preferred method of initialization when available.
slide = Slide(
    store_path=Path("/stores/slide1.h5"),
)
```

Slide preprocessing (segmentation + tiling) can be performed by calling the Slide.preprocess method.
Note: When interested in generating embeddings, a direct call to Slide.preprocess is neither necessary nor recommended, as the Slide.embed_tiles method handles preprocessing implicitly and always extracts the tile size(s) that match what the specified model(s) expect — see the Embedding section.
```python
segmentation, tile_readers = slide.preprocess(
    tile_sizes=[448],  # 1+ tile sizes to extract. Only foreground (tissue) regions are extracted.
    segmenter="otsu",  # The name of the segmenter that will distinguish tissue from background.
    tile_workers=4,    # Number of multiprocessing workers for extracting tiles from the WSI
)
```

- `segmentation` is a `Segmentation` object with the attributes:
  - `seg_mask`: Boolean foreground/background (1/0) array. Dimensions are determined by the segmenter of choice, but must preserve the aspect ratio of the slide.
  - `seg_method`: The name of the segmentation method used to produce the segmentation.
  - `prop_foreground`: The proportion of the slide that was determined to be foreground/tissue.
- `tile_readers` are re-usable pixel iterators bound to the persisted HDF5 datasets. They support lazy access to tile pixels using numpy-like indexing syntax, and support iteration. Tile readers must be used as context managers:

  ```python
  tile_reader = slide.get_tile_reader(tile_size=448)
  with tile_reader as t:
      tile0, coords0 = t[0]
      for tile, coords in t:
          pass
  ```
The Segmentation object associated with a slide can be accessed using Slide.get_segmentation(). Similarly, the tile reader for a given tile size can be accessed using Slide.get_tile_reader(tile_size=...).
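For example, continuing from the slide initialized above (a minimal sketch; matplotlib is used here purely for display):

```python
# Minimal sketch: retrieve the persisted segmentation and cached tiles for a slide.
import matplotlib.pyplot as plt

segmentation = slide.get_segmentation()
print("Segmentation method:", segmentation.seg_method)
print("Foreground proportion:", segmentation.prop_foreground)

# The boolean foreground mask preserves the slide's aspect ratio.
plt.imshow(segmentation.seg_mask, cmap="gray")
plt.axis("off")
plt.show()

# Lazily read the first cached 448-px tile and its top-left (x, y) coordinates.
with slide.get_tile_reader(tile_size=448) as reader:
    tile0, coords0 = reader[0]
```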
The Slide object supports visualizations including thumbnail generation and reading of arbitrary regions given coordinates. See demo/slide.ipynb for examples.
ModelPool is an object that coordinates calls to embedding models. It allows the user to specify the model(s) and GPU(s) to use for embedding, and is a required input to the Slide.embed_tiles method.
The object supports 2 modes of GPU assignment:
- Dynamic: specify multiple models and multiple GPUs without direct linkage. The pool will attempt to keep the specified GPUs as busy as possible by dynamically assigning embedding tasks to free GPUs in parallel.
- Mapped: specify an explicit mapping from model to GPU. Models can only run on their associated GPUs. Parallel embedding is performed given at least two models with non-overlapping GPU assignments.
```python
import torch

from pathfmtools.tile_models.model_pool import ModelPool

# Dynamic GPU assignment. Each model may run on either GPU, provided that it is free.
pool = ModelPool(
    model_names=["conch", "musk", "virchow2"],
    devices=["cuda:0", "cuda:1"],
)

# Mapped GPU assignment. conch and musk can only run on GPU 0, virchow2 can only run on GPU 1.
# conch and musk will run sequentially on GPU 0, virchow2 will run in parallel on GPU 1.
pool = ModelPool(
    model_device_map={
        "conch": "cuda:0",
        "musk": "cuda:0",
        "virchow2": "cuda:1",
    }
)

embed_results = slide.embed_tiles(
    model_pool=pool,
    batch_size=256,                  # Batches fed into embedding models
    skip_feature_embeddings=False,   # If True, skip generating feature embeddings
    skip_zeroshot_embeddings=False,  # If True, skip generating zero-shot embeddings
    tile_workers=8,                  # Multiprocessing workers for tile extraction
    data_workers=2,                  # DataLoader workers feeding the model
)

# embed_results is a list of TileEmbeddingGroup objects. See demo/slide.ipynb for usage reference.
for result in embed_results:
    features = result.feature_embedding_matrix   # Shape (n_tiles, n_features)
    zeroshot = result.zeroshot_embedding_matrix  # May be None if model lacks text support
    pixels = result.pixel_reader                 # Context-managed lazy access to RGB tiles
```

The TileEmbeddingGroup objects associated with a slide can be accessed using Slide.read_tile_embeddings(model_name=..., tile_size=...). Note that tile_size is the size of the tiles that was provided to the model, which is not necessarily the same as the size of the tiles extracted from the slide (due to inter-slide variations in microns per pixel). If you directly call Slide.embed_tiles as shown in the example above, the "model tile size" will always be 224 for built-in models.
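A short sketch of re-loading previously computed embeddings through that accessor. Whether it returns a single TileEmbeddingGroup or a list is not spelled out here, so the normalization below is an assumption:

```python
# Hedged sketch: re-load embeddings persisted in the h5 store without re-running the models.
groups = slide.read_tile_embeddings(model_name="conch", tile_size=224)
if not isinstance(groups, list):  # assumption: normalize single group vs. list of groups
    groups = [groups]

for group in groups:
    print(group.feature_embedding_matrix.shape)  # (n_tiles, n_features)
```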
The TileAnalysis object provides a simple interface for running analyses using the generated embeddings.
```python
from pathfmtools.analysis.tile_analysis import TileAnalysis

analysis = TileAnalysis.from_slides(
    data=slide,          # One Slide instance or a list of Slide instances.
    model_name="conch",  # Name of the model that generated the embeddings which will be used.
    tile_size=224,       # Tile size provided to the model. Unless overridden during embedding generation, always 224.
)
```

TileAnalysis has built-in support for 3 types of analyses:
- Clustering: Cluster model embeddings and visualize tiles assigned to different clusters to detect patterns. See `demo/analysis/clustering.ipynb`.
- Zero-shot classification: Compute cosine similarity scores (logits) between tiles and user-provided text prompts. Visualize the highest- and lowest-probability tiles for each prompt. Visualize prompt-specific similarity score heatmaps and tile class assignments over a slide thumbnail. See `demo/analysis/zeroshot_classification.ipynb`.
- Dimensionality reduction: Perform dimensionality reduction (TSNE/UMAP/PCA) on generated embeddings and visualize the results as 2D/3D scatterplots. Supports coloring of reduced data points by clustering results or zero-shot class assignment. When running in Jupyter, the generated scatterplots support on-hover display of the corresponding tile pixels to facilitate pattern detection. See `demo/analysis/dimensionality_reduction.ipynb`.
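To make the zero-shot scoring concrete, the numpy sketch below shows the underlying computation: cosine similarity between tile zero-shot embeddings and prompt text embeddings, followed by a per-tile class assignment. TileAnalysis performs this for you; the random arrays stand in for result.zeroshot_embedding_matrix and for embeddings of prompts such as "tumor"/"stroma".

```python
# Illustrative numpy sketch of zero-shot scoring; random arrays stand in for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
tile_embeddings = rng.normal(size=(1000, 512))  # stand-in for result.zeroshot_embedding_matrix
text_embeddings = rng.normal(size=(2, 512))     # stand-in for "tumor"/"stroma" prompt embeddings

def cosine_similarity(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

scores = cosine_similarity(tile_embeddings, text_embeddings)  # (n_tiles, n_prompts)
predicted_prompt = scores.argmax(axis=1)                      # per-tile class assignment
```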
Additionally, pathfmtools exposes the pathfmtools.data.torch.TileEmbeddingDataset class, which is a Torch Dataset that interfaces with the created h5 stores and exposes generated embeddings, greatly simplifying deep learning workflows. See demo/analysis/abmil.ipynb for a plug-and-play example that trains a toy ABMIL model using only a few lines of code.
The extension system lets you add user-provided embedding models and slide segmenters without
modifying core code. Extensions are regular Python classes that live under the
pathfmtools.extensions package and are registered by alias into a persistent manifest.
- Location policy: classes must be importable from `pathfmtools.extensions.<module>:<Class>`.
- Manifest: registrations persist to `pathfmtools/extensions/manifest.yaml`.
- Autoloading: on startup, registered entries are validated and added to the core registries so you can reference them anywhere a `model` or `segmenter` name is accepted by the CLI or API.

Kinds and required interfaces:

- Segmenters: implement a static `segment_slide(slide_reader: SlideReader, ...) -> np.ndarray[bool]`. Example: `pathfmtools.extensions.demo_segmenter:CheckerboardSegmenter`.
- Models: subclass `pathfmtools.tile_models.tile_model.TileModel` and define the required class attributes (`NAME`, `EXPECTED_MICRONS_PER_PIXEL`, `EXPECTED_TILE_SIZE`, `FEATURE_EMBEDDING_DIM`, `SUPPORTS_ZEROSHOT`, `ZEROSHOT_EMBEDDING_DIM`, `POOLING_RULE`) and methods (`get_preproc_callable`, `get_feature_embeddings`). Example: `pathfmtools.extensions.demo_model:MeanIntensityModel`.
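As an illustration of the segmenter interface, here is a skeletal custom segmenter. How pixels are read from the SlideReader is not documented in this README, so get_thumbnail below is a hypothetical placeholder; the registration call itself is covered in the notebooks referenced below.

```python
# Skeletal custom segmenter following the interface above. `get_thumbnail` is a hypothetical
# placeholder for however SlideReader exposes downsampled pixels; see
# demo/register_segmenter.ipynb for a working example and the registration step.
# Per the location policy, this would live under pathfmtools/extensions/, e.g. my_segmenter.py.
import numpy as np

class SimpleIntensitySegmenter:
    """Mark pixels darker than a fixed threshold as foreground (tissue)."""

    @staticmethod
    def segment_slide(slide_reader, threshold: float = 220.0, **kwargs) -> np.ndarray:
        thumbnail = np.asarray(slide_reader.get_thumbnail())  # hypothetical accessor
        # Boolean mask whose dimensions preserve the slide's aspect ratio.
        return thumbnail.mean(axis=-1) < threshold
```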
See demo/register_model.ipynb and demo/register_segmenter.ipynb for examples of registering classes and using them.
- Extracted tile sizes vs model tile sizes
  - Slides can have variable microns-per-pixel (MPP). Extraction happens in slide pixel space at requested sizes (e.g., 448 px). Models consume a fixed model tile size (e.g., 224) after scaling to match the expected MPP. The "model tile size" recorded with embeddings reflects the input to the model, not necessarily the raw extracted size. If not explicitly overridden, all built-in models consume a tile size of 224. (See the worked example after this list.)
  - `Slide.list_available_feature_embeddings()` and `Slide.list_available_zeroshot_embeddings()` list the available embeddings and the tile sizes that the models consumed to generate them.
- Persisted artifacts
  - Per-slide HDF5 stores cache segmentation masks, RGB tiles per size, and embeddings per model + model tile size. Repeated runs reuse cached data to avoid recomputation.
- Torch datasets
  - Use `pathfmtools.data.torch.TileEmbeddingDataset` to iterate embeddings directly from HDF5. Ideal for training downstream models without manual I/O.
- Feature vs zero-shot embeddings
  - Feature embeddings are pure vision features. Zero-shot embeddings are aligned to the model's text space and enable prompt-based similarity and classification when the backbone supports text.
- Concurrency
  - Tune the CLI parameters `--n-parallel` (slides), `--n-tile-workers` (tiling), and `--n-data-workers` (dataloader) based on IO vs. GPU utilization.
- Disk
  - Keeping RGB tiles accelerates future pixel-dependent tasks but increases store size; remove them via `pfmt delete-tiles` when not needed.
- Memory
  - Batch size is bounded by GPU memory and model; start with smaller batch sizes and adjust. Monitor VRAM and dataloader RAM usage.
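Worked example of the tile-size scaling described in the first bullet above (plain arithmetic; the library's exact inference logic may differ in rounding details):

```python
# MPP-based tile-size scaling: how a 448-px extraction maps to a 224-px model input.
slide_mpp = 0.25        # microns per pixel of the WSI (varies per slide)
model_mpp = 0.5         # microns per pixel the model expects
model_tile_size = 224   # fixed model input size in pixels

# Tiles are extracted in slide pixel space at a size covering the same physical area,
# then scaled down to the model tile size before embedding.
extracted_tile_size = round(model_tile_size * model_mpp / slide_mpp)
print(extracted_tile_size, "px extracted ->", model_tile_size, "px fed to the model")  # 448 -> 224
```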