Guidance Request: Ideal Way to Adapt MedGemma for Multi-View Medical Images (5-View Case) #51

@blackpearl006

Context

We are using MedGemma 1.5 on a medical imaging task where each sample consists of 5 correlated images from the same patient (breast thermography: frontal, left/right oblique, left/right lateral).

Current MedGemma documentation and examples focus on single-image inputs, so we are seeking confirmation of the ideal and recommended adaptation strategy for this multi-view setting.

Our Understanding of the Ideal Approach

For a fixed multi-view medical imaging problem (5 views per case, ~3,000 cases), the most appropriate approach appears to be:

Late Fusion (Feature-Level Fusion)

  • Encode each view independently using the MedGemma (or MedSigLIP) image encoder with shared weights
  • Fuse per-view embeddings using concatenation, attention, or a small transformer
  • Train a lightweight task-specific head on top of the fused representation

This preserves per-view semantics, lets each view be encoded at the encoder's full input resolution, and aligns with standard practice in the multi-view medical imaging literature.
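To make the fusion step concrete, here is a minimal sketch of two of the options listed above (concatenation and attention pooling) followed by a tiny classification head. It is written in plain Python for readability and does not call MedGemma or MedSigLIP at all: the per-view embeddings are random stand-ins, the embedding size of 8 is arbitrary, and every function name (`concat_fusion`, `attention_fusion`, `linear_head`) is hypothetical. In a real pipeline these would be tensor operations with learned parameters.

```python
import math
import random

NUM_VIEWS = 5   # frontal, left/right oblique, left/right lateral
EMBED_DIM = 8   # arbitrary toy size; a real encoder embedding is much larger


def concat_fusion(view_embeddings):
    """Concatenate per-view embeddings into one long feature vector."""
    fused = []
    for emb in view_embeddings:
        fused.extend(emb)
    return fused


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(view_embeddings, query):
    """Attention pooling: weight each view by its scaled dot product with a query.

    `query` stands in for a learned vector; here it is just random.
    Returns the fused embedding and the per-view attention weights.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, emb)) / math.sqrt(dim)
        for emb in view_embeddings
    ]
    weights = softmax(scores)
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, view_embeddings))
        for i in range(dim)
    ]
    return fused, weights


def linear_head(features, head_weights, bias):
    """Binary classification head: linear layer followed by a sigmoid."""
    logit = sum(w * f for w, f in zip(head_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Random stand-ins for the five per-view embeddings of one case.
rng = random.Random(0)
views = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_VIEWS)]
query = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
head_w = [rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]

concat_features = concat_fusion(views)           # length NUM_VIEWS * EMBED_DIM
attn_features, attn_weights = attention_fusion(views, query)
probability = linear_head(attn_features, head_w, bias=0.0)

print(len(concat_features), len(attn_features), probability)
```

Concatenation keeps the identity of each view by position but grows the head's input with the number of views; attention pooling keeps the fused size fixed and exposes per-view weights that can be inspected, at the cost of discarding view order unless a view-position embedding is added.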

Alternatives (Less Ideal)

  • Image montage (early fusion): simple but loses per-view structure and resolution
  • Multi-image prompt-only fusion: possible for exploration, but unclear whether the model is designed to jointly reason over multiple images in a single request
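The resolution cost of the montage option can be estimated with simple arithmetic. The sketch below assumes the encoder resizes every input to a fixed square and that the views are tiled as squares on a near-square grid. The value 896 is only an example input side and should be checked against the model card for the encoder actually used; the function names are hypothetical.

```python
import math


def montage_grid(n_views):
    """Smallest near-square grid (cols, rows) that holds n_views tiles."""
    cols = math.ceil(math.sqrt(n_views))
    rows = math.ceil(n_views / cols)
    return cols, rows


def montage_tile_side(input_side, n_views):
    """Largest square tile side (in pixels) that fits the grid inside the input."""
    cols, rows = montage_grid(n_views)
    return min(input_side // cols, input_side // rows)


def pixel_fraction_per_view(input_side, n_views):
    """Share of the encoder's input pixels that a single view receives."""
    tile = montage_tile_side(input_side, n_views)
    return (tile * tile) / (input_side * input_side)


ASSUMED_INPUT_SIDE = 896  # example value only, not confirmed by this issue
grid = montage_grid(5)
tile_side = montage_tile_side(ASSUMED_INPUT_SIDE, 5)
fraction = pixel_fraction_per_view(ASSUMED_INPUT_SIDE, 5)

print(f"grid={grid}, tile side={tile_side}px, per-view pixel share={fraction:.1%}")
```

Under these assumptions each view receives only a small share of the pixels it would get when encoded on its own, which is the main reason the montage is listed as less ideal here.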

Questions

  • Is feature-level late fusion the recommended pattern for multi-view medical imaging with MedGemma?
  • Can the MedGemma image encoder be reliably used as a frozen feature extractor for this setup?
  • Are there reference examples, benchmarks, or internal guidance for multi-image medical use cases?

Use Case Summary

  • Task: Breast cancer detection from thermography
  • Input: 5 fixed views per patient
  • Dataset size: ~3,000 cases
  • Output: Binary classification + localization

Any confirmation or guidance on this would help ensure correct and safe use of MedGemma in multi-view medical workflows.

Thank you.
