Guidance Request: Ideal Way to Adapt MedGemma for Multi-View Medical Images (5-View Case) #51

### Context
We are using **MedGemma 1.5** on a medical imaging task where **each sample consists of 5 correlated images** from the same patient (breast thermography: frontal, left/right oblique, left/right lateral).

Current MedGemma documentation and examples focus on **single-image inputs**, so we are seeking confirmation of the **ideal and recommended adaptation strategy** for this multi-view setting.

### Our Understanding of the Ideal Approach
For a fixed multi-view medical imaging problem (5 views per case, ~3,000 cases), the most appropriate approach appears to be:

**Late Fusion (Feature-Level Fusion)**  
- Encode each view independently using the MedGemma (or MedSigLIP) image encoder with shared weights  
- Fuse per-view embeddings using concatenation, attention, or a small transformer  
- Train a lightweight task-specific head on top of the fused representation  

This preserves per-view semantics, keeps encoder cost linear in the number of views, and matches common practice in multi-view medical imaging, where each view is typically encoded separately before fusion.
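
To make the fusion step concrete, here is a minimal NumPy sketch of what we have in mind. It is not MedGemma code: the per-view embeddings are random stand-ins for encoder outputs, and `EMBED_DIM`, `concat_fusion`, `attention_fusion`, and `binary_head` are hypothetical names and sizes chosen for illustration only.

```python
import numpy as np

NUM_VIEWS = 5    # frontal, left/right oblique, left/right lateral
EMBED_DIM = 64   # placeholder; the real encoder's embedding size will differ


def concat_fusion(view_embeddings):
    """(batch, views, dim) -> (batch, views * dim). View order stays explicit."""
    return view_embeddings.reshape(view_embeddings.shape[0], -1)


def attention_fusion(view_embeddings, query):
    """Attention pooling with one learned query vector of shape (dim,).

    Returns the fused (batch, dim) embedding and the (batch, views) weights.
    """
    dim = view_embeddings.shape[-1]
    scores = view_embeddings @ query / np.sqrt(dim)          # (batch, views)
    scores = scores - scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # softmax over views
    fused = np.einsum("bv,bvd->bd", weights, view_embeddings)
    return fused, weights


def binary_head(fused, weight, bias):
    """Logistic head: one probability per case."""
    return 1.0 / (1.0 + np.exp(-(fused @ weight + bias)))


rng = np.random.default_rng(0)
# Random stand-ins for the per-view encoder outputs of 4 cases.
view_embeddings = rng.normal(size=(4, NUM_VIEWS, EMBED_DIM))

concat = concat_fusion(view_embeddings)                       # shape (4, 320)
query = rng.normal(size=EMBED_DIM)                            # would be learned
fused, view_weights = attention_fusion(view_embeddings, query)
probs = binary_head(fused, rng.normal(size=EMBED_DIM) * 0.1, 0.0)
```

Concatenation keeps each view in a fixed slot, which suits a fixed 5-view protocol. Attention pooling produces a per-view weight that can be inspected, but it is order-agnostic unless a view-position embedding is added.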

### Alternatives (Less Ideal)
- **Image montage (early fusion):** simple, but all five views must share one fixed-size encoder input, so each view loses resolution and its identity as a separate view  
- **Multi-image prompt-only fusion:** possible for exploration, but it is unclear whether the model was trained to reason jointly over multiple images in a single request
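
The resolution cost of the montage option can be estimated with a small sketch. The canvas size of 448, the 2x3 grid, and the 480x640 source images below are assumptions for illustration, not confirmed MedGemma or dataset values; `resize_nearest` and `make_montage` are hypothetical helpers.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array by index selection."""
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[np.ix_(rows, cols)]


def make_montage(views, canvas_size, grid_rows=2, grid_cols=3):
    """Tile the views row by row into one square canvas; unused slots stay 0."""
    tile_h = canvas_size // grid_rows
    tile_w = canvas_size // grid_cols
    canvas = np.zeros((canvas_size, canvas_size), dtype=views[0].dtype)
    for i, view in enumerate(views):
        r, c = divmod(i, grid_cols)
        canvas[r * tile_h:(r + 1) * tile_h,
               c * tile_w:(c + 1) * tile_w] = resize_nearest(view, tile_h, tile_w)
    return canvas, (tile_h, tile_w)


CANVAS = 448  # assumed encoder input size; check the actual model card
# Constant-valued placeholder images (view i is filled with i + 1).
views = [np.full((480, 640), i + 1, dtype=np.float32) for i in range(5)]

montage, tile_shape = make_montage(views, CANVAS)
# Fraction of the encoder input that each individual view receives.
pixel_share = tile_shape[0] * tile_shape[1] / (CANVAS * CANVAS)
```

Under these assumptions each view gets roughly a sixth of the pixels it would have if encoded on its own, and one grid slot is left empty.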

### Questions
- Is **feature-level late fusion** the recommended pattern for multi-view medical imaging with MedGemma?
- Can the MedGemma image encoder be reliably used as a **frozen feature extractor** for this setup?
- Are there reference examples, benchmarks, or internal guidance for multi-image medical use cases?
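
For the frozen-extractor question, the setup we have in mind is sketched below: embeddings are computed once with the encoder frozen, cached, and only a small head is trained on them. The data here is synthetic (random features with an injected toy signal), the sizes are arbitrary, and `train_head` is a hypothetical plain logistic-regression trainer, so this shows the workflow rather than any expected performance.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CASES, NUM_VIEWS, EMBED_DIM = 200, 5, 32   # toy sizes, not our real data

# Stand-in for cached encoder outputs: computed once with the encoder frozen,
# then stored so the encoder never runs during head training.
cached = rng.normal(size=(NUM_CASES, NUM_VIEWS, EMBED_DIM))
labels = rng.integers(0, 2, size=NUM_CASES).astype(float)
# Inject a weak synthetic signal so the toy head has something to learn.
cached[:, :, 0] += labels[:, None] * 2.0 - 1.0

features = cached.reshape(NUM_CASES, -1)       # concatenation fusion


def binary_cross_entropy(probs, targets):
    eps = 1e-9
    return float(-np.mean(targets * np.log(probs + eps)
                          + (1 - targets) * np.log(1 - probs + eps)))


def train_head(x, y, steps=200, lr=0.1):
    """Full-batch gradient descent on a logistic-regression head."""
    w = np.zeros(x.shape[1])
    b = 0.0
    losses = []
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        losses.append(binary_cross_entropy(probs, y))
        grad = probs - y                        # dLoss/dLogit for each case
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, losses


w, b, losses = train_head(features, labels)
```

With real data, the train/validation split would be made per patient so that the five views of one case never straddle the split.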

### Use Case Summary
- Task: Breast cancer detection from thermography  
- Input: 5 fixed views per patient  
- Dataset size: ~3,000 cases  
- Output: Binary classification + localization

Any confirmation or guidance on this would help ensure correct and safe use of MedGemma in multi-view medical workflows.

Thank you.
