
CardiacCLIP

This repository contains the PyTorch implementation of "CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner" (MICCAI 2025).

Created by Du Yao, Guo Jiarong, Li Xiaomeng*

Overview of CardiacCLIP

CardiacCLIP is a novel adaptation of CLIP models for few-shot echocardiogram video analysis, capturing crucial temporal dynamics and localized cardiac structures essential for accurate diagnosis.


🔑 Key Idea

  • Multi-Frame Learning (MFL)
    An attention-based aggregation mechanism that prioritizes diagnostically relevant frames instead of simple averaging.

  • EchoZoom
    A multi-scale input representation strategy that enhances the modeling of fine-grained cardiac structures. A minimal sketch of both ideas follows this list.
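For intuition, here is a minimal PyTorch sketch of both ideas. All names (FrameAttentionPool, echo_zoom) and dimensions are hypothetical illustrations; the actual implementations live in the echoclip package and follow the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameAttentionPool(nn.Module):
    # Sketch of MFL: attention-weighted pooling over per-frame CLIP features,
    # so diagnostically relevant frames dominate instead of a plain average.
    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_feats):                            # frame_feats: (B, T, D)
        weights = F.softmax(self.score(frame_feats), dim=1)    # (B, T, 1) frame attention
        return (weights * frame_feats).sum(dim=1)              # (B, D) video embedding

def echo_zoom(frames, scales=(1.0, 1.5)):
    # Sketch of EchoZoom: build multi-scale views by center-crop zooming each frame,
    # giving the image encoder a closer look at fine-grained cardiac structures.
    B, T, C, H, W = frames.shape
    views = []
    for s in scales:
        ch, cw = int(H / s), int(W / s)
        top, left = (H - ch) // 2, (W - cw) // 2
        crop = frames[:, :, :, top:top + ch, left:left + cw].flatten(0, 1)   # (B*T, C, ch, cw)
        crop = F.interpolate(crop, size=(H, W), mode="bilinear", align_corners=False)
        views.append(crop.view(B, T, C, H, W))
    return views   # each view is encoded separately by the CLIP image encoder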

The CardiacCLIP codebase is largely built upon NumCLIP and shares a similar overall architecture. To ease the difficulty of directly regressing LVEF, we adopt a coarse-to-fine pipeline in which a coarse classification stage is followed by a regression refinement step, as sketched below. For more details on this design, please refer to our ECCV 2024 paper on NumCLIP.
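As a rough illustration of the coarse-to-fine idea (not the exact NumCLIP formulation; the bin layout, names, and expected-value decoding below are hypothetical):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineLVEFHead(nn.Module):
    # Hypothetical head: first classify a coarse LVEF bin, then refine with a
    # per-bin offset; the final prediction is the probability-weighted value.
    def __init__(self, dim=512, num_bins=10, ef_range=(0.0, 100.0)):
        super().__init__()
        self.lo, self.hi = ef_range
        self.bin_width = (self.hi - self.lo) / num_bins
        self.cls = nn.Linear(dim, num_bins)   # coarse stage: which LVEF bin
        self.reg = nn.Linear(dim, num_bins)   # fine stage: offset inside each bin

    def forward(self, video_feat):                          # video_feat: (B, D)
        logits = self.cls(video_feat)                       # (B, num_bins)
        probs = F.softmax(logits, dim=-1)
        offsets = torch.sigmoid(self.reg(video_feat))       # (B, num_bins), in [0, 1]
        bin_idx = torch.arange(probs.size(-1), device=probs.device)
        bin_values = self.lo + (bin_idx + offsets) * self.bin_width   # (B, num_bins)
        lvef = (probs * bin_values).sum(dim=-1)              # (B,) refined LVEF estimate
        return logits, lvef   # logits supervise the coarse stage, lvef the fine stage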

Training & Evaluation

  1. Download the EchoNet-Dynamic dataset, then point the dataset path in /echoclip/runner/data.py (around line 330) to it. A quick layout check is sketched after these steps.

  2. Run the training script:

sh scripts/run.sh

Results and logs will be saved in the results/ and wandb/ folders.
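If you want to sanity-check the download before editing data.py, a minimal sketch follows; it assumes the standard EchoNet-Dynamic layout (FileList.csv plus a Videos/ folder) and a hypothetical path.

import os

def check_echonet_root(root):
    # Hypothetical helper: confirm the EchoNet-Dynamic root before pointing data.py at it
    file_list = os.path.join(root, "FileList.csv")
    videos_dir = os.path.join(root, "Videos")
    assert os.path.isfile(file_list), f"missing {file_list}"
    assert os.path.isdir(videos_dir), f"missing {videos_dir}"
    print(f"Found {len(os.listdir(videos_dir))} video files under {videos_dir}")

check_echonet_root("/path/to/EchoNet-Dynamic")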

One More Thing

Our project also serves as a unified codebase for fine-tuning echocardiogram foundation models such as EchoCLIP, EchoPrime, and PanEcho, supporting both full-model tuning and parameter-efficient fine-tuning approaches (e.g., CoOp) in a fast and modular manner.

You can simply define and load each model by initializing the corresponding encoder and loading its pretrained weights, as shown below.

# Prerequisites (assumed): import torch; from torchvision import models;
# EchoPrimeTextEncoder is available in the codebase (from the EchoPrime project).

# Initialize and load the EchoPrime video encoder (torchvision MViTv2-S backbone)
self.prime_encoder = models.video.mvit_v2_s()
device = torch.device("cuda")
checkpoint = torch.load("/home/ydubf/model_data/weights/echo_prime_encoder.pt", map_location=device)
# Replace the classification head with a 512-d projection so it matches the checkpoint
self.prime_encoder.head[-1] = torch.nn.Linear(self.prime_encoder.head[-1].in_features, 512)
self.prime_encoder.load_state_dict(checkpoint)
self.prime_encoder.to(device)

# Initialize and load the EchoPrime text encoder
self.prime_text_encoder = EchoPrimeTextEncoder()
checkpoint = torch.load("/home/ydubf/EchoPrime/model_data/weights/echo_prime_text_encoder.pt", map_location=device)
self.prime_text_encoder.load_state_dict(checkpoint)
self.prime_text_encoder.to(device)
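As a simple illustration of the parameter-efficient route, the sketch below continues from the snippet above: it freezes the loaded video encoder and trains only a lightweight LVEF head (a linear-probe style setup rather than CoOp itself; clips, targets, and lvef_head are hypothetical names).

import torch
import torch.nn as nn

# Freeze the EchoPrime video encoder; only the small head below is trained.
for p in self.prime_encoder.parameters():
    p.requires_grad = False

self.lvef_head = nn.Linear(512, 1).to(device)
optimizer = torch.optim.AdamW(self.lvef_head.parameters(), lr=1e-3)

# One training step on a batch of echo clips (B, 3, T, H, W) with LVEF targets (B,)
with torch.no_grad():
    feats = self.prime_encoder(clips.to(device))     # (B, 512) frozen video features
pred = self.lvef_head(feats).squeeze(-1)             # (B,) predicted LVEF
loss = nn.functional.mse_loss(pred, targets.to(device))
loss.backward()
optimizer.step()
optimizer.zero_grad()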

Relevant Papers and Projects:

  1. EchoCLIP: Vision–language foundation model for echocardiogram interpretation
  2. EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
  3. PanEcho: Complete AI-Enabled Echocardiography Interpretation With Multitask Deep Learning

Citation

If you find this repository useful, please cite our work:

@inproceedings{du2025cardiacclip,
  title={CardiacCLIP: Video-Based CLIP Adaptation for LVEF Prediction in a Few-Shot Manner},
  author={Du, Yao and Guo, Jiarong and Li, Xiaomeng},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={46--56},
  year={2025},
  organization={Springer}
}
@inproceedings{du2024teach,
  title={Teach CLIP to Develop a Number Sense for Ordinal Regression},
  author={Du, Yao and Zhai, Qiang and Dai, Weihang and Li, Xiaomeng},
  booktitle={European Conference on Computer Vision},
  pages={1--17},
  year={2024},
  organization={Springer}
}
