Generate, Transduct, Adapt: Iterative Transduction with VLMs

This is the code-base for GTA-CLIP proposed in

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji

ICCV'2025


Overview of GTA-CLIP

(a) Vision-language models (VLMs) such as CLIP enable zero-shot classification using the similarity between class prompts and images.
(b) Transduction exploits the structure of the entire image dataset to assign images to classes, improving accuracy.
(c) Our approach, GTA-CLIP, iteratively classifies images by (i) generating attributes based on pairwise confusions, (ii) performing attribute-augmented transductive inference, and (iii) adapting the CLIP encoders using the inferred labels.
(d) Across 12 datasets we improve upon CLIP and transductive CLIP by 8.6% and 4.0%, respectively, using ViT-B/32, with similar gains for other encoders. Significant improvements are also reported in the few-shot setting.
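To make steps (a) and (b) concrete, here is a minimal, illustrative sketch of zero-shot classification by cosine similarity followed by a simple nearest-neighbor transductive refinement. This is not the repository's implementation; the embeddings are random stand-ins for real CLIP features, and the propagation rule is a toy version of transduction.

```python
# Toy sketch (NOT the GTA-CLIP implementation): zero-shot classification
# via image-text cosine similarity, then a simple transductive refinement
# that smooths each image's class probabilities over its nearest neighbors.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

n_images, n_classes, dim = 100, 5, 64
image_emb = l2_normalize(rng.normal(size=(n_images, dim)))  # stand-in image features
text_emb = l2_normalize(rng.normal(size=(n_classes, dim)))  # stand-in class-prompt features

# (a) Zero-shot: softmax over scaled image-text cosine similarities.
logits = 100.0 * image_emb @ text_emb.T  # CLIP-style temperature scaling
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# (b) Transduction: average each image's class probabilities with those of
# its k nearest neighbors in embedding space, exploiting dataset structure.
k = 5
sim = image_emb @ image_emb.T
np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
neighbors = np.argsort(-sim, axis=1)[:, :k]
for _ in range(10):  # a few propagation steps
    probs = 0.5 * probs + 0.5 * probs[neighbors].mean(axis=1)

pred = probs.argmax(axis=1)  # refined class assignments
```

GTA-CLIP additionally interleaves attribute generation and encoder adaptation with this kind of transductive inference, which the sketch omits.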

Preparation

Create a conda environment with the following specifications and install the dependencies:

conda create -y --name GTACLIP python=3.10.0
conda activate GTACLIP
pip3 install -r requirements.txt
export TOKENIZERS_PARALLELISM=true

Datasets

Please follow DATASETS.md to install the datasets. For the CUB dataset, follow AdaptCLIPZS.

Static LLM Attributes

Download "gpt_descriptions" from AdaptCLIPZS
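To illustrate how per-class attribute descriptions can be used, here is a hypothetical sketch of attribute-augmented scoring, where a class score is the mean similarity between an image embedding and that class's attribute embeddings. The class names and random embeddings are invented for illustration; this is not the repository's code.

```python
# Hypothetical sketch: score each class by the mean cosine similarity
# between an image embedding and that class's attribute embeddings
# (stand-ins for encoded LLM-generated attribute descriptions).
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

dim = 64
# class -> matrix of attribute embeddings (random placeholders here)
attributes = {
    "sparrow": l2_normalize(rng.normal(size=(3, dim))),
    "cardinal": l2_normalize(rng.normal(size=(4, dim))),
}
image = l2_normalize(rng.normal(size=dim))  # stand-in image embedding

# Mean similarity to a class's attributes acts as that class's score.
scores = {cls: float((emb @ image).mean()) for cls, emb in attributes.items()}
best = max(scores, key=scores.get)  # predicted class
```

In GTA-CLIP these attribute sets are not static: new attributes are generated for pairs of classes the model confuses, and the scores feed the transductive inference step.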

Running GTA-CLIP

python run_gtaclip.py --dataset <dataset_name> --root_path </path/to/datasets/folder> --backbone <clip_backbone> --gpt_path </path/to/adaptclipzs/visual/attributes> --gpt_path_location </path/to/adaptclipzs/location/attributes>

On completion, this script prints the accuracies of base CLIP, TransCLIP, and GTA-CLIP for the specified dataset. --root_path should point to the folder containing all the datasets. --backbone is the CLIP architecture, e.g., 'vit_b16'. --gpt_path is the path to the folder containing the GPT-generated attributes for the specific dataset, which can be obtained from AdaptCLIPZS. Note that only the CUB and Flowers datasets have --gpt_path_location attributes. The results should be close to this table:

Results table: accuracies of CLIP, TransCLIP, and GTA-CLIP across datasets.

Baselines: CLIP, TransCLIP

Todo: Code for few-shot results

Thanks to TransCLIP for releasing the codebase upon which our code is built.


Acknowledgements

The research is supported in part by grant #2329927 from the National Science Foundation (USA). Our experiments were performed on the GPU cluster funded by the Massachusetts Technology Collaborative.

Citation

If you find our work useful, please consider citing:

@inproceedings{saha2025generate,
  title={Generate, Transduct, Adapt: Iterative Transduction with VLMs},
  author={Saha, Oindrila and Lawrence, Logan and Van Horn, Grant and Maji, Subhransu},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2025}
}
