
Vatsal-Malaviya/AcT2I


AcT2I

Project Page | arXiv: 2509.16141 | Hugging Face Datasets

Official code release for "AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models".

Installation

pip install -e .            # core dependencies only
pip install -e ".[analysis]" # core + analysis extras (spacy, matplotlib, seaborn, plotly)
pip install -e ".[all]"      # all extras, including dev tools

Overview

The act2i package provides:

  • Prompt Enhancement (act2i.prompt) — LLM-based knowledge distillation to enrich T2I prompts along emotional, spatial, and temporal dimensions.
  • Image Generation (act2i.generate) — Diffusers-based T2I generation across multiple seeds and prompt variants.
  • Feature Extraction (act2i.features) — DINOv2 and SigLIP feature extraction for reference image sets.
  • Evaluation (act2i.evaluate) — CLIPScore, DINOv2 similarity scoring, OWLv2 zero-shot object detection, and classification metrics.
  • Analysis (act2i.analysis) — Structural NLP analysis of prompt quality.
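The Evaluation module lists CLIPScore, DINOv2 similarity, and classification metrics. The sketch below illustrates how these scores are conventionally computed. It is not the act2i API. It assumes the image and text embeddings have already been produced by a CLIP or DINOv2 encoder, and it uses the w = 2.5 rescaling from the original CLIPScore definition (Hessel et al., 2021). Averaging similarity over a reference set is an assumed aggregation, and all function names (`clip_score`, `mean_reference_similarity`, `precision_recall_f1`) are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
    """CLIPScore as defined by Hessel et al. (2021): w * max(cos(image, text), 0).
    Assumes both embeddings come from the same CLIP model (precomputed here)."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


def mean_reference_similarity(gen_emb: Sequence[float],
                              reference_embs: Sequence[Sequence[float]]) -> float:
    """Hypothetical aggregation: average cosine similarity between one generated
    image's embedding (e.g. DINOv2) and a set of reference-image embeddings."""
    if not reference_embs:
        raise ValueError("reference set is empty")
    return sum(cosine_similarity(gen_emb, r) for r in reference_embs) / len(reference_embs)


def precision_recall_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1, e.g. over per-image detection outcomes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, identical unit vectors give `clip_score([1, 0], [1, 0]) == 2.5`, while opposed vectors are clipped to 0. The clipping at zero keeps CLIPScore non-negative, and the rescaling by w only stretches the range for readability without changing rankings.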

See the scripts/ directory for command-line entry points.

Citation

@article{malaviya2025act2i,
  author    = {Malaviya, Vatsal and Chatterjee, Agneet and Patel, Maitreya and Yang, Yezhou and Baral, Chitta},
  title     = {AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models},
  journal   = {arXiv preprint arXiv:2509.16141},
  year      = {2025}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
