AcT2I

Project Page | arXiv: 2509.16141 | Hugging Face: Datasets

Official code release for "AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models".

Installation

pip install -e .            # core dependencies
pip install -e ".[analysis]" # + spacy, matplotlib, seaborn, plotly
pip install -e ".[all]"      # everything including dev tools

Overview

The act2i package provides:

  • Prompt Enhancement (act2i.prompt) — LLM-based knowledge distillation to enrich T2I prompts along emotional, spatial, and temporal dimensions.
  • Image Generation (act2i.generate) — Diffusers-based T2I generation across multiple seeds and prompt variants.
  • Feature Extraction (act2i.features) — DINOv2 and SigLIP feature extraction for reference image sets.
  • Evaluation (act2i.evaluate) — CLIPScore, DINOv2 similarity scoring, OWLv2 zero-shot object detection, and classification metrics.
  • Analysis (act2i.analysis) — Structural NLP analysis of prompt quality.

See scripts/ for CLI entrypoints.
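
To illustrate the kind of feature-similarity scoring listed under Evaluation, here is a minimal sketch in plain Python. It is not the act2i API: the function names `cosine_similarity` and `reference_similarity` are hypothetical, the short lists stand in for real DINOv2 embeddings, and averaging over the reference set is an assumed aggregation rather than the method the package necessarily uses.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def reference_similarity(
    generated: Sequence[float],
    references: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity of one generated-image feature to a reference set.

    Assumption: the mean is used to aggregate over references; other choices
    (max, median) are equally plausible.
    """
    if not references:
        raise ValueError("reference set must not be empty")
    total = sum(cosine_similarity(generated, ref) for ref in references)
    return total / len(references)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for image embeddings (hypothetical data).
    generated_feature = [1.0, 0.0]
    reference_features = [[1.0, 0.0], [0.0, 1.0]]
    print(reference_similarity(generated_feature, reference_features))
```

In practice the vectors would come from a feature extractor such as the one described under Feature Extraction; the scoring logic itself is independent of where the features originate.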

Citation

@article{malaviya2025act2i,
  author    = {Malaviya, Vatsal and Chatterjee, Agneet and Patel, Maitreya and Yang, Yezhou and Baral, Chitta},
  title     = {AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models},
  journal   = {arXiv preprint arXiv:2509.16141},
  year      = {2025}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.