Official code release for "AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models".
pip install -e . # core dependencies
pip install -e ".[analysis]" # + spacy, matplotlib, seaborn, plotly
pip install -e ".[all]" # everything including dev tools

The act2i package provides:
- Prompt Enhancement (act2i.prompt): LLM-based knowledge distillation to enrich T2I prompts along emotional, spatial, and temporal dimensions.
- Image Generation (act2i.generate): Diffusers-based T2I generation across multiple seeds and prompt variants.
- Feature Extraction (act2i.features): DINOv2 and SigLIP feature extraction for reference image sets.
- Evaluation (act2i.evaluate): CLIPScore, DINOv2 similarity scoring, OWLv2 zero-shot object detection, and classification metrics.
- Analysis (act2i.analysis): Structural NLP analysis of prompt quality.
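
As a rough illustration of the similarity scoring listed under Evaluation, the sketch below averages the cosine similarity between one generated-image feature vector and a set of reference feature vectors. It is not the act2i API. The function names `cosine_similarity` and `reference_set_score`, the plain Python lists standing in for DINOv2 embeddings, and the choice of mean aggregation are all assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / (norm_a * norm_b)


def reference_set_score(
    generated: Sequence[float],
    references: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity between one feature vector and a reference set.

    Hypothetical helper: mean aggregation is an assumption of this sketch,
    not something the package description specifies.
    """
    if not references:
        raise ValueError("reference set must not be empty")
    total = sum(cosine_similarity(generated, ref) for ref in references)
    return total / len(references)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real image embeddings.
    generated_feature = [1.0, 0.0]
    reference_features = [[1.0, 0.0], [0.0, 1.0]]
    print(reference_set_score(generated_feature, reference_features))
```

With real embeddings the same computation is normally vectorised in NumPy or PyTorch; the pure-Python form is used here only so the example runs without extra dependencies.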
See scripts/ for CLI entrypoints.
@article{malaviya2025act2i,
author = {Malaviya, Vatsal and Chatterjee, Agneet and Patel, Maitreya and Yang, Yezhou and Baral, Chitta},
title = {AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models},
journal = {arXiv preprint arXiv:2509.16141},
year = {2025}
}

This project is licensed under the MIT License. See the LICENSE file for details.