📄 Paper Repository: This is the official implementation of our paper "Semantic-Guided Autoregressive Diffusion-Based Data Augmentation Using Visual Instructions" (ISAS 2025) by Ege Yavuzcan, Ömer Kuş, and Abdurrahman Gümüş.
A modular, iterative framework that addresses the challenge of limited training data in deep learning by generating semantically consistent augmented images. Unlike traditional augmentation techniques (flipping, rotation, color jittering) that produce visually similar samples, our approach leverages Vision Language Models and diffusion-based image generation to create diverse yet contextually meaningful variations.
- VLM-Driven Captioning: Utilizes LLaVA to generate multiple semantic descriptions of input images
- CLIP-Based Prompt Selection: Automatically selects the most relevant caption using cosine similarity scoring
- Autoregressive Refinement: Each augmented image becomes the input for subsequent iterations, enabling progressive semantic diversity
- Classifier Improvement: Augmented datasets significantly improve classification model performance on limited data
- Iterative Refinement: Repeat caption → selection → augmentation for a configurable number of iterations
- Batch Processing: CLI supports single image or folder input, outputs to structured subfolders
- Modular Architecture: Separate components for captioning (LLaVA), prompt selection (CLIP), and augmentation (Stable Diffusion)
- Detailed Logging: Records generated captions, cleaned prompts, similarity scores, and augmentation states
- Configurable: All hyperparameters and model checkpoints defined in `config/config.yaml`
```bash
# Clone and install in editable mode
git clone https://github.com/egeyavuzcan/semantic-data-augmentation
cd semantic-data-augmentation
pip install -e .
pip install atma

# (Optional) Create and activate Conda environment
conda create -n myenv python=3.10
conda activate myenv
pip install -r requirements.txt
```

Run the CLI on a single image or a folder of images:

```bash
sda-augment -i <input_path> -o <output_dir> -c config/config.yaml [--return_all]
```

- `-i, --input_path`: Path to an image file or a directory of images.
- `-o, --output_dir`: Directory for augmented outputs.
- `-c, --config`: Pipeline configuration YAML.
- `--return_all`: Save all intermediate results instead of the final image only.
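For example, to augment a folder of images and keep all intermediate iterations (the input and output paths below are hypothetical):

```bash
sda-augment -i data/raw_images -o outputs -c config/config.yaml --return_all
```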
- Load Configuration: Read models, thresholds, iteration count, and logging settings.
- Initialize Modules:
  - CaptionGenerator: Uses the LLaVA chat model to produce exactly three numbered captions.
  - PromptSelector: Encodes the captions and the image via CLIP, computes cosine similarities, and picks the top-scoring prompt (stripping any `ASSISTANT:` prefix).
  - Augmenter: Wraps Stable Diffusion Img2Img for conditional image augmentation.
  - Refiner: Orchestrates the iterative pipeline.
- Iterative Loop (per image):
  - Generate captions.
  - Select the best prompt by CLIP score.
  - Augment the image with the chosen prompt.
  - Repeat for the configured number of `iterations`.
- Output: Save results under `<output_dir>/augmented_dataset` with iteration suffixes (see the sketch below).
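The pipeline maps onto standard Hugging Face components. The sketch below is a minimal, illustrative version of one caption → selection → augmentation iteration using the `transformers` and `diffusers` APIs; the function names, the `strength` value, and the caption-parsing details are assumptions for illustration and do not mirror the package's actual classes.

```python
# Minimal sketch of one refinement iteration (hypothetical helper names,
# not the package's actual modules). Assumes recent transformers and diffusers.
import torch
from PIL import Image
from transformers import (AutoProcessor, LlavaForConditionalGeneration,
                          CLIPModel, CLIPProcessor)
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioning model (LLaVA)
llava_id = "llava-hf/llava-1.5-7b-hf"
llava_proc = AutoProcessor.from_pretrained(llava_id)
llava = LlavaForConditionalGeneration.from_pretrained(
    llava_id, torch_dtype=torch.float16).to(device)

# Prompt-selection model (CLIP)
clip_id = "openai/clip-vit-base-patch32"
clip_proc = CLIPProcessor.from_pretrained(clip_id)
clip = CLIPModel.from_pretrained(clip_id).to(device)

# Augmentation model (Stable Diffusion Img2Img)
sd = StableDiffusionImg2ImgPipeline.from_pretrained(
    "sd-legacy/stable-diffusion-v1-5", torch_dtype=torch.float16).to(device)

def generate_captions(image: Image.Image) -> list[str]:
    """Ask LLaVA for three numbered captions and split them into a list."""
    prompt = ("USER: <image>\nGive me three detailed image captions describing the "
              "main elements, context, and any notable objects or actions in this "
              "image. ASSISTANT:")
    inputs = llava_proc(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)
    out = llava.generate(**inputs, max_new_tokens=200)
    text = llava_proc.decode(out[0], skip_special_tokens=True)
    answer = text.split("ASSISTANT:")[-1].strip()  # strip the ASSISTANT: prefix
    # Crude cleanup of leading "1." / "2." / "3." numbering (illustrative only).
    return [line.lstrip("123. ").strip() for line in answer.splitlines() if line.strip()]

def select_prompt(image: Image.Image, captions: list[str]) -> str:
    """Pick the caption with the highest CLIP image-text cosine similarity."""
    inputs = clip_proc(text=captions, images=image, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**inputs)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    scores = (txt_emb @ img_emb.T).squeeze(-1)  # one cosine score per caption
    return captions[int(scores.argmax())]

def refine(image: Image.Image, iterations: int = 3) -> Image.Image:
    """Autoregressive loop: each augmented image seeds the next iteration."""
    for _ in range(iterations):
        prompt = select_prompt(image, generate_captions(image))
        # strength=0.6 is an assumed value; guidance_scale and steps match config.yaml.
        image = sd(prompt=prompt, image=image, strength=0.6,
                   guidance_scale=7.5, num_inference_steps=35).images[0]
    return image
```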
Example `config/config.yaml`:

```yaml
caption_generation:
  model_name_or_path: llava-hf/llava-1.5-7b-hf
  prompt_templates:
    general: "USER: <image>\nGive me three detailed image captions describing the main elements, context, and any notable objects or actions in this image."
  max_new_tokens: 200

prompt_selection:
  clip_model: openai/clip-vit-base-patch32
  threshold: 0.2

augmentation:
  model: sd-legacy/stable-diffusion-v1-5
  guidance_scale: 7.5
  num_inference_steps: 35

iterative_refinement:
  iterations: 3

logging:
  level: INFO
  log_dir: logs
```
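These values are read once at startup. A minimal sketch of loading such a config with PyYAML (the variable names below are illustrative, not the package's actual API):

```python
# Illustrative only: load config.yaml and pull out a few pipeline settings.
import yaml

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

clip_model = cfg["prompt_selection"]["clip_model"]      # openai/clip-vit-base-patch32
iterations = cfg["iterative_refinement"]["iterations"]  # 3
guidance = cfg["augmentation"]["guidance_scale"]        # 7.5
print(clip_model, iterations, guidance)
```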
If you use this work, please cite:

```bibtex
@inproceedings{yavuzcan2025semantic,
  title={Semantic-Guided Autoregressive Diffusion-Based Data Augmentation Using Visual Instructions},
  author={Yavuzcan, Ege and Kuş, Ömer and Gümüş, Abdurrahman},
  booktitle={ISAS 2025},
  year={2025}
}
```

Contributions welcome! Please open issues and pull requests.
MIT License
