
PSAlign: Personalized Safety Alignment for Text-to-Image Diffusion Models

Project · arXiv · SAGE Dataset · Pretrained Models


🧠 Overview

PSAlign is a novel framework enabling personalized safety alignment in text-to-image diffusion models. It dynamically adapts safety mechanisms to individual users’ characteristics (e.g., age, gender, cultural background) while preserving creativity and image fidelity.

Key features:

  • Personalization: Adjusts safety thresholds based on user profiles (e.g., stricter for minors, culturally aware for diverse groups).
  • Fidelity Preservation: Maintains image quality and text alignment while suppressing harmful content.
  • Compatibility: Works with Stable Diffusion 1.5 and SDXL via lightweight adapters (no full model retraining).

📂 Project Structure

PSAlign/
├── environment.yaml       # Conda environment config
├── train.py               # PSA adapter training script
├── infer.py               # Inference script
├── launchers/             # One-click scripts (training/inference for SD1.5/SDXL)
├── psa_adapter/           # Core PSA adapter implementation
├── evaluation/            # Evaluation tools
│   └── eval_gpt/          # GPT-based safety alignment evaluation
├── dataset/               # Dataset handling (data loading)
├── data/                  # Data files (user embeddings, SAGE dataset, user info)
└── trained_models/        # Pretrained models (PSA adapters for SD1.5/SDXL)

📦 Installation

1. Clone the Repository

git clone https://github.com/M-E-AGI-Lab/PSAlign.git
cd PSAlign

2. Setup Environment

We recommend using Conda for environment management:

# Create and activate environment
conda env create -f environment.yaml
conda activate psa

# Verify installation
python -c "import torch; print('PyTorch version:', torch.__version__)"
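
Training and inference are GPU-bound, so it is also worth confirming that PyTorch can see your device (a quick check, assuming a local CUDA setup):

# Should print True on a correctly configured GPU machine
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"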

📚 SAGE Dataset

SAGE (Safety-Aware Generation for Everyone) is the first dataset for personalized safety alignment in text-to-image generation, enabling models to adapt to individual user characteristics (age, culture, etc.).

Key features:

  • 100K+ image-prompt pairs with "safe" vs "unsafe" variants.
  • 10 safety categories (e.g., harassment, violence) with 800+ harmful concepts.
  • User metadata (age, gender, religion, etc.) for personalization.
  • Split into train/val/test_seen/test_unseen for robust evaluation.

For more detailed explanations, please refer to data/user_data/README.md and data/sage/README.md.

Download

Please manually download the dataset from the following Google Drive link: 👉 Download sage.zip

After downloading, move the file to data/ and unzip it:

mkdir -p data/
mv ~/Downloads/sage.zip data/         # Adjust path if needed
unzip data/sage.zip -d data/
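
As a sanity check, confirm that every split extracted correctly (each split listed under Structure below should contain a metadata.jsonl):

# List splits and count annotation records per split
ls data/sage
wc -l data/sage/*/metadata.jsonl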

Structure

data/sage/
├── [train/val/test_seen/test_unseen]/
│   ├── metadata.jsonl  # Annotations: prompts, labels, user profiles
│   └── [image files]   # e.g., user_0000030_harassment_00001_s.jpg
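
Before writing a loader, you can peek at the annotation schema of a split; the exact field names are documented in data/sage/README.md:

# Pretty-print the first annotation record from the training split
head -n 1 data/sage/train/metadata.jsonl | python -m json.tool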

🚀 Usage

🔧 Training PSA Adapters

For Stable Diffusion 1.5

bash launchers/train_psa_sd15.sh

The trained adapter is saved to trained_models/psa-sd15/.

For SDXL

bash launchers/train_psa_sdxl.sh

The trained adapter is saved to trained_models/psa-sdxl/.
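
The launchers are one-click wrappers, so paths and hyper-parameters live inside the scripts themselves; to see what a launcher will run before starting a job (and where to adjust settings), print it first:

# Review the SD1.5 launcher before editing or running it
cat launchers/train_psa_sd15.sh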

🎨 Inference

Generate images with personalized safety alignment using pre-trained adapters.

Stable Diffusion 1.5

# Base model (no safety alignment)
bash launchers/infer_sd15_base.sh

# With PSA adapter (personalized safety)
bash launchers/infer_sd15_psa.sh

# With PSA + LLM-generated user embeddings
bash launchers/infer_sd15_psa_llm.sh

SDXL

# Base model
bash launchers/infer_sdxl_base.sh

# With PSA adapter
bash launchers/infer_sdxl_psa.sh

# With PSA + LLM-generated user embeddings
bash launchers/infer_sdxl_psa_llm.sh
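
To compare the three settings side by side, the launchers can be chained (shown for SD1.5; the same pattern works for the SDXL scripts):

# Run base, PSA, and PSA+LLM inference back to back
for variant in base psa psa_llm; do
  bash "launchers/infer_sd15_${variant}.sh"
done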

📊 Evaluation

Follow these steps to reproduce the paper’s evaluation results. For more detailed explanations, please refer to evaluation/README.md and evaluation/eval_gpt/README.md.

1. Generate Evaluation Images

First, generate images for all models (PSAlign + baselines) across benchmark datasets:

cd evaluation
# Generate for all datasets (recommended)
for dataset in debug coco_10k i2p_4073 CoProv2_test sage_unseen ud_1434; do
  export DATASET=$dataset
  bash scripts/run_gen.sh
done

Images are saved to eval_images/<dataset>/<model>/ (e.g., eval_images/coco_10k/psa/sd15/level_3).

2. Quantitative Metrics (FID, CLIPScore, InPro)

Evaluate image fidelity, text alignment, and harmful content suppression:

# Run with GPUs 0,1,2,3 (adjust based on available GPUs)
python scripts/run_eval.py --gpus 0,1,2,3 --output eval_results.csv

  • FID: Measures realism (lower = better).
  • CLIPScore: Measures text-image alignment (higher = better).
  • InPro: Measures inappropriate content (lower = better).
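
Once the run finishes, the results can be skimmed directly in the terminal (assuming --output writes a plain comma-separated table, as the flag suggests):

# Align CSV columns for readability
column -s, -t < eval_results.csv | head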

3. Personalized Safety Alignment (GPT-based)
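
Note: the GPT-based judge calls an external LLM API, so an API key must be available first. OPENAI_API_KEY is an assumption here; eval_gpt/README.md documents the exact variable the scripts expect:

# Hypothetical: export your API key before running the judge
export OPENAI_API_KEY="..."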

Assess personalized safety via pass rate (compliance with user requirements) and win rate (comparison to baselines):

cd eval_gpt

# Evaluate pass rate for PSAlign vs. baselines
bash run_eval_gpt.sh --mode evaluate --dataset all --models base safetydpo psa

# Compare PSAlign vs. SafetyDPO (win rate)
bash run_eval_gpt.sh --mode compare --dataset all --model-a safetydpo --model-b psa

Results are saved to results_evaluate/ or results_compare/ (including GPT judgments and summary statistics).

🤝 Acknowledgements

📖 Citation

If you use this work, please cite:

@article{lei2025personalized,
  title={Personalized Safety Alignment for Text-to-Image Diffusion Models},
  author={Lei, Yu and Bai, Jinbin and Shi, Qingyu and Feng, Aosong and Yu, Kaidong},
  journal={arXiv preprint arXiv:2508.01151},
  year={2025}
}
