PSAlign is a framework for personalized safety alignment in text-to-image diffusion models. It dynamically adapts safety mechanisms to individual users' characteristics (e.g., age, gender, cultural background) while preserving creativity and image fidelity.
Key features:
- Personalization: Adjusts safety thresholds based on user profiles (e.g., stricter for minors, culturally aware for diverse groups).
- Fidelity Preservation: Maintains image quality and text alignment while suppressing harmful content.
- Compatibility: Works with Stable Diffusion 1.5 and SDXL via lightweight adapters (no full model retraining).
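To make the adapter idea concrete, here is a hedged sketch of what personalized generation could look like in code. The `PSAAdapter` class, its `attach` call, and the `user_profile` keyword are hypothetical placeholders, not the repo's actual API; the supported entry points are infer.py and the scripts in launchers/.

```python
# Hypothetical sketch only -- see infer.py and launchers/ for the supported interface.
import torch
from diffusers import StableDiffusionPipeline
from psa_adapter import PSAAdapter  # assumed import; the real class name may differ

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach a trained PSA adapter (path from the repo layout; loading API is illustrative).
adapter = PSAAdapter.from_pretrained("trained_models/psa-sd15")
adapter.attach(pipe)

# A user profile conditions the safety behavior (field names are assumptions).
user_profile = {"age": 15, "gender": "female", "region": "EU"}
image = pipe("a crowded street festival at night", user_profile=user_profile).images[0]
image.save("festival.png")
```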
PSAlign/
├── environment.yaml     # Conda environment config
├── train.py             # PSA adapter training script
├── infer.py             # Inference script
├── launchers/           # One-click scripts (training/inference for SD1.5/SDXL)
├── psa_adapter/         # Core PSA adapter implementation
├── evaluation/          # Evaluation tools
│   └── eval_gpt/        # GPT-based safety alignment evaluation
├── dataset/             # Dataset handling (data loading)
├── data/                # Data files (user embeddings, SAGE dataset, user info)
└── trained_models/      # Pretrained models (PSA adapters for SD1.5/SDXL)
git clone https://github.com/M-E-AGI-Lab/PSAlign.git
cd PSAlign

We recommend using Conda for environment management:
# Create and activate environment
conda env create -f environment.yaml
conda activate psa
# Verify installation
python -c "import torch; print('PyTorch version:', torch.__version__)"

SAGE (Safety-Aware Generation for Everyone) is the first dataset for personalized safety alignment in text-to-image generation, enabling models to adapt to individual user characteristics (age, culture, etc.).
Key features:
- 100K+ image-prompt pairs with "safe" vs "unsafe" variants.
- 10 safety categories (e.g., harassment, violence) with 800+ harmful concepts.
- User metadata (age, gender, religion, etc.) for personalization.
- Split into train/val/test_seen/test_unseen for robust evaluation.
For more detailed explanations, please refer to data/user_data/README.md and data/sage/README.md.
Please manually download the dataset from the following Google Drive link: Download sage.zip
After downloading, move the file to data/ and unzip it:
mkdir -p data/
mv ~/Downloads/sage.zip data/ # Adjust path if needed
unzip data/sage.zip -d data/

data/sage/
└── [train/val/test_seen/test_unseen]/
    ├── metadata.jsonl   # Annotations: prompts, labels, user profiles
    └── [image files]    # e.g., user_0000030_harassment_00001_s.jpg
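To get a quick feel for the annotations, the snippet below reads the first few records of a split's metadata.jsonl and prints their keys; consult data/sage/README.md for the actual schema.

```python
# Minimal sketch: peek at the first few SAGE annotation records.
import json
from pathlib import Path

metadata_path = Path("data/sage/train/metadata.jsonl")

with metadata_path.open() as f:
    for i, line in enumerate(f):
        record = json.loads(line)  # each line is one JSON object
        print(sorted(record.keys()))
        if i >= 4:  # only the first five records
            break
```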
bash launchers/train_psa_sd15.sh

The trained adapter is saved to trained_models/psa-sd15/.
bash launchers/train_psa_sdxl.sh

The trained adapter is saved to trained_models/psa-sdxl/.
Generate images with personalized safety alignment using pre-trained adapters.
# Base model (no safety alignment)
bash launchers/infer_sd15_base.sh
# With PSA adapter (personalized safety)
bash launchers/infer_sd15_psa.sh
# With PSA + LLM-generated user embeddings
bash launchers/infer_sd15_psa_llm.sh

# Base model
bash launchers/infer_sdxl_base.sh
# With PSA adapter
bash launchers/infer_sdxl_psa.sh
# With PSA + LLM-generated user embeddings
bash launchers/infer_sdxl_psa_llm.sh

Follow these steps to reproduce the paper's evaluation results. For more detailed explanations, please refer to evaluation/README.md and evaluation/eval_gpt/README.md.
First, generate images for all models (PSAlign + baselines) across benchmark datasets:
cd evaluation
# Generate for all datasets (recommended)
for dataset in debug coco_10k i2p_4073 CoProv2_test sage_unseen ud_1434; do
export DATASET=$dataset
bash scripts/run_gen.sh
done

Images saved to eval_images/<dataset>/<model>/ (e.g., eval_images/coco_10k/psa/sd15/level_3).
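Before computing metrics, it can help to sanity-check that every dataset/model combination actually produced images. A small sketch under the eval_images/<dataset>/<model>/ layout described above:

```python
# Sketch: count generated images per dataset/model under eval_images/.
from pathlib import Path

root = Path("eval_images")
for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for model_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        n_images = sum(1 for p in model_dir.rglob("*") if p.suffix.lower() in {".jpg", ".png"})
        print(f"{dataset_dir.name}/{model_dir.name}: {n_images} images")
```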
Evaluate image fidelity, text alignment, and harmful content suppression:
# Run with GPUs 0,1,2,3 (adjust based on available GPUs)
python scripts/run_eval.py --gpus 0,1,2,3 --output eval_results.csv

- FID: measures realism (lower = better).
- CLIPScore: measures text-image alignment (higher = better).
- InPro: measures inappropriate content (lower = better).
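As a standalone illustration of what CLIPScore measures (this is not the repo's run_eval.py pipeline), torchmetrics can score text-image alignment for a single generated image; the image path and prompt below are hypothetical examples.

```python
# Illustrative CLIPScore check with torchmetrics (not the repo's evaluation pipeline).
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor
from torchmetrics.multimodal.clip_score import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Hypothetical example pair: one generated image and the prompt it was generated from.
image = pil_to_tensor(Image.open("eval_images/coco_10k/psa/sd15/level_3/example.jpg"))
score = metric(image, "a plate of food on a wooden table")  # higher = better alignment
print(float(score))
```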
Assess personalized safety via pass rate (compliance with user requirements) and win rate (comparison to baselines):
cd eval_gpt
# Evaluate pass rate for PSAlign vs. baselines
bash run_eval_gpt.sh --mode evaluate --dataset all --models base safetydpo psa
# Compare PSAlign vs. SafetyDPO (win rate)
bash run_eval_gpt.sh --mode compare --dataset all --model-a safetydpo --model-b psa

Results saved to results_evaluate/ or results_compare/ (includes GPT judgments and summary stats).
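For reference, both metrics essentially reduce to ratios over the per-sample GPT judgments. A minimal sketch, assuming each judgment has already been mapped to a boolean (pass/fail) or a pairwise preference, and assuming ties are excluded from the win rate:

```python
# Sketch of how pass rate and win rate aggregate per-sample GPT judgments.
from typing import Sequence

def pass_rate(judgments: Sequence[bool]) -> float:
    """Fraction of generations judged compliant with the user's safety requirements."""
    return sum(judgments) / len(judgments)

def win_rate(preferences: Sequence[str], model: str = "B") -> float:
    """Fraction of pairwise comparisons in which `model` is preferred (ties excluded)."""
    decided = [p for p in preferences if p != "tie"]
    return sum(p == model for p in decided) / len(decided)

print(pass_rate([True, True, False, True]))   # 0.75
print(win_rate(["A", "B", "B", "tie", "B"]))  # 0.75
```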
- Stable Diffusion for the base model.
- Q16 for safety classification.
- DiffusionDPO for the RLHF framework.
- SafetyDPO for the safety-tuning baseline.
- Safe Latent Diffusion for the SLD baseline.
- Erasing for the ESD-U baseline.
- Unified Concept Editing for the UCE baseline.
If you use this work, please cite:
@article{lei2025personalized,
  title={Personalized Safety Alignment for Text-to-Image Diffusion Models},
  author={Lei, Yu and Bai, Jinbin and Shi, Qingyu and Feng, Aosong and Yu, Kaidong},
  journal={arXiv preprint arXiv:2508.01151},
  year={2025}
}