PSAlign is a framework for personalized safety alignment in text-to-image diffusion models. It dynamically adapts safety mechanisms to individual users' characteristics (e.g., age, gender, cultural background) while preserving creativity and image fidelity.
Key features:
- Personalization: Adjusts safety thresholds based on user profiles (e.g., stricter for minors, culturally aware for diverse groups).
- Fidelity Preservation: Maintains image quality and text alignment while suppressing harmful content.
- Compatibility: Works with Stable Diffusion 1.5 and SDXL via lightweight adapters (no full model retraining).
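To make the adapter idea concrete, here is a hedged sketch of what personalized generation could look like in code. The `PSAAdapter` class, its `attach` call, and the `user_profile` keyword are hypothetical placeholders, not the repo's actual API; the supported entry points are infer.py and the scripts in launchers/.

```python
# Hypothetical sketch only -- see infer.py and launchers/ for the supported interface.
import torch
from diffusers import StableDiffusionPipeline
from psa_adapter import PSAAdapter  # assumed import; the real class name may differ

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach a trained PSA adapter (path from the repo layout; loading API is illustrative).
adapter = PSAAdapter.from_pretrained("trained_models/psa-sd15")
adapter.attach(pipe)

# A user profile conditions the safety behavior (field names are assumptions).
user_profile = {"age": 15, "gender": "female", "region": "EU"}
image = pipe("a crowded street festival at night", user_profile=user_profile).images[0]
image.save("festival.png")
```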
PSAlign/
├── environment.yaml     # Conda environment config
├── train.py             # PSA adapter training script
├── infer.py             # Inference script
├── launchers/           # One-click scripts (training/inference for SD1.5/SDXL)
├── psa_adapter/         # Core PSA adapter implementation
├── evaluation/          # Evaluation tools
│   └── eval_gpt/        # GPT-based safety alignment evaluation
├── dataset/             # Dataset handling (data loading)
├── data/                # Data files (user embeddings, SAGE dataset, user info)
└── trained_models/      # Pretrained models (PSA adapters for SD1.5/SDXL)
git clone https://github.com/M-E-AGI-Lab/PSAlign.git
cd PSAlign

We recommend using Conda for environment management:
# Create and activate environment
conda env create -f environment.yaml
conda activate psa
# Verify installation
python -c "import torch; print('PyTorch version:', torch.__version__)"

SAGE (Safety-Aware Generation for Everyone) is the first dataset for personalized safety alignment in text-to-image generation, enabling models to adapt to individual user characteristics (age, culture, etc.).
Key features:
- 100K+ image-prompt pairs with "safe" vs "unsafe" variants.
- 10 safety categories (e.g., harassment, violence) with 800+ harmful concepts.
- User metadata (age, gender, religion, etc.) for personalization.
- Split into train/val/test_seen/test_unseen for robust evaluation.
For more detailed explanations, please refer to data/user_data/README.md and data/sage/README.md.
Please manually download the dataset from the following Google Drive link: Download sage.zip
After downloading, move the file to data/ and unzip it:
mkdir -p data/
mv ~/Downloads/sage.zip data/ # Adjust path if needed
unzip data/sage.zip -d data/

data/sage/
└── [train/val/test_seen/test_unseen]/
    ├── metadata.jsonl   # Annotations: prompts, labels, user profiles
    └── [image files]    # e.g., user_0000030_harassment_00001_s.jpg
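To get a quick feel for the annotations, the snippet below reads the first few records of a split's metadata.jsonl and prints their keys; consult data/sage/README.md for the actual schema.

```python
# Minimal sketch: peek at the first few SAGE annotation records.
import json
from pathlib import Path

metadata_path = Path("data/sage/train/metadata.jsonl")

with metadata_path.open() as f:
    for i, line in enumerate(f):
        record = json.loads(line)  # each line is one JSON object
        print(sorted(record.keys()))
        if i >= 4:  # only the first five records
            break
```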
bash launchers/train_psa_sd15.sh

The trained adapter is saved to trained_models/psa-sd15/.
bash launchers/train_psa_sdxl.sh

The trained adapter is saved to trained_models/psa-sdxl/.
Generate images with personalized safety alignment using pre-trained adapters.
# Base model (no safety alignment)
bash launchers/infer_sd15_base.sh
# With PSA adapter (personalized safety)
bash launchers/infer_sd15_psa.sh
# With PSA + LLM-generated user embeddings
bash launchers/infer_sd15_psa_llm.sh

# Base model
bash launchers/infer_sdxl_base.sh
# With PSA adapter
bash launchers/infer_sdxl_psa.sh
# With PSA + LLM-generated user embeddings
bash launchers/infer_sdxl_psa_llm.sh

Follow these steps to reproduce the paper's evaluation results. For more detailed explanations, please refer to evaluation/README.md and evaluation/eval_gpt/README.md.
First, generate images for all models (PSAlign + baselines) across benchmark datasets:
cd evaluation
# Generate for all datasets (recommended)
for dataset in debug coco_10k i2p_4073 CoProv2_test sage_unseen ud_1434; do
export DATASET=$dataset
bash scripts/run_gen.sh
done

Images saved to eval_images/<dataset>/<model>/ (e.g., eval_images/coco_10k/psa/sd15/level_3).
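Before computing metrics, it can help to sanity-check that every dataset/model combination actually produced images. A small sketch under the eval_images/<dataset>/<model>/ layout described above:

```python
# Sketch: count generated images per dataset/model under eval_images/.
from pathlib import Path

root = Path("eval_images")
for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for model_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        n_images = sum(1 for p in model_dir.rglob("*") if p.suffix.lower() in {".jpg", ".png"})
        print(f"{dataset_dir.name}/{model_dir.name}: {n_images} images")
```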
Evaluate image fidelity, text alignment, and harmful content suppression:
# Run with GPUs 0,1,2,3 (adjust based on available GPUs)
python scripts/run_eval.py --gpus 0,1,2,3 --output eval_results.csv

- FID: measures realism (lower = better).
- CLIPScore: measures text-image alignment (higher = better).
- InPro: measures inappropriate content (lower = better).
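As a standalone illustration of what CLIPScore measures (this is not the repo's run_eval.py pipeline), torchmetrics can score text-image alignment for a single generated image; the image path and prompt below are hypothetical examples.

```python
# Illustrative CLIPScore check with torchmetrics (not the repo's evaluation pipeline).
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor
from torchmetrics.multimodal.clip_score import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Hypothetical example pair: one generated image and the prompt it was generated from.
image = pil_to_tensor(Image.open("eval_images/coco_10k/psa/sd15/level_3/example.jpg"))
score = metric(image, "a plate of food on a wooden table")  # higher = better alignment
print(float(score))
```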
Assess personalized safety via pass rate (compliance with user requirements) and win rate (comparison to baselines):
cd eval_gpt
# Evaluate pass rate for PSAlign vs. baselines
bash run_eval_gpt.sh --mode evaluate --dataset all --models base safetydpo psa
# Compare PSAlign vs. SafetyDPO (win rate)
bash run_eval_gpt.sh --mode compare --dataset all --model-a safetydpo --model-b psa

Results saved to results_evaluate/ or results_compare/ (includes GPT judgments and summary stats).
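For reference, both metrics essentially reduce to ratios over the per-sample GPT judgments. A minimal sketch, assuming each judgment has already been mapped to a boolean (pass/fail) or a pairwise preference, and assuming ties are excluded from the win rate:

```python
# Sketch of how pass rate and win rate aggregate per-sample GPT judgments.
from typing import Sequence

def pass_rate(judgments: Sequence[bool]) -> float:
    """Fraction of generations judged compliant with the user's safety requirements."""
    return sum(judgments) / len(judgments)

def win_rate(preferences: Sequence[str], model: str = "B") -> float:
    """Fraction of pairwise comparisons in which `model` is preferred (ties excluded)."""
    decided = [p for p in preferences if p != "tie"]
    return sum(p == model for p in decided) / len(decided)

print(pass_rate([True, True, False, True]))   # 0.75
print(win_rate(["A", "B", "B", "tie", "B"]))  # 0.75
```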
- Stable Diffusion for the base model.
- Q16 for safety classification.
- DiffusionDPO for the RLHF framework.
- SafetyDPO for the safety-tuning baseline.
- Safe Latent Diffusion for the SLD baseline.
- Erasing for the ESD-U baseline.
- Unified Concept Editing for the UCE baseline.
If you use this work, please cite:
@article{lei2025personalized,
  title={Personalized Safety Alignment for Text-to-Image Diffusion Models},
  author={Lei, Yu and Bai, Jinbin and Shi, Qingyu and Feng, Aosong and Yu, Kaidong},
  journal={arXiv preprint arXiv:2508.01151},
  year={2025}
}