- Problem: Multi-reward RLHF often suffers an alignment tax: improving one metric while degrading others.
- Approach: We introduce two complementary methods:
  - MapReduce LoRA: train reward-specific LoRA experts in parallel (Map) and iteratively merge them (Reduce) with configurable weights (default 1:1:1).
  - Reward-aware Token Embedding (RaTE): learn reward-aware token embeddings that compose at inference for flexible preference control (see the sketch below).
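For intuition, below is a minimal sketch of the inference-time preference composition that RaTE enables: per-reward token embeddings are blended with user-chosen weights. The reward names, embedding shape, and `compose_rate` helper are illustrative assumptions, not the repo's actual API.

```python
import torch

# Hypothetical per-reward token embeddings learned by RaTE
# (one vector per reward; the 4096 dimension is illustrative only).
rate_embeddings = {
    "geneval":   torch.randn(1, 4096),
    "pickscore": torch.randn(1, 4096),
    "ocr":       torch.randn(1, 4096),
}

def compose_rate(weights: dict[str, float]) -> torch.Tensor:
    """Blend reward-aware embeddings with user-chosen preference weights."""
    total = sum(weights.values())
    return sum((w / total) * rate_embeddings[name] for name, w in weights.items())

# Example: emphasize prompt faithfulness (GenEval) over aesthetics and text rendering.
token = compose_rate({"geneval": 2.0, "pickscore": 1.0, "ocr": 1.0})
# The composed token would then be injected into the prompt's token embeddings at inference.
```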
Results
- Text-to-Image:
- SD3.5M: +36.1% (GenEval), +4.6% (PickScore), +55.7% (OCR)
- FLUX.1-dev: +32.7% (GenEval), +4.3% (PickScore), +67.1% (OCR)
- Text-to-Video:
- HunyuanVideo: +48.1% (visual), +90.0% (motion)
- Language Task:
- Llama-2 7B (Helpful Assistant): +43.4% (helpful), +136.7% (harmless)
- Text-to-Image:
Clone this repo and install the environment:
# Pre-download the models to avoid repeatedly downloading them from Hugging Face
huggingface-cli login
huggingface-cli download stabilityai/stable-diffusion-3.5-medium
huggingface-cli download black-forest-labs/FLUX.1-dev
# login wandb
wandb login
# install the conda environment
conda create -n mapreduce-lora python=3.12 -y
conda activate mapreduce-lora
pip install diffusers==0.33.1
pip install torch==2.6.0
pip install transformers==4.54.0
pip install protobuf==5.29.5
pip install sentencepiece==0.2.0
pip install accelerate==1.9.0
pip install --no-cache-dir -U packaging ninja==1.11.1.4
pip install flash-attn==2.8.0.post2 --no-build-isolation --no-cache-dir
pip install xformers==0.0.31.post1
pip install absl-py==2.3.1
pip install ml_collections==1.1.0
pip install wandb==0.18.7
pip install peft==0.10.0
# NOTE: for deepspeed
pip install deepspeed==0.17.2
# NOTE: for paddleocr
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein==0.27.1
Pre-download the PaddleOCR model:
from paddleocr import PaddleOCR
# instantiating PaddleOCR once downloads and caches the model weights locally
ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
Prepare the GenEval reward: follow reward-server to install the conda environment for GenEval; the PickScore and OCR rewards are already included in the mapreduce-lora conda env.
The parallel automation trains three reward experts (GenEval / PickScore / OCR) in parallel and periodically merges their LoRAs with configurable weights (default 1:1:1). The merged adapter is then used to resume the next cycle.
- The script automatically derives the parallel topology from your environment (e.g., scheduler-provided `RANK`/`WORLD_SIZE` or reachable nodes) and assigns nodes in groups of `NODES_PER_TASK` to the three experts (GenEval, PickScore, OCR).
- By default it uses `NODES_PER_TASK=4`, `GPUS_PER_NODE=8`, `MERGE_STEPS=100`, and `CYCLES=80`. With 12 nodes available this yields a 12-node run (4 per expert). Distinct ports are used per expert: `MASTER_PORT`, `MASTER_PORT+1`, `MASTER_PORT+2`.
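For intuition, here is a minimal Python sketch of the grouping logic described above; the actual launch scripts implement this in bash, and the exact variable handling may differ.

```python
import os

EXPERTS = ["GenEval", "PickScore", "OCR"]

nodes_per_task = int(os.environ.get("NODES_PER_TASK", 4))
node_rank = int(os.environ.get("RANK", 0))        # scheduler-provided node rank
base_port = int(os.environ.get("MASTER_PORT", 9998))

expert_idx = node_rank // nodes_per_task           # which expert this node trains
assert expert_idx < len(EXPERTS), "needs at most 3 * NODES_PER_TASK nodes"

expert = EXPERTS[expert_idx]
port = base_port + expert_idx                      # MASTER_PORT, MASTER_PORT+1, MASTER_PORT+2
group_rank = node_rank % nodes_per_task            # rank within the expert's group

print(f"node {node_rank}: expert={expert}, rendezvous port={port}, group rank={group_rank}")
```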
Minimal run (uses script defaults):
# simultaneously train 3 jobs, including GenEval, PickScore and OCR
# sd35m
bash scripts/init_scripts/init_parallel_sd35m.sh
# flux.1-dev
bash scripts/init_scripts/init_parallel_flux.sh
# if GPU nodes are limited, train the 3 jobs sequentially instead: GenEval -> PickScore -> OCR -> GenEval -> ...
bash scripts/init_scripts/init_sequential.sh
Override common knobs as needed (example):
export CYCLES=5
export MERGE_STEPS=200
export WEIGHTS="1 1 1" # merge weights: GenEval PickScore OCR
export NODES_PER_TASK=4
export GPUS_PER_NODE=8
export MASTER_ADDR=127.0.0.1 # base address; groups use derived ports
export MASTER_PORT=9998
# sd35m
bash scripts/init_scripts/init_parallel_sd35m.sh
# flux.1-dev
bash scripts/init_scripts/init_parallel_flux.sh
Adjustable parameters:
- CYCLES: total merge cycles.
- MERGE_STEPS: steps per expert before each merge (default: 100).
- WEIGHTS: merge weights for GenEval, PickScore, OCR (e.g., `1 1 1`).
- NODES_PER_TASK, GPUS_PER_NODE: nodes and GPUs per expert group.
- MASTER_ADDR, MASTER_PORT: base rendezvous; experts use `PORT`, `PORT+1`, `PORT+2`.
- WORLD_SIZE/RANK or NODE_IPS: scheduler-provided topology or static IPs.
- LOG_DIR, OUT_ROOT: logs root and merged outputs root.
- PRETRAINED_MODEL_PATH (script default), MODEL_PATH (optional local base model).
- Auto resume: set `AUTO_RESUME=1`, `SKIP_COMPLETED_TASKS=1`, and specify `RESUME_RUN_TS`.
- Reward/coordination: START_GENEVAL_REWARD, STOP_GENEVAL_REWARD_AFTER, MERGE_COORD_RANK.
SD3.5M training (the continuous run) and eval curves with a fixed merging step of 100 for all rewards (k=80):
Since different rewards may work better at different training steps, we can train each expert independently and merge their LoRAs manually.
# Train individual experts (Defaults to GenEval. For PickScore/OCR, update `init_manually_mapreduce_${model_name}.sh`)
model_name="sd35m" #flux
bash scripts/init_scripts/init_manually_mapreduce_${model_name}.sh
# Merge
python scripts/merge_scripts/merge_lora.py --model_name "${model_name}" --lora_paths "${GEN_LORA}" "${PICK_LORA}" "${OCR_LORA}" --weights ${WEIGHTS} --output_dir "${MERGE_OUT}"
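Conceptually, the Reduce step combines the three expert adapters with the given weights. Below is a minimal sketch that takes a plain weighted average of LoRA parameters stored as safetensors files; the paths, weight normalization, and key handling are assumptions for illustration and may differ from what `merge_lora.py` actually does.

```python
from safetensors.torch import load_file, save_file

def merge_loras(lora_paths, weights, output_path):
    """Weighted average of LoRA state dicts (illustrative sketch of the Reduce step)."""
    weights = [w / sum(weights) for w in weights]   # normalize, e.g. 1:1:1 -> 1/3 each (assumption)
    merged = {}
    for path, w in zip(lora_paths, weights):
        state = load_file(path)                     # e.g. <expert>/adapter_model.safetensors
        for key, tensor in state.items():
            merged[key] = merged.get(key, 0) + w * tensor.float()
    save_file(merged, output_path)

# Hypothetical usage with the three reward experts merged at equal weight:
# merge_loras(
#     ["geneval/adapter_model.safetensors",
#      "pickscore/adapter_model.safetensors",
#      "ocr/adapter_model.safetensors"],
#     weights=[1, 1, 1],
#     output_path="merged/adapter_model.safetensors",
# )
```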
SD3.5M eval curves with independent merging steps for each reward:

SD3.5M
[Figures: GenEval | PickScore | OCR | MPR (Ours)]
FLUX.1-dev
[Figures: GenEval | PickScore | OCR | MPR (Ours)]
model_name="sd35" #flux
model_ckpt="SD3.5M" #FLUX.1-dev
# inference with individual experts GenEval
python scripts/test_${model_name}.py --mode eval_single --use_adapter --lora_checkpoint "shi-labs/${model_ckpt}-ind-expert-GenEval" --results_dir "results/ind-geneval/"
# inference with individual experts PickScore
python scripts/test_${model_name}.py --mode eval_single --use_adapter --lora_checkpoint "shi-labs/${model_ckpt}-ind-expert-PickScore" --results_dir "results/ind-pickscore/"
# inference with individual experts OCR
python scripts/test_${model_name}.py --mode eval_single --use_adapter --lora_checkpoint "shi-labs/${model_ckpt}-ind-expert-OCR" --results_dir "results/ind-ocr/"
# inference with MapReduce-LoRA
python scripts/test_${model_name}.py --mode eval_single --use_adapter --lora_checkpoint "shi-labs/${model_ckpt}-MapReduce-LoRA-merge-k4" --results_dir "results/mpr/"
Train RaTE:
# Defaults to GenEval. For PickScore/OCR, update `init_RaTE_sd35m.sh`
# Before starting, update `config.teacher_lora_dir` in `config/sft_ti.py`.
bash scripts/init_scripts/init_RaTE_sd35m.sh

Acknowledgements
We gratefully acknowledge the generous contributions of the open-source community, especially the teams behind Stable Diffusion 3.5, FLUX.1-dev, HunyuanVideo, Llama, Flow-GRPO, DanceGRPO, GenEval, PickScore, PaddleOCR, VQAScore, MPS, VILA, and VideoAlign. Their publicly available code and models made this work possible.
If you find this work useful, please cite:
@article{chen2025mapreducelora,
title = {MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models},
author = {Chieh-Yun Chen and Zhonghao Wang and Qi Chen and Zhifan Ye and Min Shi and Yue Zhao and Yinan Zhao and Hui Qu and Wei-An Lin and Yiru Shen and Ajinkya Kale and Irfan Essa and Humphrey Shi},
year = {2025},
journal = {arXiv preprint arXiv:2511.20629}
}