MallowsPO

Paper (arXiv:2405.14953) | OpenReview (ICLR 2025) | NeurIPS 2024 Pluralistic Alignment

When training LLMs to learn from human preferences, we observe a natural dispersion in those preferences. For example, preferences on some prompts/questions may be more subjective than on others, resulting in feedback that can be biased and highly dispersed.

Figure: Dispersion of Human Preferences

To address this issue and enhance learning from human feedback data, we are motivated to explicitly incorporate this dispersion into preference modeling by using the Mallows Ranking Model, as an alternative to the Bradley-Terry model that is widely used in existing work yet fails to capture the different dispersion levels across prompts. For ease of computation, we also propose to use the LLM's own uncertainty as a proxy for the dispersion, so that no additional model needs to be trained.

We include the components needed for implementing MallowsPO (ICLR 2025) in the trainer script, which is modified from the trl DPO trainer. MallowsPO can be implemented easily on top of the DPO implementations in common LLM RLHF libraries by adding a few lines of drop-in code to compute the dispersion, as provided in our trainer.
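
For illustration, here is a minimal sketch of how such a drop-in dispersion term can be wired into a DPO-style loss in PyTorch. The function names (`dispersion_weight`, `mallows_style_dpo_loss`) and the entropy-to-weight mapping are illustrative assumptions, not the exact estimator used in the paper or the trainer script; they only show the general pattern of scaling the DPO margin by a per-prompt dispersion factor.

```python
import torch
import torch.nn.functional as F

def dispersion_weight(ref_logits, labels, eps=1e-6):
    """Rough per-example dispersion proxy from the reference model's
    next-token predictive entropy over completion tokens.

    ref_logits: (batch, seq_len, vocab) reference-model logits
    labels:     (batch, seq_len) token ids, with -100 on prompt/padding positions
    Returns a (batch,) scaling factor: low-entropy (low-dispersion) prompts get
    a larger weight, high-entropy (more subjective) prompts a smaller one.
    """
    mask = (labels != -100).float()                          # completion tokens only
    log_probs = F.log_softmax(ref_logits.float(), dim=-1)
    token_entropy = -(log_probs.exp() * log_probs).sum(-1)   # (batch, seq_len)
    avg_entropy = (token_entropy * mask).sum(-1) / mask.sum(-1).clamp(min=1)
    # Map entropy to a positive scaling factor; the exact mapping in the paper differs.
    return torch.clamp(1.0 / (avg_entropy + eps), max=10.0)

def mallows_style_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                           ref_chosen_logps, ref_rejected_logps,
                           phi, beta=0.1):
    """Standard DPO loss with the implicit-reward margin scaled by phi (batch,)."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = phi * beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

The key difference from vanilla DPO is the per-prompt factor `phi`: everything else in the training loop stays the same, which is why the change amounts to a few drop-in lines in an existing DPO trainer.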

P.S. If you are interested in more detailed training and evaluation scripts, a more comprehensive codebase that includes MallowsPO as a special instance can be found in the follow-up work RainbowPO (ICLR 2025), which refers to MallowsPO as contextual scaling.

📜 Citation

If you find MallowsPO useful in your research, please consider citing our work! 🚀

BibTeX

@article{chen2024mallows,
  title={Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions},
  author={Chen, Haoxian and Zhao, Hanyang and Lam, Henry and Yao, David and Tang, Wenpin},
  journal={arXiv preprint arXiv:2405.14953},
  year={2024}
}
