SRPA: Self-Reflective Preference Adaptation

Abstract

With the rapid advancement of Large Language Models (LLMs) and their wide-ranging applications, there is growing interest in improving the efficiency and quality of interactions between humans and LLMs. Researchers are exploring ways to leverage LLMs' capabilities to better understand and adapt to user preferences, ultimately enhancing personalization and user experience during interactions. Traditional approaches, such as reinforcement learning, require fine-tuning each model to learn individual user preferences. However, personalizing a model for each user through fine-tuning is highly resource-intensive and requires collecting a preference dataset for each user. This raises the question: Can LLMs effectively learn a user's implicit preferences by reflecting on past conversations, without training the model? In this paper, we propose Self-Reflective Preference Adaptation (SRPA), a training-free personalization framework that builds an external preference database from each user’s conversation history, guided by the LLM's self-reflection ability. The framework is lightweight, easy to integrate into any LLM-based chatbot, and designed to adapt to user preferences within just a few conversations. SRPA significantly improves conversational efficiency, reducing the average number of conversation turns required to achieve satisfactory outputs by up to 20% across user groups and consistently outperforming DPO or larger baseline models in alignment with user preferences.

Setup

To initialize your environment and install necessary packages, use the following commands:

conda create -n <your_env_name> python=3.10
conda activate <your_env_name>

Set your OpenAI API key in your environment before running any OpenAI-based models.
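As a quick sanity check before launching anything, the sketch below verifies that a key is visible to Python. It assumes the key is read from the OPENAI_API_KEY environment variable, which is the default for the official OpenAI Python client; this repository may load the key differently. The helper name has_openai_key is hypothetical and not part of the repository.

```python
import os


def has_openai_key(env=None):
    """Return True if a non-empty OPENAI_API_KEY exists in the given mapping.

    Assumption: the project reads the key from OPENAI_API_KEY.
    `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing; export it before running OpenAI-based models.")
```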

Evaluation

To evaluate SRPA with role-players, first generate questions for your role-players and create the dataset. An example dataset is provided in synthetic_task_dataset.jsonl. Then run:

python runner.py \
    --data_path {your_dataset_path.jsonl} \
    --extract_threshold {preference_extraction_threshold} \
    --update_threshold {preference_update_threshold} \
    --no_preference {True_for_evaluating_without_SRPA}
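To compare runs with and without SRPA, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: check_jsonl, build_runner_cmd, and run_both are hypothetical helpers, and only the four flag names come from the command above. How runner.py parses --no_preference is not shown here, so the sketch passes True for the baseline run and omits the flag otherwise, assuming SRPA is enabled by default. Threshold values are left to the caller.

```python
import json
import subprocess
import sys


def check_jsonl(path):
    """Count records in a JSONL file; raise ValueError on the first malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count


def build_runner_cmd(data_path, extract_threshold, update_threshold, no_preference=False):
    """Assemble the runner.py argument list from the flags documented above."""
    cmd = [
        sys.executable, "runner.py",
        "--data_path", str(data_path),
        "--extract_threshold", str(extract_threshold),
        "--update_threshold", str(update_threshold),
    ]
    if no_preference:
        # Assumption: omitting this flag runs the evaluation with SRPA enabled.
        cmd += ["--no_preference", "True"]
    return cmd


def run_both(data_path, extract_threshold, update_threshold):
    """Validate the dataset, then run with SRPA and without it (baseline)."""
    check_jsonl(data_path)
    for no_pref in (False, True):
        cmd = build_runner_cmd(data_path, extract_threshold, update_threshold, no_pref)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Call run_both with your dataset path and your chosen thresholds from the repository root so that runner.py resolves correctly.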

Interface

To use the interface for SRPA, run:

python app.py

To share the interface with others, set share=True in the launch call on the last line of app.py:

demo.launch(share=True)  # set share=False to run locally only
