$ /COMEDY/Data# python data_formatting.py
$ python training/step1_supervised_finetuning/main_peft.py
This is a reproduction project for the paper: Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
This repository contains resources for accessing the official benchmarks, code, and checkpoints of the paper: "Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations".
This work explores building Long-Term Conversation Dialogue Systems without retrieval. To accomplish this, we make the following contributions:
- COMEDY, LLM-based COmpressive Memory-Enhanced Dialogue sYstems framework.
- Dolphin, the biggest Chinese long-term conversation dataset from actual online user-chatbot interactions. This dataset contains three tasks: Session-Level Memory Summarization; Memory Compression; Memory-Grounded Response Generation, comprising an extensive collection of 100k samples.
COMEDY adopts a "One-for-All" approach: a single, unified model handles the entire process of long-term memory dialogue generation, from memory generation and compression to final response generation.
- COMEDY first distills session-specific memory from past dialogues: fine-grained session summaries that include event recaps and detailed user and bot portraits.
- Unlike traditional systems, COMEDY does not store these memories in a memory database. Instead, it reprocesses and condenses the memories from all past interactions into a Compressive Memory with three parts. The first part is the concise events that have occurred throughout all the conversations, creating a historical narrative that the system can draw upon. The second and third parts are a detailed user profile and the dynamic relationship changes between the user and chatbot across sessions, both derived from past conversational events.
- Finally, COMEDY integrates this compressive memory into ongoing conversations, enabling memory-enhanced responses.
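The three stages above can be sketched as a small pipeline in which one model serves every stage. This is a minimal illustration, not the repository's implementation: the `generate` callable, the function name `comedy_respond`, and all three prompt templates are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical prompt templates; the prompts actually used by COMEDY are not shown here.
SUMMARIZE_PROMPT = (
    "Summarize the events, user portrait, and bot portrait of this session:\n{session}"
)
COMPRESS_PROMPT = (
    "Compress these session memories into events, a user profile, "
    "and user-bot relationship dynamics:\n{memories}"
)
RESPOND_PROMPT = "Compressive memory:\n{memory}\n\nCurrent dialogue:\n{context}\n\nBot:"


def comedy_respond(
    generate: Callable[[str], str], past_sessions: List[str], context: str
) -> str:
    """Run all three stages with a single model, exposed as `generate`."""
    # Stage 1: produce one session-level memory per past session.
    session_memories = [
        generate(SUMMARIZE_PROMPT.format(session=session)) for session in past_sessions
    ]
    # Stage 2: merge every session memory into one compressive memory.
    compressive_memory = generate(
        COMPRESS_PROMPT.format(memories="\n".join(session_memories))
    )
    # Stage 3: ground the reply in the compressive memory; no retrieval step is involved.
    return generate(RESPOND_PROMPT.format(memory=compressive_memory, context=context))
```

Passing the model as a plain callable keeps the sketch independent of any particular inference library; any function that maps a prompt string to generated text can be plugged in.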
Our collected Dolphin dataset contains 3 tasks:
- Task1: Session-Level Memory Summarization
- Task2: Memory Compression
- Task3: Memory-Grounded Response Generation
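Because a single model is trained on all three tasks, their samples need to end up in one shuffled training set. The sketch below shows one way this could be done; the function name `mix_tasks`, the `task` field, and the sample layout are assumptions for illustration and may differ from what the repository's data scripts produce.

```python
import random
from typing import Dict, List


def mix_tasks(task_samples: Dict[str, List[dict]], seed: int = 42) -> List[dict]:
    """Tag each sample with its task name and shuffle all tasks together."""
    mixed = [
        {**sample, "task": task_name}  # hypothetical field recording the source task
        for task_name, samples in task_samples.items()
        for sample in samples
    ]
    # A local RNG keeps the shuffle reproducible without touching global state.
    random.Random(seed).shuffle(mixed)
    return mixed
```

Tagging each sample before shuffling keeps the task of origin recoverable, which is useful for per-task evaluation later.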
🤗 [Dolphin-Test Dataset]
This work introduces a novel framework, COmpressive Memory-Enhanced Dialogue sYstems (COMEDY), which eschews traditional retrieval modules and memory databases. Instead, COMEDY adopts a "One-for-All" approach, utilizing a single language model to manage memory generation, compression, and response generation.
Clone this repository and install the required packages:
git clone https://github.com/wjcldply/COMEDY.git
cd COMEDY
pip install -r requirements.txt
Our training strategy has two stages: mixed-task training and DPO alignment.
cd Data
python build_dataset.py
- Prerequisite
chmod +x ./training/step1_supervised_finetuning/*.py
- Pre-Training
bash Scripts/MultiTask-Training-7B-PreTrain-FP16.sh \
meta-llama/Llama-2-7b-hf \
./Output \
./Logs \
./Data/MultiTask_Training_Data/Dolphin_MultiTask_Shuffled_train.json \
./Data/MultiTask_Training_Data/Dolphin_MultiTask_Shuffled_validation.json
The script consists of the following commands:
#!/bin/bash
# DeepSpeed Team
CURRENT_TIME=$(TZ=UTC-8 date +"%Y-%m-%d-%H.%M.%S")
# ZERO_STAGE="--zero_stage 2"
ZERO_STAGE="--zero_stage 3" # configures the DeepSpeed zero optimization stage for memory efficiency during training
MODEL_PATH=$1 # path to the model to be fine-tuned
OUTPUT=$2 # base directory where output data will be saved
LOG_PATH=$3 # directory where logs will be saved
TRN_FN=$4 # training data file path
DEV_FN=$5 # development/validation data file path
echo "MODEL_PATH: '${MODEL_PATH}'"
echo "OUTPUT: '${OUTPUT}'"
echo "LOG_PATH: '${LOG_PATH}'"
echo "TRN_FN: '${TRN_FN}'"
echo "DEV_FN: '${DEV_FN}'"
export TOKENIZERS_PARALLELISM=False
NUM_GPUS=$(nvidia-smi -L | wc -l)
GPU_LIST=$(seq -s, 0 $((NUM_GPUS-1)))
TOTAL_SIZE=$(wc -l < "${TRN_FN}") # number of lines (samples) in the training file; reading from stdin keeps the filename out of the value
echo "number of samples in trainset: ${TOTAL_SIZE}"
mkdir -p $OUTPUT/$CURRENT_TIME
deepspeed --num_gpus=${NUM_GPUS} training/step1_supervised_finetuning/main.py \
--model_name_or_path ${MODEL_PATH} \
--train_data_path ${TRN_FN} \
--valid_data_path ${DEV_FN} \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--data_output_path $OUTPUT/data \
--max_seq_len 2048 \
--fp16 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--num_train_epochs 3 \
--num_train_samples ${TOTAL_SIZE} \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 400 \
--seed 42 \
${ZERO_STAGE} \
--save_interval 2000 \
--log_interval 100 \
--eval_interval 1000 \
--output_dir $OUTPUT/$CURRENT_TIME \
--gradient_checkpointing \
--tensorboard_path $LOG_PATH \
&>$OUTPUT/train_$CURRENT_TIME.log&
- LoRA-FineTuning
bash Scripts/MultiTask-Training-7B-LoRA-FP16.sh \
meta-llama/Llama-2-7b-hf \
./Output \
./Logs \
./Data/MultiTask_Training_Data/Dolphin_MultiTask_Shuffled_train.json \
./Data/MultiTask_Training_Data/Dolphin_MultiTask_Shuffled_validation.json
The script consists of the following commands:
#!/bin/bash
# DeepSpeed Team
CURRENT_TIME=$(TZ=UTC-8 date +"%Y-%m-%d-%H.%M.%S")
# ZERO_STAGE="--zero_stage 2"
ZERO_STAGE="--zero_stage 3" # configures the DeepSpeed zero optimization stage for memory efficiency during training
MODEL_PATH=$1 # path to the model to be fine-tuned
OUTPUT=$2 # base directory where output data will be saved
LOG_PATH=$3 # directory where logs will be saved
TRN_FN=$4 # training data file path
DEV_FN=$5 # development/validation data file path
echo "MODEL_PATH: '${MODEL_PATH}'"
echo "OUTPUT: '${OUTPUT}'"
echo "LOG_PATH: '${LOG_PATH}'"
echo "TRN_FN: '${TRN_FN}'"
echo "DEV_FN: '${DEV_FN}'"
export TOKENIZERS_PARALLELISM=False
NUM_GPUS=$(nvidia-smi -L | wc -l)
GPU_LIST=$(seq -s, 0 $((NUM_GPUS-1)))
TOTAL_SIZE=$(wc -l < "${TRN_FN}") # number of lines (samples) in the training file; reading from stdin keeps the filename out of the value
echo "number of samples in trainset: ${TOTAL_SIZE}"
mkdir -p $OUTPUT/$CURRENT_TIME
deepspeed --num_gpus=${NUM_GPUS} training/step1_supervised_finetuning/main.py \
--model_name_or_path ${MODEL_PATH} \
--train_data_path ${TRN_FN} \
--valid_data_path ${DEV_FN} \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--lora_dim 128 \
--lora_module_name model.layers. \
--only_optimize_lora \
--data_output_path $OUTPUT/data \
--fp16 \
--max_seq_len 2048 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--num_train_epochs 3 \
--num_train_samples ${TOTAL_SIZE} \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--num_warmup_steps 400 \
--seed 42 \
${ZERO_STAGE} \
--save_interval 2000 \
--log_interval 100 \
--eval_interval 1000 \
--output_dir $OUTPUT/$CURRENT_TIME \
--tensorboard_path $LOG_PATH \
&>$OUTPUT/train_$CURRENT_TIME.log&
chmod +x ./training/step2_dpo_training/*.py
cd training/step2_dpo_training
bash training_scripts/single_node/run_memory.sh
#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0
# local/xjsonfile/rftV2
# DeepSpeed Team
OUTPUT=$1
ZERO_STAGE=$2
DATA_PATH=$3
SFT_CKPT=$4
if [ "$OUTPUT" == "" ]; then
OUTPUT=output/compress_memory/13b_v2_dpo_0.01_sft/
fi
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=3
fi
mkdir -p $OUTPUT
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port=29592 main.py \
--data_path $DATA_PATH \
--data_split 0,10,0 \
--model_name_or_path $SFT_CKPT \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--max_seq_len 2048 \
--learning_rate 1e-5 \
--weight_decay 0. \
--num_train_epochs 2 \
--beta 0.01 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--num_warmup_steps 10 \
--seed 1234 \
--zero_stage $ZERO_STAGE \
--deepspeed \
--add_sft \
--print_loss \
--gradient_checkpointing \
--output_dir $OUTPUT \
--tensorboard_path $OUTPUT/runs \
&> $OUTPUT/training.log
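For readers unfamiliar with the `--beta` flag, the following is a minimal sketch of the standard DPO preference loss for a single chosen/rejected pair, written in plain Python. It assumes that `--beta 0.01` plays the role of the usual DPO temperature; the function name and arguments are illustrative, and the sketch does not model whatever extra term `--add_sft` enables in `main.py`.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.01,
) -> float:
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy prefers the chosen response than the reference model does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) computed in a numerically stable way for either sign of x.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

With a small `beta` such as 0.01, the loss changes slowly as the margin grows, so the policy is only gently pushed away from the reference (SFT) model.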
To replicate the experimental results in our paper, run:
python comedy_test.py
We recruit human annotators to evaluate the model performances in terms of Scoring and Ranking.
Please cite our paper if you use our data, model or code. Please also kindly cite the original dataset papers.
@misc{chen2024compress,
title={Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations},
author={Nuo Chen and Hongguang Li and Juhua Huang and Baoyuan Wang and Jia Li},
year={2024},
eprint={2402.11975},
archivePrefix={arXiv},
primaryClass={cs.CL}
}




