Dear Author,
I am trying to reproduce the recommendation performance on the INSPIRED dataset.

I used the hyperparameters you recommend and the "best" checkpoint as the prompt encoder. Unfortunately, I was not able to reproduce the performance reported in the paper.
---- Here I attach the loss and Recall@1 on the test set for the prompt pre-training, conversational training, and recommendation training steps:


[plot: prompt pre-training]

[plot: conversational training]

[plot: recommendation training; the best Recall@1 I got is around 0.04, far from the reported 0.09]
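For reference, by Recall@1 above I mean the standard hit rate: the fraction of test turns whose ground-truth item is ranked first by the recommender. A minimal sketch of how I compute it (the function name and item IDs are made up for illustration):

```python
def recall_at_k(ranked_items, gold_item, k=1):
    """Return 1 if the ground-truth item appears in the top-k ranked items, else 0."""
    return int(gold_item in ranked_items[:k])

# Toy example: three test turns with the model's ranked item IDs and the gold item.
preds = [[5, 2, 9], [1, 7, 3], [8, 5, 1]]
golds = [5, 3, 1]
hits = [recall_at_k(p, g, k=1) for p, g in zip(preds, golds)]
print(sum(hits) / len(hits))  # 1 hit out of 3 -> ~0.33
```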
---- and here are the configurations I used for the prompt pre-training, conversational training, conversational inference, and recommendation training steps:
python3 train_pre.py \
--dataset inspired \
--tokenizer microsoft/DialoGPT-small \
--model microsoft/DialoGPT-small \
--text_tokenizer roberta-base \
--text_encoder roberta-base \
--num_train_epochs 5 \
--gradient_accumulation_steps 1 \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 128 \
--num_warmup_steps 168 \
--max_length 200 \
--prompt_max_length 200 \
--entity_max_length 32 \
--learning_rate 6e-4 \
--output_dir UniCRS/src/result_promptpretraining_inspired \
--use_wandb \
--project crs-prompt-pre-inspired \
--name exp1 \
--gpu 0
prompt pre-training
python3 train_conv.py \
--dataset inspired \
--tokenizer microsoft/DialoGPT-small \
--model microsoft/DialoGPT-small \
--text_tokenizer roberta-base \
--text_encoder roberta-base \
--n_prefix_conv 20 \
--prompt_encoder UniCRS/src/result_promptpretraining_inspired/best/ \
--num_train_epochs 10 \
--gradient_accumulation_steps 1 \
--ignore_pad_token_for_loss \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_warmup_steps 976 \
--context_max_length 200 \
--resp_max_length 183 \
--prompt_max_length 200 \
--entity_max_length 32 \
--learning_rate 1e-4 \
--output_dir UniCRS/src/result_convprompt_inspired \
--use_wandb \
--project crs-prompt-conv-inspired \
--name exp1 \
--gpu 0
conversational training
python3 infer_conv.py \
--dataset inspired \
--split test \
--tokenizer microsoft/DialoGPT-small \
--model microsoft/DialoGPT-small \
--text_tokenizer roberta-base \
--text_encoder roberta-base \
--n_prefix_conv 20 \
--prompt_encoder UniCRS/src/result_convprompt_inspired/best \
--per_device_eval_batch_size 64 \
--context_max_length 200 \
--resp_max_length 183 \
--prompt_max_length 200 \
--entity_max_length 32 \
--gpu 1
conversational inference
python3 train_rec.py \
--dataset inspired_gen \
--tokenizer microsoft/DialoGPT-small \
--model microsoft/DialoGPT-small \
--text_tokenizer roberta-base \
--text_encoder roberta-base \
--n_prefix_rec 10 \
--prompt_encoder UniCRS/src/result_promptpretraining_inspired/best \
--num_train_epochs 5 \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 64 \
--gradient_accumulation_steps 1 \
--num_warmup_steps 33 \
--context_max_length 200 \
--prompt_max_length 200 \
--entity_max_length 32 \
--learning_rate 1e-4 \
--output_dir UniCRS/src/result_rec_inspired \
--use_wandb \
--project crs-prompt-rec-inspired \
--name exp1 \
--gpu 0
recommendation training
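For completeness, this is the order I ran the steps in and how I understood the checkpoints to flow between them. Note that the assumption that infer_conv.py produces the inspired_gen data later consumed by train_rec.py is mine, inferred from the --dataset flags above:

```shell
# Step 1: train_pre.py  -> UniCRS/src/result_promptpretraining_inspired/best
# Step 2: train_conv.py (prompt_encoder = step-1 "best") -> UniCRS/src/result_convprompt_inspired/best
# Step 3: infer_conv.py (prompt_encoder = step-2 "best") -> generated responses (inspired_gen, I assume)
# Step 4: train_rec.py  (prompt_encoder = step-1 "best", --dataset inspired_gen) -> UniCRS/src/result_rec_inspired
```

Please let me know if any of these steps, or the checkpoint passed at each step, differs from what you used for the reported numbers.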
Thank you!