First of all, thank you for open-sourcing the code for this wonderful work.
I also have a question about your reinforcement learning code. In your implementation, you run policy gradient on the training dataset to fine-tune the parameters.
In my opinion, however, an RL setup should use a user simulator as the environment when updating the parameters. Could you explain why you chose the training dataset instead?
Thank you very much!