Specifically, I would like to understand what the reward curves should look like once training has converged, and what the expected converged reward values are for each of the three datasets provided in the README.
In my experiments, when training on the 100style and lafan1 datasets, the final reward values after convergence are relatively low, reaching only about 70% to 80% of the defined reward scale. I am unsure whether this is expected or whether it indicates a problem with my training setup.
Could you please clarify what reward levels are typically reached after convergence on each of the three datasets?
Thanks!