Regarding the reward curves after training convergence #2

@HHT0003

Description

Specifically, I would like to understand what the reward curves should look like once training has converged, and what the expected converged reward values are for each of the three datasets provided in the README.

In my experiments, when training with the 100style and lafan1 datasets, the final reward values after convergence are relatively low, reaching only about 70% to 80% of the defined reward scale. I am unsure whether this behavior is expected or whether it indicates an issue with my training setup.
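For context, here is how I am reading off the converged value from a logged reward curve. This is a minimal sketch on synthetic data; the smoothing parameters and the `reward_scale` value are my own choices, not taken from the repo:

```python
import numpy as np

def converged_fraction(rewards, reward_scale, tail=0.2, alpha=0.05):
    """Estimate the converged reward as a fraction of the defined reward scale.

    Smooths the raw per-step rewards with an exponential moving average,
    then averages the last `tail` fraction of training steps.
    """
    rewards = np.asarray(rewards, dtype=float)
    ema = np.empty_like(rewards)
    ema[0] = rewards[0]
    for i in range(1, len(rewards)):
        ema[i] = alpha * rewards[i] + (1 - alpha) * ema[i - 1]
    start = int(len(ema) * (1 - tail))
    return ema[start:].mean() / reward_scale

# Synthetic curve that plateaus near 0.75 of a reward scale of 1.0,
# mimicking the behavior I observe on 100style and lafan1.
steps = np.arange(2000)
curve = 0.75 * (1 - np.exp(-steps / 300))
print(round(converged_fraction(curve, reward_scale=1.0), 2))
```

With this measurement the plateau consistently sits at roughly 0.7 to 0.8 of the scale, which is what prompted the question.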

Could you please clarify what reward levels are typically achieved after convergence for each of the three datasets?

Thanks!
