Initial experiment: soft optim vs normal #16

@jezgillen

Description

  • Run the train loop
  • Run inference to get e.g. 1000 inference samples (recording e.g. both proxy reward and prior reward)
  • Get the proxy value cut-off using get_proxy_value_cutoff
  • Run the model twice: once with the soft-optim likelihood + a tiny KL penalty, and once with the normal likelihood + a tiny KL penalty
  • Show the "breaks rules" stats for both runs
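A minimal sketch of how the middle steps might fit together, assuming a quantile-based cutoff and a sigmoid-smoothed threshold for the soft-optim likelihood — the function names, the quantile choice, and both likelihood forms are assumptions, not the repo's actual implementation:

```python
import numpy as np

def get_proxy_value_cutoff(proxy_rewards, quantile=0.9):
    # Hypothetical: take the cutoff as a quantile of the sampled proxy rewards.
    return float(np.quantile(proxy_rewards, quantile))

def soft_optim_log_likelihood(proxy_reward, cutoff, temperature=1.0):
    # Soft-optim target: log sigmoid((r - cutoff) / T), a smoothed step.
    # It saturates above the cutoff, so exceeding it by a lot earns
    # almost nothing extra (unlike the normal target below).
    return float(-np.log1p(np.exp(-(proxy_reward - cutoff) / temperature)))

def normal_log_likelihood(proxy_reward, beta=1.0):
    # Normal RL-as-inference target: log-likelihood linear in reward,
    # so more proxy reward is always better.
    return beta * proxy_reward

# Stand-in for proxy rewards from ~1000 inference samples.
rng = np.random.default_rng(0)
proxy_rewards = rng.normal(size=1000)
cutoff = get_proxy_value_cutoff(proxy_rewards)
```

The point of the comparison is visible in the two targets: the normal likelihood keeps paying for proxy reward far past the cutoff (where Goodharting and rule-breaking would show up), while the soft-optim likelihood flattens out there, which is what the "breaks rules" stats in the last step would measure.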
