- [ ] Run the train loop
- [ ] Run inference to get e.g. 1000 inference samples (e.g. with proxy reward and prior reward)
- [ ] Get the proxy value cut-off using `get_proxy_value_cutoff`
- [ ] Run the model twice: once with the soft-optim likelihood + tiny KL, and once with the normal likelihood + tiny KL (see the sketch below)
- [ ] Show the "breaks rules" stats for both runs
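
A minimal sketch of how these steps might wire together. Only `get_proxy_value_cutoff` is named in the checklist; `train_loop`, `run_inference`, `run_model`, `breaks_rules_stats`, their arguments, and the quantile-based cutoff are hypothetical placeholders, not the actual implementation.

```python
# Hypothetical wiring of the checklist above; all helpers except
# get_proxy_value_cutoff (named in the checklist) are placeholders.
import numpy as np


def get_proxy_value_cutoff(proxy_rewards, quantile=0.95):
    # One plausible choice (assumption): cap the proxy at an empirical
    # quantile of the inference-time proxy rewards.
    return float(np.quantile(np.asarray(proxy_rewards), quantile))


def run_experiment(train_loop, run_inference, run_model, breaks_rules_stats,
                   n_samples=1000, tiny_kl=0.01):
    # 1. Run the train loop.
    policy = train_loop()

    # 2. Draw ~1000 inference samples, each scored with proxy and prior reward.
    samples = run_inference(policy, n_samples=n_samples)
    proxy_rewards = [s["proxy_reward"] for s in samples]

    # 3. Choose the proxy value cut-off from those samples.
    cutoff = get_proxy_value_cutoff(proxy_rewards)

    # 4. Run the model twice: soft-optim likelihood vs. normal likelihood,
    #    both with a tiny KL penalty.
    soft_run = run_model(policy, likelihood="soft_optim",
                         cutoff=cutoff, kl_coef=tiny_kl)
    normal_run = run_model(policy, likelihood="normal", kl_coef=tiny_kl)

    # 5. Compare the "breaks rules" stats for both runs.
    return {"soft_optim": breaks_rules_stats(soft_run),
            "normal": breaks_rules_stats(normal_run)}
```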