Separating out rewards during training [Lower Priority]

Breaking rewards out per dataset lets us see on which datasets the models learn to obfuscate or reward hack.

Useful for debugging training runs and for interpreting eval results.
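
A minimal sketch of what the per-dataset breakdown could look like, assuming each training sample carries a dataset tag alongside its scalar reward. The helper name `mean_reward_by_dataset`, the `(dataset_name, reward)` pair format, and the dataset names and values below are all hypothetical, not taken from the actual training code.

```python
from collections import defaultdict
from statistics import mean


def mean_reward_by_dataset(samples):
    """Group per-sample rewards by dataset tag and return the mean for each.

    `samples` is an iterable of (dataset_name, reward) pairs. Both this
    pair format and the helper itself are assumptions for illustration.
    """
    grouped = defaultdict(list)
    for dataset_name, reward in samples:
        grouped[dataset_name].append(float(reward))
    return {name: mean(values) for name, values in grouped.items()}


# Hypothetical batch: dataset names and reward values are made up.
batch = [("math", 1.0), ("code", 0.0), ("math", 0.5), ("code", 1.0)]
per_dataset = mean_reward_by_dataset(batch)
```

Logging one such dictionary per training step, rather than a single aggregate reward, would give a separate curve per dataset, so an unusual rise on one dataset is not averaged away by the others.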