Allows us to see on which datasets the models learn to obfuscate / reward hack. Useful for training debugging and interpreting results of evals.