Some questions about the process

Hi, thanks for the amazing work and code!

I have two questions:

1. Since we ask the model to judge which prompt is more likely to come from a human, does that mean the model needs to output a yes/no, followed by a synthetic output y'?
2. Does "iteration" mean that we first need to save all the generated outputs from iteration 0, then use them for iteration 1, and so on?