Can I ask you two questions?
- In the training phase, as we know, a surgical image is the input (x0); we then add noise to it to get xT, and finally denoise xT to get x0_pred. I want to know: are x0 and x0_pred (or its corresponding ground-truth image) the same image?
- In the prediction phase, is there an additional surgical image as input, or is the image generated from text alone (the surgical phase and tool text)?
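For context, the training step from the first question can be sketched as below. This is a minimal sketch that assumes a standard DDPM-style noise-prediction setup, not the method of any specific surgical model. The linear beta schedule, `T = 1000`, the image shape, and the helper names `q_sample` / `predict_x0` are illustrative assumptions. In this common formulation, each training step noises x0 to x_t at one random timestep t rather than fully denoising xT, and x0_pred is reconstructed relative to the same x0 that was noised.

```python
import numpy as np

# Assumption: a standard DDPM-style linear beta schedule (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: jump from x0 to x_t in closed form (hypothetical helper)."""
    a = alphas_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def predict_x0(x_t, t, eps_pred):
    """Invert q_sample using a noise estimate eps_pred (hypothetical helper)."""
    a = alphas_bar[t]
    return (x_t - np.sqrt(1.0 - a) * eps_pred) / np.sqrt(a)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for one training image
t = int(rng.integers(0, T))                   # one random timestep per step
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, noise)

# A real network would estimate the noise from (x_t, t, text condition).
# Here a perfect "oracle" estimate is used only to show the bookkeeping.
eps_pred = noise
x0_pred = predict_x0(x_t, t, eps_pred)

# Noise-prediction loss: compares the estimate with the noise that was added to x0.
loss = np.mean((eps_pred - noise) ** 2)
print(f"x0_pred matches x0: {np.allclose(x0_pred, x0)}, loss = {loss:.3g}")
```

With a trained (non-oracle) network the loss is nonzero and x0_pred only approximates x0, but in this sketch both are measured against the same input image.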