Hi, thank you for your nice work. I trained the model for the "sparse scribble adapter", but the results seem bad. What might be the reason? The first GIF is the ground truth and the second GIF is the generated result.