I followed the settings in Section 4.1 (Implementation Details) and trained a detector on FF++ raw using only real faces and self-blended fake faces. However, when testing on FF++ raw, it only performs well on Deepfakes (138/140 correct); the results on NeuralTextures (≈50% acc), Face2Face (≈60% acc), and FaceSwap (≈60% acc) are much worse. I also tested the released pretrained weights on FF++ raw, and the results match those reported in the paper. This leaves me really confused. Has anyone met a similar issue?
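
For anyone trying to reproduce the per-manipulation numbers, here is a minimal sketch of how accuracy could be tallied per method from video-level fake scores. The function name `per_method_accuracy`, the `(method, label, score)` record layout, and the 0.5 threshold are all assumptions for illustration, and the scores below are made-up placeholders, not actual model outputs.

```python
from collections import defaultdict


def per_method_accuracy(records, threshold=0.5):
    """Return {method: accuracy} for an iterable of (method, label, score).

    label: 1 for a fake video, 0 for a real one.
    score: predicted probability that the video is fake.
    threshold: hypothetical decision cut-off (assumed, not from the paper).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for method, label, score in records:
        pred = 1 if score >= threshold else 0
        correct[method] += int(pred == label)
        total[method] += 1
    return {m: correct[m] / total[m] for m in total}


# Placeholder scores, purely illustrative.
records = [
    ("Deepfakes", 1, 0.97),
    ("Deepfakes", 1, 0.88),
    ("NeuralTextures", 1, 0.12),
    ("NeuralTextures", 1, 0.71),
]
print(per_method_accuracy(records))
```

Accuracy at a fixed threshold depends on how well the scores are calibrated, so it may be worth checking that the comparison uses the same metric as the one reported in the paper.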