Greetings, I would like to ask about something you mention in your paper:
> Implementation Details. To train our model, 128 training and 32 validation frames are used. RetinaNet [26] is used to crop faces with a conservative enlargement (by a factor of 1.25) around the face center. Note that all the cropped images are then resized to 384 × 384. In addition, 68 facial landmarks are extracted per frame using Dlib [21]. We adopt the EFNB4 variant of the EfficientNet [44] pretrained on ImageNet [12]. For each training epoch, 8 frames are dynamically selected and used for online pseudo-fake generation.
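To make my questions concrete, here is a minimal sketch of how I currently read the cropping step: enlarge the detected box by 1.25 around the face center, then resize to 384 × 384. The box format, function name, and clamping behavior are my own assumptions, not taken from your code:

```python
import cv2
import numpy as np

def crop_face(frame: np.ndarray, box: tuple, scale: float = 1.25,
              out_size: int = 384) -> np.ndarray:
    """box = (x1, y1, x2, y2) from the face detector (RetinaNet in the paper)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2      # face center
    half = max(x2 - x1, y2 - y1) * scale / 2   # enlarged half-size of a square crop
    h, w = frame.shape[:2]
    # Clamp the enlarged square crop to the image bounds (my assumption).
    left, top = int(max(cx - half, 0)), int(max(cy - half, 0))
    right, bottom = int(min(cx + half, w)), int(min(cy + half, h))
    crop = frame[top:bottom, left:right]
    return cv2.resize(crop, (out_size, out_size))

# Example with a dummy frame and a hypothetical detector box:
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
face = crop_face(frame, (500, 200, 700, 450))
assert face.shape == (384, 384, 3)
```

Is this roughly what "conservative enlargement (by a factor of 1.25) around the face center" means, or is the enlargement applied per side rather than to the longer edge?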
Does this mean that you extract 128 frames per video for training and 32 non-overlapping frames per video for validation? Furthermore, does "dynamically selected" mean that you randomly draw 8 frames from the 128 frames at every training epoch, so the draws may or may not overlap across successive epochs (see the sketch below)? Could you also describe the testing strategy for the cross-manipulation and cross-dataset evaluations?
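Here is a minimal sketch of my reading of the per-epoch selection, assuming a uniform random draw without replacement within an epoch; the function and its defaults are hypothetical, not from your implementation:

```python
import random

def sample_frames_per_epoch(num_frames: int = 128, k: int = 8,
                            num_epochs: int = 3, seed: int = 0):
    """Draw k frame indices from num_frames, independently per epoch."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        picked = rng.sample(range(num_frames), k)  # fresh draw every epoch,
        print(f"epoch {epoch}: frames {sorted(picked)}")  # so epochs may overlap

sample_frames_per_epoch()
```

Is this per-epoch uniform resampling what you do, or is the selection guided by some other criterion (e.g. landmark quality or pseudo-fake difficulty)?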
Many thanks