Reproducing training and validation data #22

@darkbeforedawn

Description

Greetings, I would like to ask about something you mention in your paper:

> Implementation Details. To train our model, 128 training and 32 validation frames are used. RetinaNet [26] is used to crop faces with a conservative enlargement (by a factor of 1.25) around the face center. Note that all the cropped images are then resized to 384 × 384. In addition, 68 facial landmarks are extracted per frame using Dlib [21]. We adopt the EFNB4 variant of the EfficientNet [44] pretrained on ImageNet [12]. For each training epoch, 8 frames are dynamically selected and used for online pseudo-fake generation.
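To check my understanding of the preprocessing, here is a minimal sketch of how I currently reproduce it. The 1.25 enlargement and the 384 × 384 size are taken from the quoted text; the OpenCV/dlib calls, the helper name `crop_and_landmark`, and the choice of timm for the backbone are my own assumptions, not your code:

```python
# Sketch of my reading of the preprocessing step (assumptions flagged below).
import cv2
import dlib
import numpy as np

ENLARGE = 1.25      # conservative enlargement factor (from the paper)
TARGET_SIZE = 384   # cropped faces are resized to 384 x 384 (from the paper)

# Assumes the standard dlib 68-point predictor file is available locally.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_and_landmark(frame, box):
    """frame: BGR image; box: (x1, y1, x2, y2) face box, e.g. from RetinaNet."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0        # face center
    half = max(x2 - x1, y2 - y1) * ENLARGE / 2.0     # enlarged half-size
    top = int(max(cy - half, 0))
    bottom = int(min(cy + half, frame.shape[0]))
    left = int(max(cx - half, 0))
    right = int(min(cx + half, frame.shape[1]))
    crop = cv2.resize(frame[top:bottom, left:right], (TARGET_SIZE, TARGET_SIZE))
    # 68 facial landmarks via dlib on the resized crop
    rect = dlib.rectangle(0, 0, TARGET_SIZE - 1, TARGET_SIZE - 1)
    shape = predictor(cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY), rect)
    landmarks = np.array([(p.x, p.y) for p in shape.parts()])
    return crop, landmarks

# Backbone: I assume a standard ImageNet-pretrained EfficientNet-B4,
# e.g. via timm (not necessarily your exact checkpoint):
#   model = timm.create_model("tf_efficientnet_b4", pretrained=True)
```

Is this roughly what your pipeline does, or does the cropping or landmark extraction differ?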

Does this mean that you extract 128 frames per video for training and 32 frames per video for validation, with no overlap between the two sets? Furthermore, does dynamically selecting 8 frames mean that you randomly select 8 of the 128 frames at every training epoch, so the selections may or may not overlap across successive epochs? Could you also describe the testing strategy for the cross-manipulation and cross-dataset evaluations?
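To make my interpretation of the dynamic selection concrete, this is roughly what I am assuming happens; `frame_paths` and `sample_epoch_frames` are hypothetical names for illustration, not from your code:

```python
# Assumed reading of "dynamically selected": 8 of the 128 training frames
# are re-drawn at random every epoch, without replacement within an epoch
# but independently across epochs, so overlap between epochs can occur.
import random

FRAMES_PER_EPOCH = 8
frame_paths = [f"video01/frame_{i:04d}.png" for i in range(128)]

def sample_epoch_frames(paths, k=FRAMES_PER_EPOCH, rng=random):
    return rng.sample(paths, k)

for epoch in range(3):
    epoch_frames = sample_epoch_frames(frame_paths)
    # ... online pseudo-fake generation + training on epoch_frames ...
```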
Many thanks
