Hi, thank you for releasing the code!
I have a question about the difference between the training and inference pipelines. The paper says that at inference time, the predicted output of each decoder layer is fed to the next layer. During training, however, embedding, concat, and linear operations are needed to generate a new tensor, which is then fed to the next layer. What impact does this extra operation have?