question on training scheme #1
It seems that the network doesn't use the previous hidden state in the training phase:

```python
with autocast(enabled=not self.args.disable_mixed_precision):
    pred_fgr, pred_pha = self.model_ddp(true_src, downsample_ratio=downsample_ratio)[:2]
    loss = matting_loss(pred_fgr, pred_pha, true_fgr, true_pha)

self.scaler.scale(loss['total']).backward()
self.scaler.step(self.optimizer)
self.scaler.update()
self.optimizer.zero_grad()
```
But the hidden state is fed into the network in the test phase:

```python
src = src.to(device, dtype, non_blocking=True).unsqueeze(0)  # [B, T, C, H, W]
fgr, pha, *rec = model(src, *rec, downsample_ratio)
```
Why does the network use a different feedforward scheme in these two stages? Would it be better to feed the previous hidden state into the network during training as well?
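To make the difference concrete, here is a minimal pure-Python sketch (not the repo's code; `step` is a hypothetical stand-in for the recurrent ConvGRU). During training each clip starts from a zero state, but the recurrence is still unrolled across the T frames *within* the clip, so the recurrent layers do get gradient signal. At inference, `rec` is threaded across successive calls, so the state persists over an arbitrarily long video:

```python
# Toy recurrent step: the hidden state accumulates the input.
# This stands in for the model's ConvGRU update (hypothetical).
def step(x, h):
    h = h + x           # recurrence: new state depends on old state
    return 2 * h, h     # (output frame, new hidden state)

def run_clip(frames, h0=0):
    """Process one clip, threading the hidden state through time."""
    h = h0
    outs = []
    for x in frames:
        y, h = step(x, h)
        outs.append(y)
    return outs, h

# Training scheme: every clip begins from a fresh (zero) state,
# but the state still evolves across the frames inside the clip.
train_outs, _ = run_clip([1, 2, 3])        # -> [2, 6, 12]

# Inference scheme: `rec` is carried between calls, so later
# chunks continue from where the previous chunk left off.
rec = 0
out_a, rec = run_clip([1, 2, 3], rec)      # rec is now 6
out_b, rec = run_clip([4], rec)            # continues from rec = 6
```

Under this reading, training on short clips with a zero initial state is a form of truncated backpropagation through time: the model learns the recurrence from the within-clip unrolling, and carrying real state across training clips would require keeping (or detaching) long activation histories.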