question on training scheme #1
It seems that the network doesn't use the previous hidden state in the training phase:

```python
with autocast(enabled=not self.args.disable_mixed_precision):
    pred_fgr, pred_pha = self.model_ddp(true_src, downsample_ratio=downsample_ratio)[:2]
    loss = matting_loss(pred_fgr, pred_pha, true_fgr, true_pha)

self.scaler.scale(loss['total']).backward()
self.scaler.step(self.optimizer)
self.scaler.update()
self.optimizer.zero_grad()
```
But the hidden state is fed into the network in the test phase:

```python
src = src.to(device, dtype, non_blocking=True).unsqueeze(0)  # [B, T, C, H, W]
fgr, pha, *rec = model(src, *rec, downsample_ratio)
```
Why does the network use a different feedforward scheme in these two stages? Would it be better to feed the previous hidden state into the network during training as well?
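To make the difference concrete, here is a minimal pure-Python sketch (not the repo's code; `step` is a hypothetical stand-in for the recurrent ConvGRU). During training each clip starts from a zero state, but the recurrence is still unrolled across the T frames *within* the clip, so the recurrent layers do get gradient signal. At inference, `rec` is threaded across successive calls, so the state persists over an arbitrarily long video:

```python
# Toy recurrent step: the hidden state accumulates the input.
# This stands in for the model's ConvGRU update (hypothetical).
def step(x, h):
    h = h + x           # recurrence: new state depends on old state
    return 2 * h, h     # (output frame, new hidden state)

def run_clip(frames, h0=0):
    """Process one clip, threading the hidden state through time."""
    h = h0
    outs = []
    for x in frames:
        y, h = step(x, h)
        outs.append(y)
    return outs, h

# Training scheme: every clip begins from a fresh (zero) state,
# but the state still evolves across the frames inside the clip.
train_outs, _ = run_clip([1, 2, 3])        # -> [2, 6, 12]

# Inference scheme: `rec` is carried between calls, so later
# chunks continue from where the previous chunk left off.
rec = 0
out_a, rec = run_clip([1, 2, 3], rec)      # rec is now 6
out_b, rec = run_clip([4], rec)            # continues from rec = 6
```

Under this reading, training on short clips with a zero initial state is a form of truncated backpropagation through time: the model learns the recurrence from the within-clip unrolling, and carrying real state across training clips would require keeping (or detaching) long activation histories.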