We don't have to call `encode_text` at every denoising timestep, do we? Couldn't the text embedding be computed once at the beginning and reused?

https://github.com/GuyTevet/motion-diffusion-model/blob/8139dda55d90a58aa5a257ebf159b2ecfb78c632/model/mdm.py#L151C8-L151C8
```python
class MDM(nn.Module):
    # ... (rest of the class elided)
    def forward(self, x, timesteps, y=None):
        """
        x: [batch_size, njoints, nfeats, max_frames], denoted x_t in the paper
        timesteps: [batch_size] (int)
        """
        bs, njoints, nfeats, nframes = x.shape
        emb = self.embed_timestep(timesteps)  # [1, bs, d]

        force_mask = y.get('uncond', False)
        if 'text' in self.cond_mode:
            enc_text = self.encode_text(y['text'])
            emb += self.embed_text(self.mask_cond(enc_text, force_mask=force_mask))
```
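
To make the suggestion concrete, here is a rough sketch of what "compute once, reuse at every step" could look like. It is not code from the repository: `CachedTextEncoder`, `toy_encode_text`, and `sample` are hypothetical names, and the toy encoder only stands in for the real text encoder so that the example runs with the standard library alone. It assumes the text encoder is frozen and deterministic at sampling time, so the same prompts always give the same encoding. In the quoted `forward`, `mask_cond` is applied after `encode_text`, so only `enc_text` would need caching.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Encoding = List[List[float]]


class CachedTextEncoder:
    """Hypothetical wrapper that memoizes an expensive text encoder per prompt batch."""

    def __init__(self, encode_fn: Callable[[Sequence[str]], Encoding]):
        self.encode_fn = encode_fn
        self.calls = 0  # how many times the real encoder actually ran
        self._cache: Dict[Tuple[str, ...], Encoding] = {}

    def __call__(self, prompts: Sequence[str]) -> Encoding:
        key = tuple(prompts)
        if key not in self._cache:
            # Only reached the first time this exact prompt batch is seen.
            self._cache[key] = self.encode_fn(prompts)
            self.calls += 1
        return self._cache[key]


def toy_encode_text(prompts: Sequence[str]) -> Encoding:
    # Stand-in for the real (assumed frozen) text encoder: one small vector per prompt.
    return [[float(len(p)), float(p.count(" "))] for p in prompts]


def sample(encoder: CachedTextEncoder, prompts: Sequence[str], num_steps: int) -> List[float]:
    """Toy denoising loop: the text condition is requested at every step."""
    x = [0.0 for _ in prompts]
    for t in reversed(range(num_steps)):
        cond = encoder(prompts)  # same prompts every step, so this hits the cache
        x = [xi + c[0] / (t + 1) for xi, c in zip(x, cond)]
    return x


encoder = CachedTextEncoder(toy_encode_text)
result = sample(encoder, ["a person walks forward"], num_steps=50)
print(encoder.calls)  # the encoder ran once despite 50 denoising steps
```

The cache is keyed on the prompt tuple, so a new batch of prompts still triggers a fresh encoding. Whether this gives a noticeable speedup in practice depends on how expensive the text encoder is relative to the rest of the denoising step.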

We don't have to encode_text in each denoise timestep (but compute them at the beginning), do we? #151
