We don't have to call encode_text at every denoising timestep (we could compute it once at the beginning), do we? #151

@RalphHan

https://github.com/GuyTevet/motion-diffusion-model/blob/8139dda55d90a58aa5a257ebf159b2ecfb78c632/model/mdm.py#L151C8-L151C8

```python
class MDM(nn.Module):
......
    def forward(self, x, timesteps, y=None):
        """
        x: [batch_size, njoints, nfeats, max_frames], denoted x_t in the paper
        timesteps: [batch_size] (int)
        """
        bs, njoints, nfeats, nframes = x.shape
        emb = self.embed_timestep(timesteps)  # [1, bs, d]

        force_mask = y.get('uncond', False)
        if 'text' in self.cond_mode:
            enc_text = self.encode_text(y['text'])
            emb += self.embed_text(self.mask_cond(enc_text, force_mask=force_mask))
```
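
To make the suggestion concrete, here is a minimal sketch of the caching idea, not a patch against the repository. It assumes the text encoder is frozen and deterministic at sampling time, so encoding the same prompts always gives the same result. `CachedTextEncoder`, `fake_encode_text`, and `run_sampling_loop` are hypothetical names made up for illustration; `fake_encode_text` only stands in for the real `encode_text`.

```python
from typing import Callable, Dict, Sequence, Tuple


class CachedTextEncoder:
    """Hypothetical wrapper that memoizes an expensive text encoder per prompt batch."""

    def __init__(self, encode_fn: Callable[[Sequence[str]], object]):
        self.encode_fn = encode_fn  # assumption: something like model.encode_text
        self._cache: Dict[Tuple[str, ...], object] = {}
        self.calls = 0  # counts how often the real encoder actually runs

    def __call__(self, texts: Sequence[str]):
        key = tuple(texts)  # lists are not hashable, so key on a tuple
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.encode_fn(texts)
        return self._cache[key]


def fake_encode_text(texts):
    # Stand-in for the real encoder: one tiny feature vector per prompt.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def run_sampling_loop(encoder, texts, num_steps):
    """Toy denoising loop that requests the text embedding at every timestep."""
    last = None
    for _t in reversed(range(num_steps)):
        last = encoder(texts)  # served from the cache after the first step
    return last


cached = CachedTextEncoder(fake_encode_text)
prompts = ["a person walks forward", "a person jumps"]
embedding = run_sampling_loop(cached, prompts, num_steps=1000)
print(cached.calls)  # 1
```

The cache here sits in front of the encoder only, so per-step logic such as `mask_cond` would still run on the cached result. Whether the output of `embed_text` could be cached as well depends on how masking is used during sampling, which this sketch does not try to settle.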

Labels: enhancement (New feature or request)
