Fix NotImplemented errors, xformers attention shape, and missing text conditioning#140
Open
Mr-Neutr0n wants to merge 1 commit into Vchitect:main from
Conversation
Fix NotImplemented errors, xformers attention shape, and missing text conditioning

- Replace `raise NotImplemented` with `raise NotImplementedError` in both latte.py and latte_img.py. `NotImplemented` is not an exception class and will raise a TypeError instead of the intended error.
- Transpose q, k, v from (B, heads, N, dim) to (B, N, heads, dim) before calling xformers memory_efficient_attention in latte_img.py, matching the correct implementation in latte.py. xformers expects the (B, N, heads, dim) layout.
- Add missing `elif self.extras == 78` branch before the final layer in latte.py so that text_embedding_spatial conditioning is applied during the final adaptive layer norm, consistent with the temporal blocks above.
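The first fix can be seen in a minimal, self-contained sketch. The function names below are illustrative, not from the repository; the point is that `NotImplemented` is a sentinel value for binary-operator fallbacks, so raising it produces a `TypeError` rather than the intended signal.

```python
def broken_attention_dispatch(mode):
    """Mimics the bug: raising the NotImplemented singleton."""
    if mode not in ("flash", "math"):
        raise NotImplemented  # BUG: not an exception class -> TypeError

def fixed_attention_dispatch(mode):
    """The corrected form: raise the NotImplementedError exception class."""
    if mode not in ("flash", "math"):
        raise NotImplementedError(f"unsupported attention mode: {mode}")

# The broken form masks the real problem behind a TypeError.
try:
    broken_attention_dispatch("xformers")
except TypeError as e:
    print("broken path raises:", type(e).__name__)

# The fixed form surfaces the intended, descriptive error.
try:
    fixed_attention_dispatch("xformers")
except NotImplementedError as e:
    print("fixed path raises:", type(e).__name__)
```

This is why the bug is easy to miss: the code still raises *something*, just the wrong exception with a misleading message.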
Summary
- `raise NotImplemented` -> `raise NotImplementedError` in `models/latte.py` (line 73) and `models/latte_img.py` (line 76). `NotImplemented` is a special singleton used for binary operator fallbacks, not an exception class. Using it with `raise` produces a `TypeError` instead of the intended error, masking the real issue.
- Transpose q, k, v for xformers attention in `models/latte_img.py` (line 61). After `permute(2, 0, 3, 1, 4)` and `unbind`, tensors are shaped `(B, heads, N, dim)`, but `xformers.ops.memory_efficient_attention` expects `(B, N, heads, dim)`. The implementation in `models/latte.py` (lines 55-58) correctly transposes before calling xformers; this patch applies the same fix to `latte_img.py`.
- Add missing `elif self.extras == 78` before the final layer in `models/latte.py` (line 372). The temporal block loop correctly handles `extras == 78` by adding `text_embedding_temp` to the conditioning, but the final adaptive layer norm block only checked for `extras == 2` (class conditioning) and fell through to unconditional for all other values. This meant text-conditioned generation (`extras == 78`) silently dropped text conditioning at the final layer.

Test plan
- Verify `raise NotImplementedError` is correctly raised when an unsupported attention mode is passed
- Run `latte_img.py` and confirm no shape errors
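The third fix (the missing `extras == 78` branch) can be sketched in isolation. This is a hedged illustration, not the repository's code: the branch values come from the PR (2 for class conditioning, 78 for text conditioning), but the function name, parameters, and the use of plain floats in place of embedding tensors are all simplifications for clarity.

```python
def final_layer_conditioning(extras, timestep_emb, class_emb=0.0, text_emb=0.0):
    """Illustrative dispatch for the final adaptive layer norm conditioning."""
    if extras == 2:
        return timestep_emb + class_emb  # class-conditional path
    elif extras == 78:
        return timestep_emb + text_emb   # the branch this patch adds
    return timestep_emb                  # unconditional fallback

# Without the elif, extras == 78 falls through to the unconditional
# return, silently dropping text conditioning at the final layer.
print(final_layer_conditioning(78, 1.0, text_emb=0.5))  # -> 1.5
```

The failure mode is silent because the fallthrough still returns a valid conditioning value; generation runs without errors but ignores the text prompt at the final layer.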