Draft: Implements Encoder-Decoder Attention Model #28
Conversation
I would recommend building full setups before merging this, to avoid the same problems we had with other code that we merged but never "used" before. I would help with this.
Yeah, I agree.
Force-pushed from 44b9cdc to 6a6e044 (Compare)
Allows passing the labels unshifted for step-wise search, without needing a separate function besides "forward".
Force-pushed from b26ed0a to 6a147b7 (Compare)
I completely forgot to test with this branch again after already using it for some time. I did so now and it works normally.
Co-authored-by: Benedikt Hilmes <hilmes@hltpr.rwth-aachen.de>
    energies = v^T * tanh(h + s + beta) where beta is weight feedback information
    weights = softmax(energies)
    context = sum_t weights_t * h_t
The symbols in this docstring are partly undefined or differ from the parameter names in forward. It would be easier to understand if the naming were unified.
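For reference, a minimal sketch of the attention step with one consistent naming scheme. The tensor names and shapes below are my assumptions, not the actual parameter names of forward:

```python
import torch


def additive_attention(
    enc: torch.Tensor,       # encoder states h, assumed shape [B, T, F]
    dec: torch.Tensor,       # decoder state s, assumed shape [B, 1, F]
    feedback: torch.Tensor,  # weight feedback beta, assumed shape [B, T, F]
    v: torch.Tensor,         # energy projection vector, shape [F]
):
    # energies = v^T * tanh(h + s + beta), one scalar per encoder frame
    energies = torch.tanh(enc + dec + feedback) @ v     # [B, T]
    # weights = softmax(energies), normalized over the time axis
    weights = torch.softmax(energies, dim=1)            # [B, T]
    # context = sum_t weights_t * h_t
    context = torch.einsum("bt,btf->bf", weights, enc)  # [B, F]
    return context, weights
```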
    :param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
        training: this should be "True", in order to start with a zero target embedding
        search: use True for the first step in order to start with a zero embedding, False otherwise
I'm not a fan of this shift_embeddings logic. I would rather handle this externally by prepending a begin token to the labels, or by using the begin token in the first search step. If the embedding must be an all-zero vector, this could be achieved via the padding_idx parameter of torch.nn.Embedding.
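A minimal sketch of this suggested alternative, assuming index 0 is reserved as the begin token (the vocabulary size and label values are made up for illustration):

```python
import torch

BOS = 0          # assumed: index 0 reserved for the begin token
num_labels = 10  # assumed vocabulary size; +1 row for the reserved BOS index
emb = torch.nn.Embedding(num_labels + 1, 4, padding_idx=BOS)

# padding_idx initializes that row to zeros and suppresses its gradient,
# so emb(BOS) stays an all-zero vector.
labels = torch.tensor([[3, 5, 2]])  # [B, N]
# Training: prepend BOS and drop the last label instead of shifting embeddings.
inputs = torch.nn.functional.pad(labels, (1, 0), value=BOS)[:, :-1]
embedded = emb(inputs)  # embedded[:, 0] is the all-zero begin embedding
```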
    :param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
        training: this should be "True", in order to start with a zero target embedding
        search: use True for the first step in order to start with a zero embedding, False otherwise
    """
Docs for the return values are missing.
        training: labels of shape [B,N]
        (greedy-)search: hypotheses last label as [B,1]
    :param enc_seq_len: encoder sequence lengths of shape [B,T], same for training and search
    :param state: decoder state
Shape info for state tensors is missing.
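A hedged sketch of what the missing shape docs could look like, assuming the state is the usual (hidden, cell) pair of an LSTM; the names h, c and the hidden_dim symbol are assumptions:

```
:param state: decoder state, e.g. a tuple (h, c), each of shape [B, hidden_dim]
```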
    def forward(
        self, inputs: torch.Tensor, state: Tuple[torch.Tensor, torch.Tensor]
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        with torch.autocast(device_type="cuda", enabled=False):
Why is this disabled here? That should be explained in the code, maybe with a reference.
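For illustration, a small sketch of what locally disabling autocast does, using device_type="cpu" so it runs without a GPU. That the intent here is forcing float32 for numerically sensitive ops (e.g. LSTM cell updates) is my assumption about the motivation:

```python
import torch

a = torch.randn(4, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    low = a @ a  # autocast runs this matmul in bfloat16
    # Locally disable autocast to force full float32 precision,
    # e.g. for numerically sensitive ops (assumed motivation).
    with torch.autocast(device_type="cpu", enabled=False):
        full = a @ a  # inputs are float32, so this stays float32
```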