Draft: Implements Encoder-Decoder Attention Model #28
Conversation
I would recommend building full setups before merging this, to avoid the same problems we had with other code that we merged but never "used" before. I would help with this.
Yeah, I agree.
Force-pushed from 44b9cdc to 6a6e044 (Compare)
Allows passing the labels unshifted for step-wise search, without needing a separate function besides "forward".
Force-pushed from b26ed0a to 6a147b7 (Compare)
I completely forgot to test with this branch again after already using it for some time. I did so now and it works normally.
Co-authored-by: Benedikt Hilmes <hilmes@hltpr.rwth-aachen.de>
    energies = v^T * tanh(h + s + beta) where beta is weight feedback information
    weights = softmax(energies)
    context = sum_t weights_t * h_t
The symbols in this docstring are partly undefined or differ from the parameter names in forward. It would be easier to understand if the naming were unified.
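For reference, a minimal sketch of the attention step with one consistent naming scheme. The tensor names and shapes below are my assumptions, not the actual parameter names of forward:

```python
import torch


def additive_attention(
    enc: torch.Tensor,       # encoder states h, assumed shape [B, T, F]
    dec: torch.Tensor,       # decoder state s, assumed shape [B, 1, F]
    feedback: torch.Tensor,  # weight feedback beta, assumed shape [B, T, F]
    v: torch.Tensor,         # energy projection vector, shape [F]
):
    # energies = v^T * tanh(h + s + beta), one scalar per encoder frame
    energies = torch.tanh(enc + dec + feedback) @ v     # [B, T]
    # weights = softmax(energies), normalized over the time axis
    weights = torch.softmax(energies, dim=1)            # [B, T]
    # context = sum_t weights_t * h_t
    context = torch.einsum("bt,btf->bf", weights, enc)  # [B, F]
    return context, weights
```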
    :param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
        training: this should be "True", in order to start with a zero target embedding
        search: use True for the first step in order to start with a zero embedding, False otherwise
I'm not a fan of this shift_embeddings logic. I would rather handle this externally by prepending a begin token to the labels, or by using the begin token in the first search step. If the embedding must be an all-zero vector, this could be achieved via the padding_idx parameter of torch.nn.Embedding.
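A minimal sketch of this suggested alternative, assuming index 0 is reserved as the begin token (the vocabulary size and label values are made up for illustration):

```python
import torch

BOS = 0          # assumed: index 0 reserved for the begin token
num_labels = 10  # assumed vocabulary size; +1 row for the reserved BOS index
emb = torch.nn.Embedding(num_labels + 1, 4, padding_idx=BOS)

# padding_idx initializes that row to zeros and suppresses its gradient,
# so emb(BOS) stays an all-zero vector.
labels = torch.tensor([[3, 5, 2]])  # [B, N]
# Training: prepend BOS and drop the last label instead of shifting embeddings.
inputs = torch.nn.functional.pad(labels, (1, 0), value=BOS)[:, :-1]
embedded = emb(inputs)  # embedded[:, 0] is the all-zero begin embedding
```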
    :param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
        training: this should be "True", in order to start with a zero target embedding
        search: use True for the first step in order to start with a zero embedding, False otherwise
    """
Docs for the return values are missing.
        training: labels of shape [B,N]
        (greedy-)search: hypotheses last label as [B,1]
    :param enc_seq_len: encoder sequence lengths of shape [B,T], same for training and search
    :param state: decoder state
Shape info for state tensors is missing.
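A hedged sketch of what the missing shape docs could look like, assuming the state is the usual (hidden, cell) pair of an LSTM; the names h, c and the hidden_dim symbol are assumptions:

```
:param state: decoder state, e.g. a tuple (h, c), each of shape [B, hidden_dim]
```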
    def forward(
        self, inputs: torch.Tensor, state: Tuple[torch.Tensor, torch.Tensor]
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        with torch.autocast(device_type="cuda", enabled=False):
Why is this disabled here? That should be explained in the code, maybe with a reference.
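For illustration, a small sketch of what locally disabling autocast does, using device_type="cpu" so it runs without a GPU. That the intent here is forcing float32 for numerically sensitive ops (e.g. LSTM cell updates) is my assumption about the motivation:

```python
import torch

a = torch.randn(4, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    low = a @ a  # autocast runs this matmul in bfloat16
    # Locally disable autocast to force full float32 precision,
    # e.g. for numerically sensitive ops (assumed motivation).
    with torch.autocast(device_type="cpu", enabled=False):
        full = a @ a  # inputs are float32, so this stays float32
```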