
Initialization of decoder's states #15

@tma15

Thank you for the great work. NMTKit is easy to follow and helps me understand how to use DyNet.

I'm wondering about the initialization of the decoder's states from the final states of the encoder.

https://github.com/odashi/nmtkit/blob/master/nmtkit/luong_decoder.cc#L51

Decoder::State LuongDecoder::prepare(
    const vector<DE::Expression> & seed,
    dynet::ComputationGraph * cg,
    const bool is_training) {
  NMTKIT_CHECK_EQ(2 * num_layers_, seed.size(), "Invalid number of initial states.");
  vector<DE::Expression> states;
  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
  rnn_.set_dropout(is_training ? dropout_rate_ : 0.0f);
  rnn_.new_graph(*cg);
  rnn_.start_new_sequence(states);
  dec2out_.prepare(cg);
  // Zero vector for the initial feeding value.
  const DE::Expression init_feed = DE::input(
      *cg, {out_embed_size_}, vector<float>(out_embed_size_, 0.0f));
  return {{rnn_.state()}, {init_feed}};
}

In my understanding, seed holds the final states (cells and hidden states) of the encoder, obtained via the encoder's getStates(). When the 2 * num_layers_ elements of states are computed in the function above, the last num_layers_ elements are derived from the first num_layers_ elements, because of the following:

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }

I suspect that all elements of states are therefore computed from the encoder's cells only, not from both the cells and the hidden states; for example, states = {c_1, c_2, ..., c_n, tanh(c_1), tanh(c_2), ..., tanh(c_n)}.
Is there a reason for this, rather than using something like the following?

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(seed[num_layers_ + i])); // dimension of seed[num_layers_ + i] will need to be reduced if encoder is bidirectional
  }
