
Initialization of decoder's states #15

@tma15

Thank you for the great work. NMTKit is easy to follow and helps me understand how to use DyNet.

I'm wondering about the initialization of the decoder's states from the final states of the encoder.

https://github.com/odashi/nmtkit/blob/master/nmtkit/luong_decoder.cc#L51

Decoder::State LuongDecoder::prepare(
    const vector<DE::Expression> & seed,
    dynet::ComputationGraph * cg,
    const bool is_training) {
  NMTKIT_CHECK_EQ(2 * num_layers_, seed.size(), "Invalid number of initial states.");
  vector<DE::Expression> states;
  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
  rnn_.set_dropout(is_training ? dropout_rate_ : 0.0f);
  rnn_.new_graph(*cg);
  rnn_.start_new_sequence(states);
  dec2out_.prepare(cg);
  // Zero vector for the initial feeding value.
  const DE::Expression init_feed = DE::input(
      *cg, {out_embed_size_}, vector<float>(out_embed_size_, 0.0f));
  return {{rnn_.state()}, {init_feed}};
}

In my understanding, seed holds the final states (cells and hidden states) of the encoder, obtained via the encoder's getStates(). When the 2 * num_layers_ elements of states are computed in the function above, the last num_layers_ elements are derived from the first num_layers_ elements, because of the following:

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }

I suspect that all elements of states are therefore computed from the encoder's cells only, not from both the cells and the hidden states; for example, states = {c_1, c_2, ..., c_n, tanh(c_1), tanh(c_2), ..., tanh(c_n)}.
Is there a reason for this, rather than using something like the following?

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(seed[num_layers_ + i])); // dimension of seed[num_layers_ + i] will need to be reduced if encoder is bidirectional
  }
