Thank you for the great work. NMTKit is easy to follow and has helped me understand how to use DyNet.
I'm wondering about the initialization of the decoder's states from the final states of the encoder.
https://github.com/odashi/nmtkit/blob/master/nmtkit/luong_decoder.cc#L51
Decoder::State LuongDecoder::prepare(
    const vector<DE::Expression> & seed,
    dynet::ComputationGraph * cg,
    const bool is_training) {
  NMTKIT_CHECK_EQ(2 * num_layers_, seed.size(), "Invalid number of initial states.");
  vector<DE::Expression> states;
  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
  rnn_.set_dropout(is_training ? dropout_rate_ : 0.0f);
  rnn_.new_graph(*cg);
  rnn_.start_new_sequence(states);
  dec2out_.prepare(cg);
  // Zero vector for the initial feeding value.
  const DE::Expression init_feed = DE::input(
      *cg, {out_embed_size_}, vector<float>(out_embed_size_, 0.0f));
  return {{rnn_.state()}, {init_feed}};
}

In my understanding, seed holds the final states (cell and hidden states) of the encoder, obtained via the encoder's getStates(). When the 2 * num_layers_ elements of states are computed in the function above, the last num_layers_ elements are derived from the first num_layers_ elements, because of the following:
for (unsigned i = 0; i < num_layers_; ++i) {
  enc2dec_[i].prepare(cg);
  states.emplace_back(enc2dec_[i].compute(seed[i]));
}
for (unsigned i = 0; i < num_layers_; ++i) {
  states.emplace_back(DE::tanh(states[i]));
}

So I guess all elements of states are calculated using only the cell states of the encoder, not both the cell and hidden states; for example, states = {c_1, c_2, ..., c_n, tanh(c_1), tanh(c_2), ..., tanh(c_n)}.
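For reference, my reading assumes that seed follows the order used by DyNet's LSTMBuilder::final_s(), which returns the cell states of all layers first and then the hidden states. Under that assumption (just a sketch for illustration; the local names cell_i and hidden_i are mine, not from NMTKit), the layout inside the loops would be:

// Assumed layout of seed, if the encoder's getStates() preserves the
// order of dynet::LSTMBuilder::final_s() (cells first, then hiddens):
//   seed[i]               -> c_{i+1} (cell state of layer i+1)
//   seed[num_layers_ + i] -> h_{i+1} (hidden state of layer i+1)
const DE::Expression & cell_i   = seed[i];                // fed to enc2dec_
const DE::Expression & hidden_i = seed[num_layers_ + i];  // currently unused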
Is there a particular reason for this, rather than using something like the following?
for (unsigned i = 0; i < num_layers_; ++i) {
  enc2dec_[i].prepare(cg);
  states.emplace_back(enc2dec_[i].compute(seed[i]));
}
for (unsigned i = 0; i < num_layers_; ++i) {
  // dimension of seed[num_layers_ + i] will need to be reduced
  // if encoder is bidirectional
  states.emplace_back(DE::tanh(seed[num_layers_ + i]));
}
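For the bidirectional case, one way to handle the dimension reduction mentioned in the comment would be a second bank of transforms for the hidden states, analogous to enc2dec_. A minimal sketch, assuming a hypothetical member h2dec_ (not in NMTKit) with the same prepare()/compute() interface as enc2dec_:

for (unsigned i = 0; i < num_layers_; ++i) {
  enc2dec_[i].prepare(cg);
  states.emplace_back(enc2dec_[i].compute(seed[i]));  // cell states
}
for (unsigned i = 0; i < num_layers_; ++i) {
  // h2dec_ is an assumed bank of transforms, analogous to enc2dec_, that
  // projects the (possibly concatenated bidirectional) hidden state down
  // to the decoder's hidden size before the tanh nonlinearity.
  h2dec_[i].prepare(cg);
  states.emplace_back(DE::tanh(h2dec_[i].compute(seed[num_layers_ + i])));
}

With such a projection, the same code would work for both unidirectional and bidirectional encoders, since the transform absorbs any dimension mismatch.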