since the concatenation described above for computing $m_a$ and $m_b$ can significantly increase the number of parameters, which may cause the model to overfit.
Before feeding $m_a$ or $m_b$ into the BiLSTM, the authors reduce the dimension of these concatenated vectors, as described in the original paper.
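
The dimension reduction can be illustrated with a minimal NumPy sketch. It assumes the concatenation stacks four $d$-dimensional pieces (the encoded vector, its attended counterpart, their difference, and their element-wise product) and that the reduction is a single feed-forward layer with ReLU. The sizes, the random weights, and the names `project`, `a_tilde`, `W`, and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np

def project(m, W, b):
    """Map each time step of m from 4*d to d dimensions with a ReLU layer."""
    return np.maximum(m @ W + b, 0.0)

rng = np.random.default_rng(0)
seq_len, d = 5, 8                          # hypothetical sizes

a = rng.normal(size=(seq_len, d))          # encoded sequence (assumed shape)
a_tilde = rng.normal(size=(seq_len, d))    # attended counterpart (assumed shape)

# Assumed form of the concatenation: four d-dimensional pieces -> 4*d per step.
m_a = np.concatenate([a, a_tilde, a - a_tilde, a * a_tilde], axis=-1)

# Hypothetical projection weights; in a real model these would be learned.
W = rng.normal(scale=0.1, size=(4 * d, d))
b = np.zeros(d)

m_a_reduced = project(m_a, W, b)           # this is what the BiLSTM would consume

print(m_a.shape, m_a_reduced.shape)
```

Because the input-to-hidden weights of an LSTM grow linearly with the input dimension, feeding it $d$-dimensional vectors instead of $4d$-dimensional ones shrinks those weights by a factor of four in this setup.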