When I try to reproduce the experiments results in the paper, I cannot get a approximate result as in the paper. And I want to know why you choose this neural network structure, and I think you might have a misunderstanding on multi-task machine learning...