I see that the C++ code uses four loss functions: MSE and MAE for the predicted time, and error count and NLL for the predicted mark label.
However, in the paper the loss function is written as the NLL of the mark plus the log of the time density function.
Is this code actually for another paper?
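For reference, here is a minimal sketch I put together (not taken from the repo) of the difference I mean. The four quantities score time and mark separately. The paper's formula is a single negative log-likelihood per event, -[log P(mark) + log f*(t)]. The `Prediction` and `Target` structs, and the assumption that the model exposes a mark probability vector and the density f*(t) at the observed time, are hypothetical and only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical per-event model output (not the repository's actual types).
struct Prediction {
    double time;                    // point estimate of the next event time
    std::vector<double> mark_prob;  // predicted probability of each mark label
    double time_density;            // density f*(t) evaluated at the observed time
};

// Observed next event.
struct Target {
    double time;
    std::size_t mark;
};

// Mean squared error of the predicted times.
double mse(const std::vector<Prediction>& p, const std::vector<Target>& y) {
    assert(p.size() == y.size() && !p.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double d = p[i].time - y[i].time;
        sum += d * d;
    }
    return sum / static_cast<double>(p.size());
}

// Mean absolute error of the predicted times.
double mae(const std::vector<Prediction>& p, const std::vector<Target>& y) {
    assert(p.size() == y.size() && !p.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        sum += std::fabs(p[i].time - y[i].time);
    }
    return sum / static_cast<double>(p.size());
}

// Number of events whose most probable mark differs from the observed mark.
std::size_t error_count(const std::vector<Prediction>& p, const std::vector<Target>& y) {
    assert(p.size() == y.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const auto& probs = p[i].mark_prob;
        const auto best = static_cast<std::size_t>(
            std::distance(probs.begin(), std::max_element(probs.begin(), probs.end())));
        if (best != y[i].mark) ++errors;
    }
    return errors;
}

// NLL of the observed marks only: -sum_i log P(mark_i).
double mark_nll(const std::vector<Prediction>& p, const std::vector<Target>& y) {
    assert(p.size() == y.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        nll -= std::log(p[i].mark_prob.at(y[i].mark));
    }
    return nll;
}

// Joint NLL: -sum_i [log P(mark_i) + log f*(t_i)], mark and time in one objective.
double joint_nll(const std::vector<Prediction>& p, const std::vector<Target>& y) {
    double nll = mark_nll(p, y);
    for (std::size_t i = 0; i < p.size(); ++i) {
        nll -= std::log(p[i].time_density);
    }
    return nll;
}
```

With two events where the model is off by 1.0 and 2.0 in time, MSE is 2.5 and MAE is 1.5, but neither number enters `joint_nll`; only the mark probabilities and the time density do.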