How can I evaluate the model using semantic segmentation metrics? Alternatively, how can I output semantic segmentation metrics and logs when evaluating the model?