I've used a YOLOv3 model on DeepFashion2 with the validation set provided by DeepFashion2, and my scores are very low despite using 5,000 and 10,000 images. The mean average precision (mAP) is low as well. Could I ask how you obtained your own results for the COCO evaluation?