Confidence scores are too low when using the DeepFashion2 validation set

I've used the YOLOv3 model for DeepFashion2 with the validation set provided by DeepFashion2, and my confidence scores are very low, even when using 5,000 and 10,000 images. The mean average precision (mAP) is low as well. Could you explain how you obtained your own results for the COCO evaluation?