For this task, the input data (for inference) would be an image, and the output data should be a list
like of tuples: (bounding box, probability of each category).
Pls refer to https://github.com/shwars/mPyPl/wiki/Reading-PASCAL-VOC-Format
Discussion is welcomed.