Keras/TensorFlow implementation of SSD (Single Shot MultiBox Detector)
The core source code is in the ssd directory. Currently, the SSD300 network based on the VGG16 model is implemented; see ssd300_vgg16.py.
SSD300-VGG16 is trained on the Pascal VOC 2007+2012 datasets. First, the ground-truth boxes are extracted from the Pascal VOC annotations and stored as a hash table. Its keys are image filenames; its values are numpy arrays holding normalized bounding boxes, one-hot-encoded classes and a difficulty flag, laid out as [xmin, ymin, xmax, ymax, one-hot-encoded-class, is-difficult]. The ground-truth boxes are stored in the following pickle files:
- pascal_voc_2007_test.p (see http://host.robots.ox.ac.uk/pascal/VOC/voc2007/#testdata)
- pascal_voc_2007_trainval.p (see http://host.robots.ox.ac.uk/pascal/VOC/voc2007/#devkit)
- pascal_voc_2012_trainval.p (see http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit)
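To make this layout concrete, here is a minimal sketch of building and reading such a hash table. It assumes the one-hot part spans the 20 Pascal VOC classes (no background column) and that coordinates are normalized by the image width and height. The helpers encode_box and decode_box and the example image are hypothetical and not taken from the ssd package.

```python
import pickle

import numpy as np

# Assumption: the one-hot part covers the 20 Pascal VOC classes, no background column.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def encode_box(xmin, ymin, xmax, ymax, class_name, difficult, img_w, img_h):
    """Build one row: [xmin, ymin, xmax, ymax, one-hot class..., is-difficult]."""
    # Normalize pixel coordinates by the image size (assumed normalization scheme)
    coords = np.array([xmin / img_w, ymin / img_h, xmax / img_w, ymax / img_h])
    one_hot = np.zeros(len(VOC_CLASSES))
    one_hot[VOC_CLASSES.index(class_name)] = 1.0
    return np.concatenate([coords, one_hot, [float(difficult)]]).astype(np.float32)


def decode_box(row):
    """Split a row back into (normalized box, class name, is-difficult flag)."""
    class_name = VOC_CLASSES[int(np.argmax(row[4:-1]))]
    return row[:4], class_name, bool(row[-1])


# Hypothetical 400x200 image with two annotated objects.
ground_truth = {
    "example.jpg": np.stack([
        encode_box(40, 20, 200, 100, "dog", False, 400, 200),
        encode_box(0, 0, 400, 200, "person", True, 400, 200),
    ]),
}

# Round-trip through pickle in memory; a real file would be read with
# pickle.load(open("pascal_voc_2007_trainval.p", "rb")).
restored = pickle.loads(pickle.dumps(ground_truth))
for row in restored["example.jpg"]:
    box, class_name, difficult = decode_box(row)
    print(box.tolist(), class_name, difficult)
```

With 20 classes, each row has 4 + 20 + 1 = 25 columns.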
PriorBoxes are generated as in the original Caffe implementation.
PriorBoxes.ipynb contains examples of what the PriorBoxes look like.
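The sketch below generates Caffe-style SSD300 prior boxes with numpy. The feature-map sizes, steps, min/max sizes and aspect ratios are the values we believe the original Caffe ssd_pascal.py script uses. These values and the function name generate_priors are assumptions and may not match ssd300_vgg16.py. This sketch does not clip boxes to [0, 1].

```python
import itertools
import math

import numpy as np

# Assumed SSD300 settings, modeled on the original Caffe ssd_pascal.py script.
IMAGE_SIZE = 300
FEATURE_MAPS = [38, 19, 10, 5, 3, 1]
STEPS = [8, 16, 32, 64, 100, 300]
MIN_SIZES = [30, 60, 111, 162, 213, 264]
MAX_SIZES = [60, 111, 162, 213, 264, 315]
ASPECT_RATIOS = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]


def generate_priors(offset=0.5):
    """Return an (N, 4) array of [xmin, ymin, xmax, ymax] priors in normalized units."""
    priors = []
    for fmap, step, min_size, max_size, ratios in zip(
            FEATURE_MAPS, STEPS, MIN_SIZES, MAX_SIZES, ASPECT_RATIOS):
        # i is the feature-map row (y), j is the column (x)
        for i, j in itertools.product(range(fmap), repeat=2):
            cx = (j + offset) * step
            cy = (i + offset) * step
            # Square box of side min_size, then one of side sqrt(min_size * max_size)
            s = math.sqrt(min_size * max_size)
            sizes = [(min_size, min_size), (s, s)]
            # Each aspect ratio r together with its flipped ratio 1/r
            for r in ratios:
                for ar in (r, 1.0 / r):
                    sizes.append((min_size * math.sqrt(ar), min_size / math.sqrt(ar)))
            for w, h in sizes:
                priors.append([(cx - w / 2) / IMAGE_SIZE, (cy - h / 2) / IMAGE_SIZE,
                               (cx + w / 2) / IMAGE_SIZE, (cy + h / 2) / IMAGE_SIZE])
    return np.array(priors, dtype=np.float32)


priors = generate_priors()
print(priors.shape)
```

With these settings the sketch yields 38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 8732 priors.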
See DataAugmentation.ipynb for examples of the full data-augmentation pipeline, and Imaging.ipynb for examples of photometric distortions.
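As a rough illustration of the transforms involved, the following numpy sketch applies a random brightness shift and a random contrast scale to the image. It then applies a horizontal flip that also mirrors the normalized boxes. The default parameters (brightness delta 32, contrast range [0.5, 1.5]) are assumptions borrowed from common SSD settings. The function names are hypothetical, and the notebooks may implement these steps differently.

```python
import numpy as np


def random_brightness(image, rng, max_delta=32.0):
    """Add one random offset in [-max_delta, max_delta] to every pixel."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(image + delta, 0.0, 255.0)


def random_contrast(image, rng, lower=0.5, upper=1.5):
    """Scale every pixel by one random factor in [lower, upper]."""
    alpha = rng.uniform(lower, upper)
    return np.clip(image * alpha, 0.0, 255.0)


def horizontal_flip(image, boxes):
    """Mirror an HxWxC image and its normalized [xmin, ymin, xmax, ymax] boxes."""
    flipped = image[:, ::-1, :]
    new_boxes = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa
    new_boxes[:, 0] = 1.0 - boxes[:, 2]
    new_boxes[:, 2] = 1.0 - boxes[:, 0]
    return flipped, new_boxes


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(300, 300, 3))
boxes = np.array([[0.1, 0.2, 0.4, 0.6]])

image = random_contrast(random_brightness(image, rng), rng)
flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_boxes)
```

Photometric distortions leave the boxes untouched. Geometric transforms such as flips must update them as well.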
Hard negative mining is implemented in the SsdLoss class.
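Hard negative mining can be sketched in a few lines of numpy. Rank the negative priors by confidence loss and keep only the hardest ones, at most neg_pos_ratio per positive (3:1 in the SSD paper). This is a per-image illustration with a hypothetical function name. It is not the SsdLoss code itself, which presumably operates on batched TensorFlow tensors.

```python
import numpy as np


def hard_negative_mask(conf_loss, positive_mask, neg_pos_ratio=3):
    """Select the highest-loss negative priors, at most neg_pos_ratio per positive."""
    num_pos = int(positive_mask.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~positive_mask).sum()))
    # Push positives to the end of the ranking by giving them the lowest possible loss
    neg_loss = np.where(positive_mask, -np.inf, conf_loss)
    order = np.argsort(-neg_loss, kind="stable")  # descending by loss
    negative_mask = np.zeros_like(positive_mask)
    negative_mask[order[:num_neg]] = True
    return negative_mask


# Toy example: 8 priors, only prior 3 matched to a ground-truth box
conf_loss = np.array([0.1, 2.0, 0.5, 3.0, 0.2, 1.0, 0.05, 0.9])
positive_mask = np.zeros(8, dtype=bool)
positive_mask[3] = True

negative_mask = hard_negative_mask(conf_loss, positive_mask)
selected = positive_mask | negative_mask
# Confidence loss over positives and mined negatives, normalized by the number of positives
loss = conf_loss[selected].sum() / max(int(positive_mask.sum()), 1)
print(np.flatnonzero(negative_mask), loss)
```

Limiting the negatives this way keeps the easy background priors, which vastly outnumber the matched ones, from dominating the confidence loss.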
SSD300-VGG16.ipynb walks through the training process.
This implementation performs worse than the original Caffe implementation; its overall mAP is 66%. Per-class average precision (AP):
| Class | AP (%) |
|---|---|
| aeroplane | 76 |
| bicycle | 76 |
| bird | 66 |
| boat | 63 |
| bottle | 39 |
| bus | 71 |
| car | 80 |
| cat | 80 |
| chair | 36 |
| cow | 59 |
| diningtable | 64 |
| dog | 70 |
| horse | 75 |
| motorbike | 72 |
| person | 70 |
| pottedplant | 44 |
| sheep | 56 |
| sofa | 71 |
| train | 76 |
| tvmonitor | 68 |
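The overall mAP is the unweighted mean of the per-class APs. A few lines of Python check this against the table; the dictionary simply copies the table values.

```python
# Per-class AP (%) copied from the table.
ap = {
    "aeroplane": 76, "bicycle": 76, "bird": 66, "boat": 63, "bottle": 39,
    "bus": 71, "car": 80, "cat": 80, "chair": 36, "cow": 59,
    "diningtable": 64, "dog": 70, "horse": 75, "motorbike": 72, "person": 70,
    "pottedplant": 44, "sheep": 56, "sofa": 71, "train": 76, "tvmonitor": 68,
}

# mAP is the unweighted mean over the 20 classes
mean_ap = sum(ap.values()) / len(ap)
# Classes sorted by ascending AP
weakest = sorted(ap, key=ap.get)[:3]

print(f"mAP = {mean_ap:.1f}%")  # mAP = 65.6%
print("weakest classes:", weakest)  # weakest classes: ['chair', 'bottle', 'pottedplant']
```

The mean of the rounded per-class values is 65.6%, which agrees with the reported 66%.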
Dependencies are listed in the Anaconda environment file.
To create the environment, run:
```
conda env create -f anaconda-dl-env.yml
```