Agent Detection
Code: https://github.com/WATonomous/mmdetection
- Clone the code and check out the `road` branch.
- Create a `.env` file and add `COMPOSE_PROJECT_NAME=your_user_name` in it.
- Run `docker-compose up mmdet` to start the docker container.
- Run `docker exec -it your_user_name_mmdet_1 /bin/bash`.
- Run `pip install -v -e .`
- Run `python tools/train.py configs/road/fpn_r50_config.py` to train the FPN ResNet-50 detection model on the ROAD dataset.
The ROAD images are located at `/mnt/wato-drive/road/rgb-images/`. COCO-format annotations:
- `/mnt/wato-drive/road/detections/coco_annotation_train1_full.json` for the train1 split training set.
- `/mnt/wato-drive/road/detections/coco_annotation_train1_quarter.json` for a uniformly sampled 1/4 of the training set (currently used).
- `/mnt/wato-drive/road/detections/coco_annotation_val1.json` for the val1 split validation set.
Code to convert the ROAD annotation format to COCO format can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/convert_road_gt_to_coco.py
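For orientation, here is a minimal sketch of what such a conversion looks like. It assumes the ROAD ground-truth layout described in the evaluation section below (a top-level `db` field, per-frame `annos` with normalized `box` and `agent_ids`); the frame dimensions and file naming are illustrative, and the linked script is the authoritative implementation.

```python
def road_to_coco(road_gt, width=1280, height=960):
    """Sketch: flatten ROAD per-frame annotations into COCO images/annotations.

    `road_gt` is the parsed ROAD ground-truth json; width/height are
    illustrative frame dimensions, not taken from the real script.
    """
    images, annotations = [], []
    ann_id = 1
    for video_name, video in road_gt["db"].items():
        for frame_id, frame in video["frames"].items():
            if not frame.get("annotated"):
                continue
            image_id = len(images) + 1
            images.append({
                "id": image_id,
                "file_name": f"{video_name}/{int(frame_id):05d}.jpg",
                "width": width,
                "height": height,
            })
            for anno in frame.get("annos", {}).values():
                # ROAD boxes are normalized [xmin, ymin, xmax, ymax];
                # COCO expects absolute [x, y, width, height].
                x1, y1, x2, y2 = anno["box"]
                annotations.append({
                    "id": ann_id,
                    "image_id": image_id,
                    "category_id": int(anno["agent_ids"][0]) + 1,  # 1-based ids
                    "bbox": [x1 * width, y1 * height,
                             (x2 - x1) * width, (y2 - y1) * height],
                    "area": (x2 - x1) * (y2 - y1) * width * height,
                    "iscrowd": 0,
                })
                ann_id += 1
    return {"images": images, "annotations": annotations}
```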
| Detector | AP@0.5-0.95 | AP@0.5 |
|---|---|---|
| ResNet-50 FPN ImageNet finetune | 30.0 | 56.2 |
| ResNet-50 FPN COCO finetune | 34.8 | 59.0 |
Public Results
- YOLOv5 on the validation set, from the paper ROAD: The ROad event Awareness Dataset for Autonomous Driving: 57.9%
- train1: `{'ztotal': 416574, 'LarVeh': 5560, 'Cyc': 46460, 'Ped': 179591, 'Car': 92894, 'MedVeh': 15373, 'Bus': 10161, 'TL': 48329, 'Mobike': 1017, 'OthTL': 16837, 'EmVeh': 352}`
- val1: `{'ztotal': 60103, 'Ped': 18949, 'Bus': 1834, 'Car': 11032, 'TL': 7690, 'MedVeh': 9895, 'Cyc': 6139, 'LarVeh': 633, 'Mobike': 2121, 'OthTL': 1810}`

Note: there is no 'EmVeh' class in val1.
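These counts can be reproduced from the COCO-format annotation files with a few lines of Python (path as listed above; the `ztotal` key is simply the overall box count):

```python
import json
from collections import Counter

# Count annotation instances per agent class in a COCO-format file.
with open("/mnt/wato-drive/road/detections/coco_annotation_train1_full.json") as f:
    coco = json.load(f)

id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
counts["ztotal"] = sum(counts.values())  # overall box count, as in the stats above
print(dict(counts))
```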
1. Generate predictions on the ROAD training dataset with COCO pre-trained models, and filter the predicted boxes by class label and detection score.
2. Compute the overlap between the predictions from step 1 and the ROAD annotations of the same class, and perform non-maximum suppression.
3. Filter out invalid predictions and generate new annotations for inactive agents.
4. Train new object detection models on the new dataset of 11 classes (the 10 previous agent classes + inactive agents).
Code to generate the inactive agent annotations can be found at: https://github.com/WATonomous/mmdetection/blob/road/tools/road/generate_inactive_annotation.py
New annotations for the inactive agents can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive.json
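For illustration, a simplified sketch of steps 1-3: score filtering plus an IoU-based overlap check against same-class ground truth. The NMS and validity filtering in the linked script are omitted, and both thresholds are assumptions.

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) arrays of [x1, y1, x2, y2] boxes."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])  # intersection top-left
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])  # intersection bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def candidate_inactive(pred_boxes, pred_scores, gt_boxes,
                       score_thr=0.7, iou_thr=0.5):
    """Keep confident predictions that do not overlap any same-class GT box;
    these become candidate inactive-agent annotations (thresholds illustrative)."""
    keep = pred_scores >= score_thr
    pred_boxes, pred_scores = pred_boxes[keep], pred_scores[keep]
    if len(gt_boxes) and len(pred_boxes):
        overlap = iou_matrix(pred_boxes, gt_boxes).max(axis=1)
        pred_boxes = pred_boxes[overlap < iou_thr]
    return pred_boxes
```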
| Detector | AP@0.5-0.95 | AP@0.5 |
|---|---|---|
| ResNet-50 FPN ImageNet finetune | 30.9 | 61.3 |
| ResNet-50 FPN COCO finetune | 38.0 | 67.9 |
Merge certain classes of agents into a single super-category:
- Merge Car, MedVeh, LarVeh, Bus and EmVeh into a single class Vehicle.
- Merge TL and OthTL into a single class TL.
- The 10 agent classes are merged into 5 new classes.
Code for super-category agent class merge can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/super_category_merge.py
New annotations for the inactive agents and super-category can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive_merged.json
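A minimal sketch of the merge on a COCO-format annotation file (the mapping mirrors the rules above; the new category ids are assigned arbitrarily, and classes not listed, such as the inactive-agent class, keep their names):

```python
# Super-category mapping, following the merge rules above.
MERGE = {
    "Car": "Vehicle", "MedVeh": "Vehicle", "LarVeh": "Vehicle",
    "Bus": "Vehicle", "EmVeh": "Vehicle",
    "TL": "TL", "OthTL": "TL",
    "Ped": "Ped", "Cyc": "Cyc", "Mobike": "Mobike",
}

def merge_categories(coco):
    """Remap category ids of a parsed COCO dict in place and return it."""
    old_id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    new_name = lambda n: MERGE.get(n, n)  # unlisted classes keep their name
    names = sorted({new_name(n) for n in old_id_to_name.values()})
    name_to_id = {n: i + 1 for i, n in enumerate(names)}
    coco["categories"] = [{"id": i, "name": n} for n, i in name_to_id.items()]
    for ann in coco["annotations"]:
        ann["category_id"] = name_to_id[new_name(old_id_to_name[ann["category_id"]])]
    return coco
```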
| Detector | detection mAP | COCO-finetune mAP |
|---|---|---|
| RGB only (ImageNet-pretrain) | 56.2 | 59.0 |
| RGB + optical flow (x,y 2 channel) shallow fusion | 52.6 | 53.8 |
| RGB + optical flow (color 3 channel) shallow fusion | 52.2 | 53.6 |
| RGB + optical flow (magnitude 1 channel) shallow fusion | 51.3 | 52.8 |
| RGB + optical flow (color 3 channel) deep fusion addition | 59.9 | 61.4 |
| RGB + optical flow (color 3 channel) deep fusion concat | 59.1 | 61.2 |
Combined with other improvements:
| Detector | detection mAP |
|---|---|
| RGB only (COCO pretrain) + inactive + class merge | 80.4 |
| RGB + optical flow (color 3 channel) deep fusion addition + inactive + class merge | 81.3 |
Larger backbone x101_64dx4:
| Detector | detection mAP |
|---|---|
| RGB only (COCO pretrain) + inactive + class merge | 80.5 |
| RGB + optical flow (color 3 channel) deep fusion addition + inactive + class merge | 83.1 |
- 1-channel: flow magnitude, sqrt(x^2 + y^2)
- 2-channel: the raw x and y flow components
- 3-channel: color wheel representation
- shallow fusion: concatenate RGB and optical flow as the network input
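In code, the 1-channel representation and shallow fusion look roughly like this (a sketch on numpy arrays; the 3-channel color-wheel image comes from the RAFT visualization code in the scripts linked below):

```python
import numpy as np

def flow_1ch(flow_xy):
    # 1-channel representation: flow magnitude sqrt(x^2 + y^2), shape (H, W, 1).
    # The raw (H, W, 2) field itself is the 2-channel representation.
    return np.sqrt((flow_xy ** 2).sum(axis=-1, keepdims=True))

def shallow_fusion(rgb, flow):
    # Shallow fusion: concatenate RGB and flow along the channel axis.
    # The first conv layer must then accept 3 + C_flow input channels, so
    # ImageNet-pretrained weights for that layer no longer apply directly.
    return np.concatenate([rgb, flow], axis=-1)
```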
Visualizations show that detection of inactive agents has improved. However, false positives increased, since no ImageNet pre-trained model can be used with the new fusion input.
Code:
(Optical Flow) https://github.com/liqq010/RAFT/blob/master/get_flow_for_road.py
(Normalized Optical Flow) https://github.com/liqq010/RAFT/blob/master/get_flow_for_road_norm.py
Normalization Steps:
- First, set a range for the optical flow (-15 to 15 by default) and clip values outside this range.
- Then normalize the clipped optical flow values to 0 to 255.
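In Python, the two steps amount to the following (a sketch; `bound` corresponds to the default -15 to 15 range):

```python
import numpy as np

def normalize_flow(flow, bound=15.0):
    flow = np.clip(flow, -bound, bound)  # step 1: clip out-of-range values
    # step 2: linearly map [-bound, bound] to [0, 255]
    return ((flow + bound) / (2 * bound) * 255.0).astype(np.uint8)
```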
Where to find the generated optical flow:
- 3-channel color wheel: /mnt/wato-drive/road/optical_flow_color_wheel
- 2-channel normalized flow: /mnt/wato-drive/road/optical_flow_normalized
Prepare a json file with the following format:
The first level of the json file contains the field `db`.
The `db` field contains all frame-level detections for all videos of the validation set:
- To access the detections for a video, use `db['2014-06-25-16-45-34_stereo_centre_02']`, where `2014-06-25-16-45-34_stereo_centre_02` is the name of the video.
- Each video's detections come with the following fields: `['frames', 'numf', 'split_ids']`
  - `split_ids` contains the split id assigned to this video, out of `test`, `train1`, `val1`, ...
  - `numf` is the number of frames in the video.
  - `frames` contains the frame-level detection results:
    - For each frame, `frame['1']` contains `['annotated', 'width', 'height', 'annos', 'input_image_id']`
    - `annotated` is always set to 1.
    - `annos` contains the detections of the frame, as bounding boxes with unique keys.
    - `annos['1']` has the following keys: `['box', 'agent_ids', 'action_ids']`
      - `box` is the normalized (0, 1) bounding box coordinates `xmin, ymin, xmax, ymax`.
      - `agent_ids` is the class ids of the detected classes.
      - `action_ids` is set to 1 as a fake id, which is not used in evaluation.
To evaluate, set the field `annotation_path` in the config file to this json file and set `evaluation` to `True`. An example json file is provided at `/mnt/wato-drive/road/detections/new_val1_coco.json`
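Put together, a minimal `db` skeleton looks like this (the video name is taken from the example above; the frame id, box values, dimensions, and class ids are placeholders):

```python
import json

db = {
    "2014-06-25-16-45-34_stereo_centre_02": {
        "split_ids": ["val1"],
        "numf": 1,
        "frames": {
            "1": {
                "annotated": 1,          # always 1
                "width": 1280,           # placeholder dimensions
                "height": 960,
                "input_image_id": 1,
                "annos": {
                    "1": {
                        "box": [0.10, 0.20, 0.30, 0.40],  # normalized xmin, ymin, xmax, ymax
                        "agent_ids": [0],                  # placeholder class id
                        "action_ids": [1],                 # fake id, unused in evaluation
                    }
                },
            }
        },
    }
}

with open("new_val1_coco.json", "w") as f:
    json.dump({"db": db}, f)
```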
- Run the trained detection model and save the results to `xxx.pkl`:
  `python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}]`
- Convert the pkl results to an intermediate format file:
  `python tools/road/convert_to_acar.py`
- Convert the intermediate format to the format ACAR-Net needs:
  `python tools/road/convert_to_gt.py`
A score threshold for the bounding boxes needs to be defined; in general, 0.7 works best.
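The thresholding itself is a one-liner during conversion (a sketch; numpy-array detections assumed):

```python
import numpy as np

SCORE_THR = 0.7  # bounding-box score threshold; 0.7 generally works best

def filter_by_score(boxes, scores, labels, thr=SCORE_THR):
    # Drop low-confidence boxes before writing the ACAR-Net input files.
    keep = scores >= thr
    return boxes[keep], scores[keep], labels[keep]
```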
| Detector | detection mAP | action mAP (frame-level) | model path | detection file |
|---|---|---|---|---|
| Ground-truth | - | 34.678 | | |
| CenterNet (last year, 10 classes) | 62 | 23.546 | | |
| Inactive agent detection (10 classes) | 67.9 | 23.837 | | |
| Inactive agent detection + super-category class merge (6 classes) | 80.5 | 24.940 | link | link |
| Inactive agent detection + super-category class merge (6 classes) + deep fusion optical flow | 83.1 | 26.407 | link | link |
On val2:
| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP (frame-level) | model path | detection file |
|---|---|---|---|---|
| FPN_x101_64dx4_inactive_merged | 11.5 / 28.4 (e1) | | | |
| +deep fusion | 14.4 / 33.4 (e1) | | | |
On val3:
| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP (frame-level) | model path | detection file |
|---|---|---|---|---|
| FPN_x101_64dx4_inactive_merged | 34.4 / 61.6 (e5) | | | |
| +deep fusion | 38.1 / 65.7 (e7) | | | |