
Agent Detection


How to train the object detector on ROAD dataset

Code: https://github.com/WATonomous/mmdetection

Steps to train the agent detector

  1. Clone the code and check out the road branch.
  2. Create a .env file and add COMPOSE_PROJECT_NAME=your_user_name to it.
  3. Run docker-compose up mmdet to start the Docker container.
  4. Run docker exec -it your_user_name_mmdet_1 /bin/bash to open a shell inside the container.
  5. Run pip install -v -e . to install mmdetection in editable mode.
  6. Run python tools/train.py configs/road/fpn_r50_config.py to train the FPN ResNet-50 detection model on the ROAD dataset.
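For orientation, below is a minimal sketch of what a ROAD config such as configs/road/fpn_r50_config.py might contain, assuming standard mmdetection config conventions. The base config path, class order, and exact field layout are assumptions; the actual file in the repo is authoritative.

```python
# Hypothetical sketch of a ROAD Faster R-CNN R-50 FPN config (mmdetection style).
# Paths come from this page; the base config and class ordering are assumptions.
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'

# The 10 ROAD agent classes (see the instance counts further down this page).
classes = ('Ped', 'Car', 'Cyc', 'Mobike', 'MedVeh', 'LarVeh',
           'Bus', 'EmVeh', 'TL', 'OthTL')

data_root = '/mnt/wato-drive/road/'
data = dict(
    train=dict(
        classes=classes,
        img_prefix=data_root + 'rgb-images/',
        ann_file=data_root + 'detections/coco_annotation_train1_quarter.json'),
    val=dict(
        classes=classes,
        img_prefix=data_root + 'rgb-images/',
        ann_file=data_root + 'detections/coco_annotation_val1.json'))

# One output class per ROAD agent class.
model = dict(roi_head=dict(bbox_head=dict(num_classes=10)))
```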

ROAD dataset images directory

/mnt/wato-drive/road/rgb-images/

ROAD annotations

  • /mnt/wato-drive/road/detections/coco_annotation_train1_full.json for the full train1 split training set.
  • /mnt/wato-drive/road/detections/coco_annotation_train1_quarter.json for a uniformly sampled 1/4 subset of the training set (currently in use).
  • /mnt/wato-drive/road/detections/coco_annotation_val1.json for the val1 split validation set.

Code to convert the ROAD annotation format to COCO format can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/convert_road_gt_to_coco.py
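As a rough illustration of the target format (not the converter itself), a COCO-style detection annotation file has top-level images, annotations, and categories lists, with boxes stored as [x, y, width, height] in pixels. All values below are made up.

```python
# Minimal sketch of the COCO detection format the converter produces.
# File names, sizes, and box values are illustrative only.
coco = {
    'images': [
        {'id': 1,
         'file_name': '2014-06-25-16-45-34_stereo_centre_02/00001.jpg',
         'width': 1280, 'height': 960},
    ],
    'annotations': [
        # bbox is [x, y, width, height] in pixels; category_id indexes 'categories'.
        {'id': 1, 'image_id': 1, 'category_id': 1,
         'bbox': [100.0, 200.0, 50.0, 80.0],
         'area': 50.0 * 80.0, 'iscrowd': 0},
    ],
    'categories': [
        {'id': 1, 'name': 'Ped'},
        # ... one entry per ROAD agent class ...
    ],
}
```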

Baseline

| Detector | AP@0.5-0.95 | AP@0.5 |
| --- | --- | --- |
| ResNet-50 FPN, ImageNet finetune | 30.0 | 56.2 |
| ResNet-50 FPN, COCO finetune | 34.8 | 59.0 |

Public Results

  • YOLOv5 on the validation set, from the paper ROAD: The ROad event Awareness Dataset for Autonomous Driving: 57.9%

Dataset agent instance information

  • train1
    {'ztotal': 416574, 'LarVeh': 5560, 'Cyc': 46460, 'Ped': 179591, 'Car': 92894, 'MedVeh': 15373, 'Bus': 10161, 'TL': 48329, 'Mobike': 1017, 'OthTL': 16837, 'EmVeh': 352}
  • val1
    {'ztotal': 60103, 'Ped': 18949, 'Bus': 1834, 'Car': 11032, 'TL': 7690, 'MedVeh': 9895, 'Cyc': 6139, 'LarVeh': 633, 'Mobike': 2121, 'OthTL': 1810}
    There is no 'EmVeh' class in val1
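The per-class counts above can be reproduced from the COCO-format annotation files with a short script; a minimal sketch (annotation path as listed on this page):

```python
# Count instances per agent class in a COCO-format ROAD annotation file.
import json
from collections import Counter

ann_path = '/mnt/wato-drive/road/detections/coco_annotation_val1.json'
with open(ann_path) as f:
    coco = json.load(f)

# Map category ids to class names, then tally every annotation.
id_to_name = {c['id']: c['name'] for c in coco['categories']}
counts = Counter(id_to_name[a['category_id']] for a in coco['annotations'])

print('total:', sum(counts.values()))
for name, n in counts.most_common():
    print(name, n)
```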

Improvement 1: Inactive Agent Detection

Steps to generate pseudo annotations for inactive agents

  1. Generate predictions on the ROAD training set with COCO pre-trained models, and filter the predicted boxes by class label and detection score.
  2. Compute the overlap between the predictions from step 1 and the ROAD annotations of the same class, then perform non-maximum suppression.
  3. Filter out invalid predictions and generate new annotations for the inactive agents.
  4. Train new object detection models on the new dataset of 11 classes (the 10 previous agent classes + inactive agents).

Code to generate the inactive agent annotations can be found at: https://github.com/WATonomous/mmdetection/blob/road/tools/road/generate_inactive_annotation.py

New annotations for the inactive agents can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive.json
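A minimal sketch of the core filtering idea in step 2: keep predicted boxes that do not overlap an existing ROAD annotation of the same class, since those are likely unannotated (inactive) agents. The function names and thresholds here are illustrative; the linked script is authoritative.

```python
# Sketch: keep COCO-model predictions that do not overlap existing ROAD
# ground-truth boxes of the same class; survivors become "inactive agent"
# pseudo-annotations. Thresholds are illustrative.
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-6)

def inactive_candidates(preds, gts, score_thr=0.5, iou_thr=0.5):
    """preds: list of (box, score); gts: (N, 4) ground-truth boxes of the same class."""
    keep = []
    for box, score in preds:
        if score < score_thr:
            continue
        if len(gts) == 0 or iou(np.asarray(box), gts).max() < iou_thr:
            keep.append(box)  # no annotated agent here -> likely an inactive agent
    return keep
```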

Results

| Detector | AP@0.5-0.95 | AP@0.5 |
| --- | --- | --- |
| ResNet-50 FPN, ImageNet finetune | 30.9 | 61.3 |
| ResNet-50 FPN, COCO finetune | 38.0 | 67.9 |

Improvement 2: Super-category Agent Class Merge

Merge certain classes of agents into a single super-category.

Steps for the class merging

  1. Merge Car, MedVeh, LarVeh, Bus and EmVeh into a single class Vehicle.
  2. Merge TL and OthTL into a single class TL.
  3. The 10 agent classes are thus merged into 5 new classes.

Code for super-category agent class merge can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/super_category_merge.py

New annotations with both the inactive agents and the merged super-categories can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive_merged.json
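The merge itself amounts to a simple label mapping over the annotation categories. A sketch (class names from this page; the linked script is authoritative):

```python
# Map the original ROAD agent classes onto the merged super-categories.
SUPER_CATEGORY = {
    'Car': 'Vehicle', 'MedVeh': 'Vehicle', 'LarVeh': 'Vehicle',
    'Bus': 'Vehicle', 'EmVeh': 'Vehicle',
    'TL': 'TL', 'OthTL': 'TL',
    # Classes that keep their own category.
    'Ped': 'Ped', 'Cyc': 'Cyc', 'Mobike': 'Mobike',
}

def merge_category(name: str) -> str:
    """Return the super-category name for an original agent class."""
    return SUPER_CATEGORY[name]
```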

Improvement 3: Object detector based on optical flow

| Detector | detection mAP | COCO finetune |
| --- | --- | --- |
| RGB only (ImageNet pretrain) | 56.2 | 59.0 |
| RGB + optical flow (x, y 2-channel), shallow fusion | 52.6 | 53.8 |
| RGB + optical flow (color 3-channel), shallow fusion | 52.2 | 53.6 |
| RGB + optical flow (magnitude 1-channel), shallow fusion | 51.3 | 52.8 |
| RGB + optical flow (color 3-channel), deep fusion (addition) | 59.9 | 61.4 |
| RGB + optical flow (color 3-channel), deep fusion (concat) | 59.1 | 61.2 |

Combined with other improvements:

| Detector | detection mAP |
| --- | --- |
| RGB only (COCO pretrain) + inactive + class merge | 80.4 |
| RGB + optical flow (color 3-channel), deep fusion (addition) + inactive + class merge | 81.3 |

Larger backbone x101_64dx4:

| Detector | detection mAP |
| --- | --- |
| RGB only (COCO pretrain) + inactive + class merge | 80.5 |
| RGB + optical flow (color 3-channel), deep fusion (addition) + inactive + class merge | 83.1 |

  • 1-channel: flow magnitude, sqrt(x^2 + y^2)
  • 2-channel: the x and y flow components
  • 3-channel: color-wheel representation of the flow
  • Shallow fusion: concatenate the RGB image and the optical flow channels as the network input

Visualizations show that detection of inactive agents has improved, but false positives increased because no ImageNet pre-trained model can be used with the new fusion input.
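As an illustration of the two fusion variants (a sketch only, not the project implementation): shallow fusion stacks the flow channels onto the RGB image before a single backbone, while deep fusion runs a second backbone stream on the flow and merges the multi-scale feature maps by element-wise addition (or concatenation). Module names below are hypothetical.

```python
# Sketch of shallow vs. deep fusion of RGB and optical flow (PyTorch-style).
# Module names, backbones, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class ShallowFusion(nn.Module):
    """Concatenate flow channels with RGB and feed a single backbone.

    The backbone's first conv must accept 3 + flow channels as input."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, rgb, flow):
        return self.backbone(torch.cat([rgb, flow], dim=1))

class DeepFusionAdd(nn.Module):
    """Run separate backbones on RGB and flow, add per-stage feature maps."""
    def __init__(self, rgb_backbone: nn.Module, flow_backbone: nn.Module):
        super().__init__()
        self.rgb_backbone = rgb_backbone
        self.flow_backbone = flow_backbone

    def forward(self, rgb, flow):
        rgb_feats = self.rgb_backbone(rgb)    # tuple of multi-scale features
        flow_feats = self.flow_backbone(flow)
        # Element-wise addition stage by stage (concat would use torch.cat instead).
        return tuple(r + f for r, f in zip(rgb_feats, flow_feats))
```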

How to generate optical flow

Code:

  • Optical flow: https://github.com/liqq010/RAFT/blob/master/get_flow_for_road.py
  • Normalized optical flow: https://github.com/liqq010/RAFT/blob/master/get_flow_for_road_norm.py

Normalization Steps:

  • First, set a range for the optical flow (-15 to 15 by default) and clip values that fall outside it.
  • Then normalize the optical flow values to the range 0 to 255.
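A minimal sketch of this normalization, assuming the flow is an (H, W, 2) array of x/y displacements in pixels:

```python
# Clip optical flow to [-bound, bound] and rescale linearly to [0, 255] (uint8).
import numpy as np

def normalize_flow(flow: np.ndarray, bound: float = 15.0) -> np.ndarray:
    """flow: (H, W, 2) array of x/y displacements in pixels."""
    flow = np.clip(flow, -bound, bound)
    return ((flow + bound) / (2 * bound) * 255.0).astype(np.uint8)
```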

Where to find the generated optical flow:

  • 3-channel color wheel: /mnt/wato-drive/road/optical_flow_color_wheel
  • 2-channel normalized flow: /mnt/wato-drive/road/optical_flow_normalized

End-to-end Evaluation

How to do end-to-end evaluation of ACAR-Net on detection results

Prepare a JSON file in the following format (a sketch of the full structure follows the list below).
The top level of the JSON file contains the field db.
The db field contains the frame-level detections for all validation videos:

  • To access the detections for a video, use db['2014-06-25-16-45-34_stereo_centre_02'], where 2014-06-25-16-45-34_stereo_centre_02 is the name of the video.
  • Each video's detections come with the following fields: ['frames', 'numf', 'split_ids']
    • split_ids contains the split id(s) assigned to this video (test, train1, val1, ...)
    • numf is the number of frames in the video
    • frames contains the frame-level detection results
      • for each frame, frames['1'] contains ['annotated', 'width', 'height', 'annos', 'input_image_id']
      • annotated is always set to 1
      • annos contains the detections of a frame, as bounding boxes stored under unique keys
      • annos['1'] has the following keys: ['box', 'agent_ids', 'action_ids']
        • box is the bounding box in normalized (0, 1) coordinates: xmin, ymin, xmax, ymax
        • agent_ids holds the class id(s) of the detections
        • action_ids is set to 1 as a placeholder id that is not used in evaluation
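Put together, a single entry of this file might look roughly like the following; all values (frame count, image size, box coordinates, ids) are illustrative only.

```python
# Illustrative structure of the ACAR-Net evaluation JSON (values are made up).
db = {
    '2014-06-25-16-45-34_stereo_centre_02': {
        'split_ids': ['val1'],
        'numf': 5000,                      # number of frames in the video
        'frames': {
            '1': {
                'annotated': 1,            # always 1
                'width': 1280, 'height': 960,
                'input_image_id': 1,
                'annos': {
                    '1': {
                        # normalized (0, 1) xmin, ymin, xmax, ymax
                        'box': [0.10, 0.20, 0.30, 0.55],
                        'agent_ids': [0],  # detected class id(s)
                        'action_ids': [1], # placeholder, unused in evaluation
                    },
                },
            },
        },
    },
}
detections = {'db': db}
```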

To evaluate, set the annotation_path field to this JSON file and set evaluation to True in the config file. An example JSON file is provided at /mnt/wato-drive/road/detections/new_val1_coco.json

Steps to generate detection results from the detection model in the ACAR-Net end-to-end evaluation format

  1. Run the trained detection model and save the results to xxx.pkl:
     python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}]
  2. Convert the pkl results to an intermediate format file:
     python tools/road/convert_to_acar.py
  3. Convert the intermediate format to the format ACAR-Net needs:
     python tools/road/convert_to_gt.py

A score threshold for the bounding boxes needs to be chosen; in general 0.7 works best.

End-to-end evaluation of ACAR-Net on detection results

| Detector | detection mAP | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| Ground-truth | - | 34.678 | | |
| Center-Net (last year) (10 classes) | 62 | 23.546 | | |
| Inactive agent detection (10 classes) | 67.9 | 23.837 | | |
| Inactive agent detection + super-category class merge (6 classes) | 80.5 | 24.940 | link | link |
| Inactive agent detection + super-category class merge (6 classes) + deep fusion optical flow | 83.1 | 26.407 | link | link |

More Results

On val2:

| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| FPN_x101_64dx4_inactive_merged | 11.5 / 28.4 (e1) | | | |
| + deep fusion | 14.4 / 33.4 (e1) | | | |

On val3:

| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| FPN_x101_64dx4_inactive_merged | 34.4 / 61.6 (e5) | | | |
| + deep fusion | 38.1 / 65.7 (e7) | | | |
