
Agent Detection


How to train the object detector on ROAD dataset

Code: https://github.com/WATonomous/mmdetection

Steps to train the agent detector

  1. Clone the code and check out the road branch.
  2. Create a .env file and add COMPOSE_PROJECT_NAME=your_user_name to it.
  3. Run docker-compose up mmdet to start the Docker container.
  4. Run docker exec -it your_user_name_mmdet_1 /bin/bash to open a shell inside the container.
  5. Run pip install -v -e . to install mmdetection in editable mode.
  6. Run python tools/train.py configs/road/fpn_r50_config.py to train the FPN ResNet-50 detection model on the ROAD dataset.
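For orientation, below is a minimal sketch of what a ROAD config such as configs/road/fpn_r50_config.py might contain, assuming standard mmdetection config conventions. The base config path, class order, and exact field layout are assumptions; the actual file in the repo is authoritative.

```python
# Hypothetical sketch of a ROAD Faster R-CNN R-50 FPN config (mmdetection style).
# Paths come from this page; the base config and class ordering are assumptions.
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'

# The 10 ROAD agent classes (see the instance counts further down this page).
classes = ('Ped', 'Car', 'Cyc', 'Mobike', 'MedVeh', 'LarVeh',
           'Bus', 'EmVeh', 'TL', 'OthTL')

data_root = '/mnt/wato-drive/road/'
data = dict(
    train=dict(
        classes=classes,
        img_prefix=data_root + 'rgb-images/',
        ann_file=data_root + 'detections/coco_annotation_train1_quarter.json'),
    val=dict(
        classes=classes,
        img_prefix=data_root + 'rgb-images/',
        ann_file=data_root + 'detections/coco_annotation_val1.json'))

# One output class per ROAD agent class.
model = dict(roi_head=dict(bbox_head=dict(num_classes=10)))
```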

ROAD dataset images directory

/mnt/wato-drive/road/rgb-images/

ROAD annotations

  • /mnt/wato-drive/road/detections/coco_annotation_train1_full.json for the full train1 split training set.
  • /mnt/wato-drive/road/detections/coco_annotation_train1_quarter.json for a uniformly sampled 1/4 subset of the training set (currently in use).
  • /mnt/wato-drive/road/detections/coco_annotation_val1.json for the val1 split validation set.

Code to convert the ROAD annotation format to COCO format can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/convert_road_gt_to_coco.py
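As a rough illustration of the target format (not the converter itself), a COCO-style detection annotation file has top-level images, annotations, and categories lists, with boxes stored as [x, y, width, height] in pixels. All values below are made up.

```python
# Minimal sketch of the COCO detection format the converter produces.
# File names, sizes, and box values are illustrative only.
coco = {
    'images': [
        {'id': 1,
         'file_name': '2014-06-25-16-45-34_stereo_centre_02/00001.jpg',
         'width': 1280, 'height': 960},
    ],
    'annotations': [
        # bbox is [x, y, width, height] in pixels; category_id indexes 'categories'.
        {'id': 1, 'image_id': 1, 'category_id': 1,
         'bbox': [100.0, 200.0, 50.0, 80.0],
         'area': 50.0 * 80.0, 'iscrowd': 0},
    ],
    'categories': [
        {'id': 1, 'name': 'Ped'},
        # ... one entry per ROAD agent class ...
    ],
}
```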

Baseline

| Detector | AP@0.5-0.95 | AP@0.5 |
| --- | --- | --- |
| ResNet-50 FPN, ImageNet finetune | 30.0 | 56.2 |
| ResNet-50 FPN, COCO finetune | 34.8 | 59.0 |

Public Results

  • YOLOv5 on the validation set, from the paper ROAD: The ROad event Awareness Dataset for Autonomous Driving: 57.9%

Dataset agent instance information

  • train1
    {'ztotal': 416574, 'LarVeh': 5560, 'Cyc': 46460, 'Ped': 179591, 'Car': 92894, 'MedVeh': 15373, 'Bus': 10161, 'TL': 48329, 'Mobike': 1017, 'OthTL': 16837, 'EmVeh': 352}
  • val1
    {'ztotal': 60103, 'Ped': 18949, 'Bus': 1834, 'Car': 11032, 'TL': 7690, 'MedVeh': 9895, 'Cyc': 6139, 'LarVeh': 633, 'Mobike': 2121, 'OthTL': 1810}
    There is no 'EmVeh' class in val1
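The per-class counts above can be reproduced from the COCO-format annotation files with a short script; a minimal sketch (annotation path as listed on this page):

```python
# Count instances per agent class in a COCO-format ROAD annotation file.
import json
from collections import Counter

ann_path = '/mnt/wato-drive/road/detections/coco_annotation_val1.json'
with open(ann_path) as f:
    coco = json.load(f)

# Map category ids to class names, then tally every annotation.
id_to_name = {c['id']: c['name'] for c in coco['categories']}
counts = Counter(id_to_name[a['category_id']] for a in coco['annotations'])

print('total:', sum(counts.values()))
for name, n in counts.most_common():
    print(name, n)
```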

Improvement 1: Inactive Agent Detection

Steps to generate pseudo annotations for inactive agents

  1. Generate predictions on the ROAD training set with COCO pre-trained models, and filter the predicted boxes by class label and detection score.
  2. Compute the overlap between the predictions from step 1 and the ROAD annotations of the same class, then perform non-maximum suppression.
  3. Filter out invalid predictions and generate new annotations for the inactive agents.
  4. Train new object detection models on the new dataset of 11 classes (the 10 previous agent classes + inactive agents).

Code to generate the inactive agent annotations can be found at: https://github.com/WATonomous/mmdetection/blob/road/tools/road/generate_inactive_annotation.py

New annotations for the inactive agents can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive.json
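A minimal sketch of the core filtering idea in step 2: keep predicted boxes that do not overlap an existing ROAD annotation of the same class, since those are likely unannotated (inactive) agents. The function names and thresholds here are illustrative; the linked script is authoritative.

```python
# Sketch: keep COCO-model predictions that do not overlap existing ROAD
# ground-truth boxes of the same class; survivors become "inactive agent"
# pseudo-annotations. Thresholds are illustrative.
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-6)

def inactive_candidates(preds, gts, score_thr=0.5, iou_thr=0.5):
    """preds: list of (box, score); gts: (N, 4) ground-truth boxes of the same class."""
    keep = []
    for box, score in preds:
        if score < score_thr:
            continue
        if len(gts) == 0 or iou(np.asarray(box), gts).max() < iou_thr:
            keep.append(box)  # no annotated agent here -> likely an inactive agent
    return keep
```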

Results

| Detector | AP@0.5-0.95 | AP@0.5 |
| --- | --- | --- |
| ResNet-50 FPN, ImageNet finetune | 30.9 | 61.3 |
| ResNet-50 FPN, COCO finetune | 38.0 | 67.9 |

Improvement 2: Super-category Agent Class Merge

Merge certain classes of agents into a single super-category.

Steps for the class merging

  1. Merge Car, MedVeh, LarVeh, Bus and EmVeh into a single class Vehicle.
  2. Merge TL and OthTL into a single class TL.
  3. The 10 agent classes are thus merged into 5 new classes.

Code for super-category agent class merge can be found at https://github.com/WATonomous/mmdetection/blob/road/tools/road/super_category_merge.py

New annotations with both the inactive agents and the merged super-categories can be found at /mnt/wato-drive/road/detections/coco_annotation_train1_quarter_inactive_merged.json
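The merge itself amounts to a simple label mapping over the annotation categories. A sketch (class names from this page; the linked script is authoritative):

```python
# Map the original ROAD agent classes onto the merged super-categories.
SUPER_CATEGORY = {
    'Car': 'Vehicle', 'MedVeh': 'Vehicle', 'LarVeh': 'Vehicle',
    'Bus': 'Vehicle', 'EmVeh': 'Vehicle',
    'TL': 'TL', 'OthTL': 'TL',
    # Classes that keep their own category.
    'Ped': 'Ped', 'Cyc': 'Cyc', 'Mobike': 'Mobike',
}

def merge_category(name: str) -> str:
    """Return the super-category name for an original agent class."""
    return SUPER_CATEGORY[name]
```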

Improvement 3: Object detector based on optical flow

| Detector | detection mAP | COCO finetune |
| --- | --- | --- |
| RGB only (ImageNet pretrain) | 56.2 | 59.0 |
| RGB + optical flow (x, y 2-channel), shallow fusion | 52.6 | 53.8 |
| RGB + optical flow (color 3-channel), shallow fusion | 52.2 | 53.6 |
| RGB + optical flow (magnitude 1-channel), shallow fusion | 51.3 | 52.8 |
| RGB + optical flow (color 3-channel), deep fusion (addition) | 59.9 | 61.4 |
| RGB + optical flow (color 3-channel), deep fusion (concat) | 59.1 | 61.2 |

Combined with other improvements:

| Detector | detection mAP |
| --- | --- |
| RGB only (COCO pretrain) + inactive + class merge | 80.4 |
| RGB + optical flow (color 3-channel), deep fusion (addition) + inactive + class merge | 81.3 |

Larger backbone x101_64dx4:

| Detector | detection mAP |
| --- | --- |
| RGB only (COCO pretrain) + inactive + class merge | 80.5 |
| RGB + optical flow (color 3-channel), deep fusion (addition) + inactive + class merge | 83.1 |

  • 1-channel: flow magnitude, sqrt(x^2 + y^2)
  • 2-channel: the x and y flow components
  • 3-channel: color-wheel representation of the flow
  • Shallow fusion: concatenate the RGB image and the optical flow channels as the network input

Visualizations show that detection of inactive agents has improved, but false positives increased because no ImageNet pre-trained model can be used with the new fusion input.
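As an illustration of the two fusion variants (a sketch only, not the project implementation): shallow fusion stacks the flow channels onto the RGB image before a single backbone, while deep fusion runs a second backbone stream on the flow and merges the multi-scale feature maps by element-wise addition (or concatenation). Module names below are hypothetical.

```python
# Sketch of shallow vs. deep fusion of RGB and optical flow (PyTorch-style).
# Module names, backbones, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class ShallowFusion(nn.Module):
    """Concatenate flow channels with RGB and feed a single backbone.

    The backbone's first conv must accept 3 + flow channels as input."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, rgb, flow):
        return self.backbone(torch.cat([rgb, flow], dim=1))

class DeepFusionAdd(nn.Module):
    """Run separate backbones on RGB and flow, add per-stage feature maps."""
    def __init__(self, rgb_backbone: nn.Module, flow_backbone: nn.Module):
        super().__init__()
        self.rgb_backbone = rgb_backbone
        self.flow_backbone = flow_backbone

    def forward(self, rgb, flow):
        rgb_feats = self.rgb_backbone(rgb)    # tuple of multi-scale features
        flow_feats = self.flow_backbone(flow)
        # Element-wise addition stage by stage (concat would use torch.cat instead).
        return tuple(r + f for r, f in zip(rgb_feats, flow_feats))
```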

How to generate optical flow

Code:

  • Optical flow: https://github.com/liqq010/RAFT/blob/master/get_flow_for_road.py
  • Normalized optical flow: https://github.com/liqq010/RAFT/blob/master/get_flow_for_road_norm.py

Normalization Steps:

  • First, set a range for the optical flow (-15 to 15 by default) and clip values that fall outside it.
  • Then normalize the optical flow values to the range 0 to 255.
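A minimal sketch of this normalization, assuming the flow is an (H, W, 2) array of x/y displacements in pixels:

```python
# Clip optical flow to [-bound, bound] and rescale linearly to [0, 255] (uint8).
import numpy as np

def normalize_flow(flow: np.ndarray, bound: float = 15.0) -> np.ndarray:
    """flow: (H, W, 2) array of x/y displacements in pixels."""
    flow = np.clip(flow, -bound, bound)
    return ((flow + bound) / (2 * bound) * 255.0).astype(np.uint8)
```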

Where to find the generated optical flow:

  • 3-channel color wheel: /mnt/wato-drive/road/optical_flow_color_wheel
  • 2-channel normalized flow: /mnt/wato-drive/road/optical_flow_normalized

End-to-end Evaluation

How to do end-to-end evaluation of ACAR-Net on detection results

Prepare a JSON file in the following format (a sketch of the full structure follows the list below).
The top level of the JSON file contains the field db.
The db field contains the frame-level detections for all validation videos:

  • To access the detections for a video, use db['2014-06-25-16-45-34_stereo_centre_02'], where 2014-06-25-16-45-34_stereo_centre_02 is the name of the video.
  • Each video's detections come with the following fields: ['frames', 'numf', 'split_ids']
    • split_ids contains the split id(s) assigned to this video (test, train1, val1, ...)
    • numf is the number of frames in the video
    • frames contains the frame-level detection results
      • for each frame, frames['1'] contains ['annotated', 'width', 'height', 'annos', 'input_image_id']
      • annotated is always set to 1
      • annos contains the detections of a frame, as bounding boxes stored under unique keys
      • annos['1'] has the following keys: ['box', 'agent_ids', 'action_ids']
        • box is the bounding box in normalized (0, 1) coordinates: xmin, ymin, xmax, ymax
        • agent_ids holds the class id(s) of the detections
        • action_ids is set to 1 as a placeholder id that is not used in evaluation
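Put together, a single entry of this file might look roughly like the following; all values (frame count, image size, box coordinates, ids) are illustrative only.

```python
# Illustrative structure of the ACAR-Net evaluation JSON (values are made up).
db = {
    '2014-06-25-16-45-34_stereo_centre_02': {
        'split_ids': ['val1'],
        'numf': 5000,                      # number of frames in the video
        'frames': {
            '1': {
                'annotated': 1,            # always 1
                'width': 1280, 'height': 960,
                'input_image_id': 1,
                'annos': {
                    '1': {
                        # normalized (0, 1) xmin, ymin, xmax, ymax
                        'box': [0.10, 0.20, 0.30, 0.55],
                        'agent_ids': [0],  # detected class id(s)
                        'action_ids': [1], # placeholder, unused in evaluation
                    },
                },
            },
        },
    },
}
detections = {'db': db}
```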

To evaluate, set the annotation_path field to this JSON file and set evaluation to True in the config file. An example JSON file is provided at /mnt/wato-drive/road/detections/new_val1_coco.json

Steps to generate detection results from the detection model in the ACAR-Net end-to-end evaluation format

  1. Run the trained detection model and save the results to xxx.pkl:
     python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}]
  2. Convert the pkl results to an intermediate format file:
     python tools/road/convert_to_acar.py
  3. Convert the intermediate format to the format ACAR-Net needs:
     python tools/road/convert_to_gt.py

A score threshold for the bounding boxes needs to be chosen; in general 0.7 works best.

End-to-end evaluation of ACAR-Net on detection results

| Detector | detection mAP | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| Ground-truth | - | 34.678 | | |
| Center-Net (last year) (10 classes) | 62 | 23.546 | | |
| Inactive agent detection (10 classes) | 67.9 | 23.837 | | |
| Inactive agent detection + super-category class merge (6 classes) | 80.5 | 24.940 | link | link |
| Inactive agent detection + super-category class merge (6 classes) + deep fusion optical flow | 83.1 | 26.407 | link | link |

More Results

On val2:

| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| FPN_x101_64dx4_inactive_merged | 11.5 / 28.4 (e1) | | | |
| + deep fusion | 14.4 / 33.4 (e1) | | | |

On val3:

| Detector | detection mAP (mAP@0.5-0.95 / mAP@0.5) | action mAP | frame-level model path | detection file |
| --- | --- | --- | --- | --- |
| FPN_x101_64dx4_inactive_merged | 34.4 / 61.6 (e5) | | | |
| + deep fusion | 38.1 / 65.7 (e7) | | | |
