
EfficientPose SoftSensor

Extends Ross Wightman's EfficientDet, following the EfficientPose approach, with a translation head and a rotation head in order to predict 6D object poses.

This is the code for the master's thesis "KI-basierte Posenbestimmung von Objekten mit Cross-Modal Knowledge Distillation und Synthetischen Daten" ("AI-based Pose Estimation of Objects with Cross-Modal Knowledge Distillation and Synthetic Data") by Jason Schühlein.

For the original README, see README_EfficientDet.

Table of Contents

  • Overview / Folder Structure
  • Dataset Acquisition
  • Training
  • Inference
  • Evaluation

Overview / Folder Structure

  • ./ Root folder contains the main scripts.

    • ./train.py is the main training script.
    • ./inference.py is the main inference script.
    • ./evaluate.py is the main evaluation script.
    • ./inference_cam.py is the inference script adapted for GradCAM usage.
    • ./raytune.py is used to run a batch of experiments.
    • ./batch_evaluate.py is used to evaluate a batch of experiments.
    • ./vis_utils.py contains utilities for visualization.
    • ./pose_metrics.py contains the pose metric functions.
    • ./train_znet.py is the main training script for the Z-Net experiment.
    • ./inference_znet.py is the main inference script for the Z-Net experiment.
  • ./dataset/ contains scripts to calibrate, record and postprocess the dataset.

    • ./dataset/calibration/ contains the calibration files for camera and sensor.
    • ./dataset/model/ contains the Fleckenzwerg and calibration models.
    • ./dataset/record.py is the main recording script.
    • ./dataset/camera_calibrate.py is used to calibrate the camera with a checkerboard pattern.
    • ./dataset/sensor_calibrate.py is used to re-calibrate and register the camera together with the sensor.
    • ./dataset/postprocess_real.py handles point cloud and image postprocessing.
    • ./dataset/postprocess_synth.py handles the conversion of the synthetic dataset.
    • ./dataset/cleanup_dataset.py can be used to walk through the dataset and mark bad data pairs.
    • ./dataset/transformations.py contains utilities for the transformations between different coordinate systems.
    • ./dataset/transfer_annotations.py transfers the orthographic annotations of the depth maps to the perspective images.
    • ./dataset/blender_extract_cli.py is the entry point for extracting annotations from a Blender scene.
  • ./output/ contains the output of the training.

  • ./ray_configs/ contains parameter space definitions of the experiments.

  • ./ray_results/ contains the results of those experiments.

  • ./notebooks/ contains notebooks for data exploration and plotting, some notable ones are:

    • ./notebooks/picker.ipynb allows picking 2D and 3D points for calibration and registration.
    • ./notebooks/split_image_dataset.ipynb splits the dataset into train, val and test.
    • ./notebooks/batch_plot.ipynb creates most of the plots.
    • ./notebooks/evaluate_datasets.py plots the distribution of the datasets.
  • ./effdet/ contains the modified EfficientDet code, notable mentions are:

    • ./effdet/efficientdet.py contains the main model class, extended with the new head networks.
    • ./effdet/rotation6d.py contains the 6D rotation representation adapted for use with anchor-based detection.
    • ./effdet/loss.py contains the extended loss functions and the loss experiments.
    • ./effdet/znet.py contains the Z-Net experiment architecture.
    • ./effdet/anchors.py contains the anchor generation code.
      • as well as the z normalization, which might differ between EP-D and EP-I.
    • ./effdet/object_detection/target_assigner.py assigns targets to anchors.
    • ./effdet/data/dataset.py contains the new datasets.
    • ./effdet/data/transforms.py contains all augmentations.
    • ./effdet/data/parsers/parser_pose.py contains the custom labelme parser.

Dataset Acquisition

To acquire a dataset for training, the following steps need to be performed:

  • TODO

Training

The most basic training command is:

python train.py [DIR] --dataset [DATASET] --num-classes 1

To switch between EfficientPose-Depth and EfficientPose-Image, simply change the --dataset argument to depthpose or imagepose, respectively.
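
For example (the dataset root /data/softsensor below is a hypothetical path, purely for illustration):

python train.py /data/softsensor --dataset depthpose --num-classes 1   # EfficientPose-Depth
python train.py /data/softsensor --dataset imagepose --num-classes 1   # EfficientPose-Image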

Tensorboard tracking is enabled by default.
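
Assuming the run logs land under ./output/ (see the folder structure above), a typical way to monitor training is:

tensorboard --logdir output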

All augmentations, loss variants, loss weights, and alternative head networks can be toggled via command-line arguments.

The most notable ones are:

CLI Argument               Description
--combined-t-head          Use the combined translation head.
--iterative-pose-heads     Use the refinement modules.
--axis-angle               Use axis-angle instead of the 6D rotation representation.
--mean-dist-z              Use weighted loss for z values further from the mean.
--mean-dist-rot            Use weighted loss for rotation values further from the mean.
--mask-aug                 Use mask and threshold augmentations.
--color-aug                Use color augmentations.
--rot-aug                  Use rotation augmentation (part of 6D-Aug).
--z-aug                    Use Z augmentation (part of 6D-Aug).
--z-aug-min [float]        Minimum Z augmentation value.
--z-aug-max [float]        Maximum Z augmentation value.
--box-loss-weight [float]  Weight of the bounding box loss.
--z-loss-weight [float]    Weight of the Z loss.
--xy-loss-weight [float]   Weight of the XY loss.
--rot-loss-weight [float]  Weight of the rotation loss.
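
For illustration, a sketch that enables the 6D-Aug augmentations and adjusts the loss weights (the dataset path and all numeric values are placeholders, not tuned settings):

python train.py /data/softsensor --dataset depthpose --num-classes 1 \
  --rot-aug --z-aug --z-aug-min 0.8 --z-aug-max 1.2 \
  --z-loss-weight 2.0 --rot-loss-weight 1.0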

Inference

To infer on a dataset/split, run:

# --mean: mean of the dataset
# --visualize: visualize the inference
python inference.py [DIR] \
  --dataset [DATASET] \
  --split [SPLIT] \
  --checkpoint [PATH] \
  --mean [float] [float] [float] \
  --visualize

When inferring with EP-D on synthetic data, no mean needs to be specified. When inferring on real data, the mean of the dataset must be specified (here 0.46 0.46 0.46).

Inference on EP-I does not need a mean value.
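
Putting this together, EP-D inference on a real test split might look as follows (the dataset root and checkpoint path are placeholders; the mean values are the ones quoted above):

python inference.py /data/softsensor \
  --dataset depthpose \
  --split test \
  --checkpoint output/train/checkpoint.pth.tar \
  --mean 0.46 0.46 0.46 \
  --visualize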

Evaluation

To evaluate a model on a dataset/split, run:

# --min-score: minimum score for a detection
# --visualize: live visualization
# --save-vis2d: save 2D visualizations
# --save-vis3d: render 3D visualizations
python evaluate.py [DIR] \
  --dataset [DATASET] \
  --split [SPLIT] \
  --checkpoint [PATH] \
  --num-classes 1 \
  --min-score 0.5 \
  --visualize \
  --save-vis2d \
  --save-vis3d

The output will be saved next to the checkpoint in the form of two result files:

  • [split]_[min-score]_metrics.json contains the final Precision/Recall/ADD/... metrics.
  • [split]_[min-score]_details.json contains per-detection metrics.
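
For a quick look at the summary metrics from the shell, Python's built-in json.tool module can pretty-print the results file (the filename assumes split=test and min-score=0.5):

python -m json.tool test_0.5_metrics.json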

For further evaluation and plotting, take a look at the batch_plot.ipynb or batch_plot_epd.ipynb notebooks.
