
Visual Navigation under Urgent Circumstances via Counterfactual Reasoning in CARLA Simulator (VCCR)

This project was submitted for the Robot Artificial Intelligence class.

Baseline model

  • Model-based Offline Policy Optimization (MOPO)
  • Causal Inference (CI)
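The key idea in MOPO is to penalize the model-predicted reward by an uncertainty estimate from a dynamics-model ensemble, so the offline policy avoids regions where the learned model is unreliable. A minimal sketch of that penalty follows; the ensemble representation and the choice of the largest per-dimension standard deviation as the uncertainty measure are illustrative assumptions, not this repository's exact implementation:

```python
import statistics

def penalized_reward(reward, ensemble_next_states, lam=1.0):
    """MOPO-style penalty: subtract lam times an uncertainty estimate,
    here the largest per-dimension standard deviation across the
    ensemble's next-state predictions (an illustrative choice)."""
    per_dim = zip(*ensemble_next_states)  # group predictions by state dimension
    uncertainty = max(statistics.pstdev(dim) for dim in per_dim)
    return reward - lam * uncertainty

# When the ensemble members agree, no penalty is applied:
print(penalized_reward(1.0, [(0.0, 0.0), (0.0, 0.0)]))  # 1.0
# Disagreement lowers the usable reward:
print(penalized_reward(1.0, [(0.0, 0.0), (2.0, 0.0)]))  # 0.0
```

The penalty coefficient lam trades off conservatism against exploitation of the learned model.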

Running Environment

We are developing and testing on the setup below.

  • Ubuntu: 20.04
  • Python: 3.8.18
  • CARLA Simulator: latest release

Installation

  1. Install the CARLA Simulator by following the official CARLA installation guide.
  2. Clone the VCCR repository.
git clone --recursive https://github.com/brunoleej/VCCR.git
  3. Create an Anaconda environment from venv_setup.yml and install the packages.
cd VCCR
conda env create -f venv_setup.yml
conda activate vccr
pip install -e viskit
pip install -e .

Usage

  1. Baseline: Model-based Offline Policy Optimization (MOPO)

    • Currently, only local execution is supported.

    • To run the baseline algorithm, use the notebook at vccr/MOPO/train_mopo_agent.ipynb together with the expert dataset at vccr/MOPO/expert_dataset.pickle.

  2. Train an Autonomous Driving (AD) agent in the CARLA simulator.

  3. Train autonomous navigation.

    • We compared the autonomous navigation ability of VCCR against three model-free reinforcement learning (MFRL) algorithms: Deep Deterministic Policy Gradient (DDPG), Twin-Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC).

    • To train the three MFRL algorithms, run environment/sb_train.py:

      python environment/sb_train.py
      
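For reference, the expert dataset at vccr/MOPO/expert_dataset.pickle is a pickle file and can be loaded with the standard library. A minimal loader sketch follows; the dict-of-lists layout used in the demo is a convention common in offline RL, assumed here rather than guaranteed by this repository:

```python
import os
import pickle
import tempfile

def load_expert_dataset(path):
    """Load a pickled expert dataset (e.g. vccr/MOPO/expert_dataset.pickle)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip demo with a toy dataset in the assumed layout:
toy = {"observations": [[0.0], [1.0]], "actions": [[0.5], [0.5]], "rewards": [1.0, 0.0]}
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as f:
    pickle.dump(toy, f)
    tmp_path = f.name
loaded = load_expert_dataset(tmp_path)
os.remove(tmp_path)
```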

New environments

To run in a different environment, modify the provided environment template.
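A new environment typically needs to expose the Gym-style reset()/step() interface so the RL training loops can drive it. The skeleton below sketches that interface; the class name, placeholder observation, and reward are illustrative, not this repository's actual template:

```python
class NewDrivingEnv:
    """Skeleton for a custom environment following the Gym-style
    reset()/step() interface (a generic convention, assumed here)."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        """Reset the episode and return the initial observation."""
        self.t = 0
        return [0.0]  # placeholder observation

    def step(self, action):
        """Advance one timestep; return (obs, reward, done, info)."""
        self.t += 1
        obs = [float(self.t)]
        reward = 0.0  # placeholder reward signal
        done = self.t >= self.max_steps
        return obs, reward, done, {}

env = NewDrivingEnv(max_steps=2)
obs = env.reset()
obs, reward, done, info = env.step(0)
```

Keeping to this interface lets the same training script run against any environment that implements it.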

Logging

Weights & Biases (wandb) logging is enabled by default. If you do not want it, comment out the wandb logging lines.
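Instead of commenting lines out, the logging call can be guarded by a flag. A minimal sketch follows; the flag name and metric keys are illustrative, and wandb is imported lazily so disabling the flag removes the dependency entirely:

```python
USE_WANDB = False  # flip to True to send metrics to Weights & Biases

def log_metrics(metrics, use_wandb=USE_WANDB):
    """Log a dict of scalar metrics either to wandb or to stdout."""
    if use_wandb:
        import wandb  # lazy import: only needed when logging is enabled
        wandb.log(metrics)
    else:
        print(metrics)
    return metrics

log_metrics({"episode_return": 12.5, "success_rate": 0.8})
```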

References

@article{MOPO,
  title={{MOPO}: Model-based Offline Policy Optimization},
  author={Yu, Tianhe and Thomas, Garrett and Yu, Lantao and Ermon, Stefano and Zou, James and Levine, Sergey and Finn, Chelsea and Ma, Tengyu},
  journal={CoRR},
  year={2020}
}

@article{causal-rl-survey,
  title={Causal Reinforcement Learning: A Survey},
  author={Deng, Zhihong and Jiang, Jing and Long, Guodong and Zhang, Chengqi},
  journal={arXiv preprint arXiv:2307.01452},
  year={2023}
}

Acknowledgements

The Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin-Delayed Deep Deterministic Policy Gradient (TD3) implementations used in the Experiment section come from the stable-baselines3 codebase.
