CoRL ToDos #30

@ghost

Description

TODO:

After rebuttal

  • License (MIT) and Max Planck Society

Alex

  • Read the GP-SSM papers

    • learn a dynamics model, use the model for the exploration strategy, maybe update more than one point
    • identify structure, use this structure to make some assumptions on the safety function
    • main point of the paper
  • general rewrite of GP approximation (3.2)

  • redo figures

  • 4-dimensional system

  • Redo 2-dimensional experiments

  • IAV affiliation

  • Matthias comments

  • Mention noise in discussion

  • Read new papers of reviewers

Steve

  • proof extension, based on Bellman update equations
  • acknowledgments
  • rewrite conclusions
    • Paper by Kirchner (ETH)
    • State-dependent uncertainty
    • replacing old samples ("closeness")
    • mixing in dynamics
  • General improvements based on reviews

prep for rebuttal

  • Alex: get a working example with the 4-D spaceship model (02.08)
    • Test convergence
    • Test existing 2-D examples
  • Steve: implement the spaceship model with a 2-D action space (29.07)
  • Alex: get a working example with the 5-D spaceship model (not needed anymore?)
  • Alex: handle the last comment from Matthias

prep for Sept 7

  • Add commentary in conclusions: the deterministic dynamics assumption is not theoretically required, though we have not investigated this and expect practical complications of interest to arise.
  • Split into modules
    • models and viability (Steve)
    • GP learning (Alex); also merge in the submission branch
  • Label the CoRL submission version. Add the LaTeX files of the paper
  • Obtain better graphs
    • figures... we are not always converging to a safe subset
    • With multiple trajectories on the parameters, and get a nice convergence
    • Other types of graphs? In suppl. material? Comparison with random search, convergence/iterations, and failure rate (Alex)
    • Comparison with cost-function not doing this
  • Clean up code
    • remove viability computations for warm-start in estimate_measure. Q_V, Q_M etc. should be calculated by the user outside, and then passed to the learning class. Classes implemented in measure should not depend on viability
    • data going into the sampler class... what does this contain? It shouldn't require any ground-truth data...
    • string together trajectories (low priority)
    • test function to run a bunch of trials with uniform random sampling (Steve, Alex)
    • 3D example look up
    • 5D example look up
    • Acrobot example? (low priority)
  • Rewrite
    • Point out notation (Steve)

    • Point out that the examples are in the suppl. code

    • Better colormaps (Alex): use hatching for ground truth, color for learned quantities

  • Appendix, with descriptions of additional examples
    • convergence proof in appendix (see rebuttal)
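
Regarding the clean-up item on estimate_measure above: one possible shape for the decoupling is sketched below. This is only an illustration, not the repository's code. The class name `MeasureLearner`, the method `warm_start_targets`, and the interpretation of Q_V as a boolean mask and Q_M as measure values on the same grid are all assumptions made for the example. The point it tries to show is that the learning class only receives precomputed arrays and never calls into viability.

```python
class MeasureLearner:
    """Hypothetical learning class: takes precomputed grids, imports nothing from viability."""

    def __init__(self, Q_V, Q_M):
        # Assumption: Q_V is a boolean mask and Q_M holds measure values,
        # both given as nested lists over the same grid.
        same_rows = len(Q_V) == len(Q_M)
        if not same_rows or any(len(v) != len(m) for v, m in zip(Q_V, Q_M)):
            raise ValueError("Q_V and Q_M must share the same grid shape")
        self.Q_V = [[bool(x) for x in row] for row in Q_V]
        self.Q_M = [[float(x) for x in row] for row in Q_M]

    def warm_start_targets(self):
        # Keep the measure where the mask is set, use zero elsewhere.
        return [
            [m if v else 0.0 for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(self.Q_V, self.Q_M)
        ]


# User side: compute the grids by any means (toy values here), then pass them in.
Q_V = [[True, False], [True, True]]
Q_M = [[0.8, 0.3], [0.5, 1.0]]
learner = MeasureLearner(Q_V, Q_M)
targets = learner.warm_start_targets()
```

With this split, swapping in a different way of computing Q_V or Q_M would not require touching the learning class.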

Deadline

  • Train GP hyperparameters with failures and infeasible points

  • Rewrite to be able to include different models

  • Arbitrary dynamics 2d

  • Q-Feas?

  • Arbitrary dynamics more-d

  • non-discretized states

  • plots

  • clean up code for submission

    • all examples of figures used in paper
    • bonus RL within the safe set
  • intro to GPs in 3 sentences

  • re-iterate on related work

  • do the extra models
