Reinforcement Learning-Augmented Model Predictive Control at scale for legged and hybrid robots. Part of the IBRIDO project.
- Hierarchical RL-MPC coupling – The RL agent chooses contact schedules and twist commands for the underlying MPC controllers. A new flight phase is injected for a limb whenever its corresponding action instantaneously exceeds a given threshold (see the sketch after this list).
- Sample-efficient learning at scale – AugMPC achieves data-efficient training through high-throughput experience generation, enabled by aggressive MPC parallelization and fully vectorized simulation. On a workstation equipped with an AMD Ryzen Threadripper 7970, 128 GiB RAM, and an NVIDIA RTX 4090, the system sustains a 50+× real-time factor while running 800 parallel environments / full rigid-body MPC instances at 20 Hz with a ~1 s MPC horizon, even for high-DoF robots like Centauro (nv = 43). Training with Soft Actor-Critic (SAC) and MPCs in the loop typically converges in 1–10 × 10⁶ environment steps (≈ 6 h wall-clock time, corresponding to 9–29 simulated days). This contrasts with the > 100 × 10⁶ steps commonly required by blind end-to-end RL locomotion policies.
- Domain adaptability – Thanks to the MPC's robustness, AugMPC achieves zero-shot sim-to-sim and sim-to-real transfer without any domain randomization (no contact-property, inertial, or timing randomization).
- Robot adaptability – Validated on robots with different morphologies and weight distributions (30–120 kg), on standard legged and hybrid locomotion tasks.
- Non-gaited contact scheduling – The architecture can generate completely acyclic gaits and timing adaptations.
- Shared-memory-first design – AugMPC relies on a shared-memory layer, built on top of EigenIPC, for deterministic, real-time-safe communication between simulators, controllers, and learning processes (the exchange pattern is illustrated in the second sketch after this list).
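To make the contact-scheduling rule above concrete, here is a minimal sketch: each limb has one scheduling action, and whenever that action instantaneously exceeds a threshold a new flight phase of fixed duration is started for that limb. The action layout, threshold value, flight duration, and function names below are illustrative assumptions, not the actual AugMPC interface.

```python
import torch

# Hypothetical action layout: per environment, the first N_LIMBS entries are
# contact-scheduling actions and the remaining entries are twist references
# forwarded to the MPC.
N_ENVS, N_LIMBS = 800, 4
FLIGHT_THRESHOLD = 0.5     # assumed threshold on the scheduling actions
FLIGHT_DURATION_S = 0.3    # assumed duration of an injected flight phase

def split_actions(actions: torch.Tensor):
    """Split the agent's action into contact-scheduling and twist parts."""
    sched = actions[:, :N_LIMBS]   # one scheduling action per limb
    twist = actions[:, N_LIMBS:]   # linear/angular velocity references
    return sched, twist

def inject_flight_phases(sched: torch.Tensor,
                         flight_timer: torch.Tensor,
                         dt: float) -> torch.Tensor:
    """Start a new flight phase for every limb whose scheduling action
    instantaneously exceeds the threshold; otherwise count down existing ones."""
    trigger = sched > FLIGHT_THRESHOLD                        # (N_ENVS, N_LIMBS) bool
    flight_timer = torch.where(trigger,
                               torch.full_like(flight_timer, FLIGHT_DURATION_S),
                               (flight_timer - dt).clamp(min=0.0))
    return flight_timer                                       # > 0 means "limb in flight"

# Usage inside a (hypothetical) 20 Hz control step:
actions = torch.rand(N_ENVS, N_LIMBS + 6) * 2.0 - 1.0         # placeholder RL actions
flight_timer = torch.zeros(N_ENVS, N_LIMBS)
sched, twist = split_actions(actions)
flight_timer = inject_flight_phases(sched, flight_timer, dt=0.05)
in_flight = flight_timer > 0.0    # mask handed to the MPC contact schedule
```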
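The shared-memory-first design boils down to producers and consumers mapping the same numeric buffers instead of serializing messages. The sketch below illustrates that pattern with Python's standard `multiprocessing.shared_memory` and NumPy; AugMPC itself uses EigenIPC, whose API is not shown here, and the buffer name and layout are invented for the example.

```python
import numpy as np
from multiprocessing import shared_memory

# Illustration of the shared-memory exchange pattern only: AugMPC uses EigenIPC,
# not multiprocessing.shared_memory, and the field layout below is invented.
N_ENVS, NV = 800, 43   # e.g. Centauro has nv = 43

def create_state_buffer(name: str = "robot_state_demo"):
    """Producer side (world interface): allocate a zero-copy state buffer."""
    nbytes = N_ENVS * NV * np.dtype(np.float32).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes, name=name)
    state = np.ndarray((N_ENVS, NV), dtype=np.float32, buffer=shm.buf)
    state[:] = 0.0
    return shm, state

def attach_state_buffer(name: str = "robot_state_demo"):
    """Consumer side (MPC cluster / training env): attach to the same buffer."""
    shm = shared_memory.SharedMemory(name=name)
    state = np.ndarray((N_ENVS, NV), dtype=np.float32, buffer=shm.buf)
    return shm, state

# The producer writes joint states; a consumer sees the update with no copies
# and no serialization, because both views alias the same memory segment.
shm_w, state_w = create_state_buffer()
state_w[:] = np.random.randn(N_ENVS, NV).astype(np.float32)

shm_r, state_r = attach_state_buffer()
assert np.allclose(state_r, state_w)

del state_r, state_w     # drop the views before releasing the segment
shm_r.close()
shm_w.close()
shm_w.unlink()
```

In AugMPC the same layer also carries triggering and stepping signals between the processes described below; those are omitted here for brevity.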
AugMPC essentially consists of three main components:
- World interface – Implements `AugMPCWorldInterfaceBase`. It connects to Isaac Sim, xbot2, or hardware, publishes robot states, and triggers MPCHive controllers via shared memory. Optional remote stepping lets the training loop decide when the simulator should advance.
- MPC cluster – Uses MPCHive's `ControlClusterServer`/`Client` to spawn multiple receding-horizon controllers (see `aug_mpc.controllers`). Each controller reads robot states, solves its MPC problem, and writes predictions and commands back to shared memory.
- Training environment + RL algorithm – An `AugMPCTrainingEnvBase` derivative defines the MDP at hand (observations, actions, rewards, terminations, truncations), which is then used by the training executable (SAC is the default; PPO is also supported).
Specific implementations of world interfaces and training environments are available in AugMPCEnvs.
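To give a feel for what such a derivative defines, here is a minimal sketch of a training environment. The stub base class and all hook names (`get_observations`, `compute_rewards`, ...) are assumptions made for illustration; the actual interface is defined by `AugMPCTrainingEnvBase` and the concrete environments in AugMPCEnvs.

```python
import torch

class AugMPCTrainingEnvStub:
    """Stand-in for AugMPCTrainingEnvBase; the real base class and its hook
    names differ. This only illustrates what a derivative has to define."""
    def __init__(self, n_envs: int, obs_dim: int, act_dim: int):
        self.n_envs, self.obs_dim, self.act_dim = n_envs, obs_dim, act_dim

class TwistTrackingEnv(AugMPCTrainingEnvStub):
    """Hypothetical task: track a commanded base twist with the MPC in the loop."""

    def get_observations(self, robot_state, mpc_state) -> torch.Tensor:
        # e.g. base twist, gravity in base frame, MPC contact flags, ...
        return torch.cat([robot_state, mpc_state], dim=-1)

    def apply_actions(self, actions: torch.Tensor) -> None:
        # Forward contact-schedule + twist references to the MPC cluster
        # (in AugMPC this goes through the shared-memory layer).
        self._latest_actions = actions

    def compute_rewards(self, robot_state, twist_ref) -> torch.Tensor:
        twist_err = (robot_state[:, :6] - twist_ref).norm(dim=-1)
        return torch.exp(-twist_err)         # tracking term only, for brevity

    def compute_terminations(self, robot_state) -> torch.Tensor:
        base_height = robot_state[:, 8]      # assumed state layout
        return base_height < 0.2             # e.g. the robot has fallen

    def compute_truncations(self, step_count, max_steps: int = 1000) -> torch.Tensor:
        return step_count >= max_steps
```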
aug_mpc/
├── training_envs/ # Base classes and wrappers for AugMPCEnvs environments
├── world_interfaces/ # Base world interface that AugMPCEnvs extends
├── controllers/ # LRHC/MPCHive clients and Horizon-based MPC implementations
├── agents/ # Neural network policies (PPO, SAC, dummy)
├── training_algs/ # PPO/SAC trainers, rollout logic, persistence
├── scripts/ # Launchers for clusters, world interfaces, and training loops
└── utils/ # Shared-memory helpers, visualization bridges, teleop, math/utils
The preferred way to install MPCHive is through ibrido-containers, which ships with all necessary dependencies.
To extend AugMPC:
- New controller – Add an MPC implementation under `aug_mpc/controllers`, alongside the existing LRHC/MPCHive clients and Horizon-based controllers (a schematic of the per-controller loop is sketched below).
- New world interface – Derive from `AugMPCWorldInterfaceBase` (see `aug_mpc/world_interfaces`) to target a different simulator or hardware stack.
- New training environment – Derive from `AugMPCTrainingEnvBase` and define the MDP (observations, actions, rewards, terminations, truncations).
- New agent/algorithm – Add the policy under `aug_mpc/agents` and the corresponding trainer under `aug_mpc/training_algs`.
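As a rough orientation for the first extension point, the sketch below shows the per-controller loop described earlier: read the robot state, solve a receding-horizon problem, and write the commands and predictions back. The class and method names are hypothetical and the "solver" is a trivial placeholder; a real controller would plug into MPCHive's client interface and solve a proper optimal control problem (e.g. with Horizon).

```python
import numpy as np

class MinimalRecedingHorizonController:
    """Schematic of the per-controller loop run inside the MPC cluster.
    Names are illustrative, not the MPCHive/LRHC API."""

    def __init__(self, horizon_s: float = 1.0, dt: float = 0.05):
        self.n_nodes = int(horizon_s / dt)   # ~1 s horizon at 20 Hz -> 20 nodes
        self.dt = dt

    def solve(self, state: np.ndarray, twist_ref: np.ndarray) -> np.ndarray:
        # Placeholder "MPC": a decaying proportional tracker unrolled over the
        # horizon. A real controller would transcribe and solve an optimal
        # control problem over the full rigid-body dynamics.
        plan = np.zeros((self.n_nodes, twist_ref.shape[0]))
        err = twist_ref - state[: twist_ref.shape[0]]
        for k in range(self.n_nodes):
            plan[k] = 2.0 * err * (0.9 ** k)
        return plan

    def step(self, read_state, write_commands, twist_ref: np.ndarray) -> None:
        state = read_state()            # e.g. read from shared memory
        plan = self.solve(state, twist_ref)
        write_commands(plan[0], plan)   # first input + full prediction
```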