This is the public implementation of the contrastive action model from "Latent Action Diffusion for Cross-Embodiment Manipulation". This guide is a brief walkthrough of how to use the model.
Your data should be preprocessed into a single NumPy (`.npy`) file containing a dictionary whose keys are modality names and whose values are arrays of shape `(N, d_action)`; row `i` of every modality should describe the same, semantically aligned action.
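For example, a dataset with two modalities could be prepared like this (the modality names and action dimensions below are hypothetical; use whatever embodiments your data covers):

```python
import numpy as np

N = 1000  # number of aligned samples across all modalities

# Row i of every modality describes the same underlying action.
data = {
    "robot_arm": np.random.randn(N, 7).astype(np.float32),      # e.g. 7-DoF arm actions
    "gripper_hand": np.random.randn(N, 16).astype(np.float32),  # e.g. 16-DoF hand actions
}
np.save("aligned_actions.npy", data)

# Loading a pickled dictionary back requires allow_pickle=True.
loaded = np.load("aligned_actions.npy", allow_pickle=True).item()
```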
Two-step training (encoders then decoders):
- Run `python two_step_train.py --config-name two_step_config.yaml` to first train the encoders and then the decoders using the provided Hydra config.
Joint training (train encoders and decoders together):
- Run `python joint_train.py --config-name joint_config.yaml` to train encoders and decoders jointly. The joint training script uses `config/joint_config.yaml` (or another Hydra config of your choice) and logs training metrics to wandb when enabled.
Notes:
- Both training flows use the repository's NumPy-based dataset loader. Your dataset must be a NumPy `.npy` file containing a dictionary that maps modality names to arrays of shape `(N, d_action)`. Point the config's `data.source_file` to that `.npy` file.
- Config files live in the `config/` directory. Adjust the optimizer, batch size, temperature schedule, and save paths via those configs.
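A config might look roughly like the sketch below. Only `data.source_file` is named in this guide; the remaining keys are illustrative placeholders, so check the files in `config/` for the actual schema:

```yaml
# Hypothetical layout -- see config/two_step_config.yaml for the real field names.
data:
  source_file: /path/to/aligned_actions.npy
train:
  batch_size: 256
  lr: 1.0e-4
  temperature: 0.07   # contrastive temperature (may follow a schedule)
save:
  checkpoint_dir: checkpoints/
```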
You can install the package with `pip install -e .` and then import it from other projects.
To support custom modalities, modify the encoder and decoder classes as needed.
Logging uses wandb, though it can easily be swapped for another provider.
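One way to make that swap painless is to route metrics through a small logger interface. This is a sketch, not the repo's actual API; the class and method names are hypothetical:

```python
class MetricLogger:
    """Minimal interface a logging provider must satisfy."""
    def log(self, metrics: dict, step: int) -> None:
        raise NotImplementedError

class PrintLogger(MetricLogger):
    """Stand-in provider that records metrics and writes them to stdout.
    A wandb-backed version would call wandb.log(metrics, step=step) instead."""
    def __init__(self):
        self.history = []

    def log(self, metrics: dict, step: int) -> None:
        self.history.append((step, metrics))
        print(f"step {step}: {metrics}")

logger = PrintLogger()
logger.log({"loss": 0.42}, step=1)
```

Training code that only talks to `MetricLogger` can then switch providers by changing one constructor call.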