We present an unsupervised framework for learning semantically meaningful 3D keypoints from point cloud data using a latent diffusion model. Our method encodes input shapes into a structured latent space consisting of a set of 3D keypoints. These keypoints serve as a compact, interpretable representation that conditions an Elucidated Diffusion Model (EDM) to reconstruct the full shape. To ensure the extracted keypoints are both spatially meaningful and consistent across object instances, we introduce two geometric supervision losses: a Chamfer loss that anchors keypoints near the input shape, and a deformation-consistency loss that encourages robustness under geometric transformations.
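Neither loss is spelled out above, so the sketch below shows one plausible NumPy formulation: a symmetric Chamfer term between predicted keypoints and the input cloud, and a consistency term comparing the keypoints of a transformed cloud against the transformed keypoints. The function names and the `encoder` stand-in are illustrative assumptions, not this repository's actual API.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3) and b: (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def deformation_consistency(encoder, points, rotation):
    """Keypoints predicted for a rotated cloud should match the rotated
    keypoints of the original cloud. `encoder` maps a cloud to keypoints;
    here it is a placeholder for the learned keypoint extractor."""
    kp = encoder(points)                    # keypoints of the original shape
    kp_rot = encoder(points @ rotation.T)   # keypoints of the rotated shape
    return np.mean(np.linalg.norm(kp @ rotation.T - kp_rot, axis=-1))
```

In training, the Chamfer term would be applied between the predicted keypoints and the input cloud, and the consistency term between keypoint predictions for a shape and its transformed copy.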
Much of this code builds on the following repositories:
Download ShapeNet from Hugging Face
To train a model on the airplane category with 10 unsupervised keypoints, run:
python scripts/train_keypoint_diffuser.py
The CI runs several checks on any new code pushed to the repository. These checks can also be run locally, without waiting for the CI, by following the steps below:
- Install `pre-commit`.
- Install the Git hooks by running `pre-commit install`.
Once those two steps are done, the Git hooks will be run automatically at every new commit.
The Git hooks can also be run manually with `pre-commit run --all-files`, and if needed they can be skipped (not recommended) with `git commit --no-verify`.
Note: you may have to run `pre-commit run --all-files` manually a couple of times before it passes. Each formatting tool first rewrites the code and fails the check, but should pass on the second run.
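For reference, pre-commit is configured through a `.pre-commit-config.yaml` file at the repository root. The fragment below is only an illustration of the format with common hooks; it is not necessarily this repository's actual configuration.

```yaml
# Illustrative pre-commit configuration; hook choices are assumptions,
# not this repository's actual setup.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
```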