This repository hosts the open-source release of TORM, a framework for transparent object reconstruction and manipulation based on multi-view segmentation.
Our work aims to advance perception and reconstruction for transparent objects, a long-standing challenge in computer vision and robotics.
Transparent objects are common in daily life and industry, so robots must be able to perceive and manipulate them. However, reflection and refraction make it difficult to accurately reconstruct their 3D geometry. Conventional methods, which rely on simultaneously estimating background ambient light and complex refraction fields, lack robustness in real-world scenes and thereby impede robotic grasping.

To address this, we propose TORM, a novel framework for robust reconstruction and manipulation of multiple transparent objects. TORM exploits the semantic information of transparent objects, using multi-view segmentation masks to constrain a self-supervised multi-object deep marching tetrahedra (DMTet-Multi) 3D fitting process (see the sketches below). To keep the geometry representation from converging to suboptimal solutions during multi-transparent-object reconstruction, we design a novel loss function that prevents marching tetrahedra from crossing object boundaries. A connectivity determination strategy then separates the fitted mesh into individual objects, which a grasp perception network processes in parallel to predict end-effector configurations for grasping.

In real-world experiments, TORM achieves an 88.8% grasping success rate on multi-transparent-object grasping tasks.
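To make the mask-constrained fitting step concrete, here is a minimal sketch of how multi-view segmentation masks could supervise silhouettes rendered from a DMTet mesh by a differentiable rasterizer. The tensor shapes, the `silhouette_loss` name, and the choice of binary cross-entropy are illustrative assumptions, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F

def silhouette_loss(rendered: torch.Tensor, gt_masks: torch.Tensor) -> torch.Tensor:
    """Compare rendered silhouettes against ground-truth segmentation masks.

    rendered: (V, K, H, W) soft silhouettes in [0, 1], one per view V and
              per object K, produced by a differentiable rasterizer
              (hypothetical layout).
    gt_masks: (V, K, H, W) binary multi-view segmentation masks.
    """
    # Clamp away from 0/1 to keep the log terms in BCE finite.
    rendered = rendered.clamp(1e-6, 1.0 - 1e-6)
    return F.binary_cross_entropy(rendered, gt_masks.float())
```

Gradients from this term flow back through the rasterizer to the per-object SDF values on the tetrahedral grid, which is what lets 2D masks constrain the 3D fit.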
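The exact form of the boundary non-crossing loss is defined in the paper. As a rough illustration only, the sketch below penalizes grid vertices whose signed distances place them inside more than one object at once, which is one plausible way to keep neighboring implicit surfaces from crossing; the `non_crossing_loss` name and formulation are assumptions, not the paper's loss.

```python
import torch

def non_crossing_loss(sdf: torch.Tensor) -> torch.Tensor:
    """Discourage overlapping interiors between objects.

    sdf: (K, N) signed distances for K objects evaluated at N shared
         tetrahedral grid vertices (negative = inside the object).
    """
    inside = torch.relu(-sdf)              # (K, N) interior depth per object
    total = inside.sum(dim=0)              # summed interior depth at each vertex
    deepest = inside.max(dim=0).values     # depth of the single deepest object
    overlap = total - deepest              # > 0 only where >= 2 objects claim a vertex
    return overlap.mean()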
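The connectivity determination step can be understood as splitting the fused reconstructed mesh into connected components, one per transparent object, before grasp prediction. Below is a self-contained sketch using union-find over shared vertices; the function name and interface are hypothetical, and libraries such as trimesh provide equivalent functionality.

```python
import numpy as np

def split_connected_components(vertices: np.ndarray, faces: np.ndarray):
    """Split a triangle mesh into connected components.

    vertices: (V, 3) float array of vertex positions.
    faces:    (F, 3) int array of vertex indices.
    Returns a list of (vertices, faces) tuples, one per component,
    each reindexed to its own compact vertex set.
    """
    parent = np.arange(len(vertices))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the three vertices of every face so faces sharing a vertex
    # end up in the same component.
    for a, b, c in faces:
        ra, rb, rc = find(a), find(b), find(c)
        parent[rb] = ra
        parent[rc] = ra

    # Each face's component is identified by the root of its first vertex.
    roots = np.array([find(f) for f in faces[:, 0]])
    components = []
    for root in np.unique(roots):
        comp_faces = faces[roots == root]
        used, new_faces = np.unique(comp_faces, return_inverse=True)
        components.append((vertices[used], new_faces.reshape(-1, 3)))
    return components
```

Each returned component can then be fed to the grasp perception network independently, which is what enables the parallel processing described above.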
The TORM dataset can be downloaded from the following link:
Please make sure to cite our work if you use this dataset in your research.