Hi,
Thanks for the impressive work! I’ve been exploring the repository and successfully ran the Quick Interactive M1 Demo from the README. The model correctly identified the apple's position, and my visualization confirmed the accuracy.
Output: ['{"label": "apple", "bbox_2d": [304, 226, 345, 267]}']

Furthermore, when I fed it an image from the LIBERO evaluation environment, it could still correctly identify the position of the plate.
Output: ['{"label": "plate", "bbox_2d": [26, 147, 68, 179]}']

However, these results are all based on the Pretrain-RT-1-Bridge checkpoint. When I use the libero-related checkpoints instead, the model's output becomes garbled and nonsensical, unlike the clean results from the M1 demo.
Output: ['极为这般 ENCჭpunkt\n极为勍:\n桀chantment实效uri\n极力栽培差别澎湃\néra:\n粜acity!\n粜acity!\nerrer!\nerrer?\npragma:\nerrer?\n搛\n恪光阴;\n债权?\n\t \nrang!\n恪光阴;\n恪光阴!\n機能: []\nável!\n.pathname!\n機能:!\n恪光阴!\n.pathname!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n']
My questions are:
- Does the fine-tuned LIBERO model sacrifice its general grounding/VQA capability to achieve higher task success rates?
- I noticed that the Pretrain-RT-1-Bridge ckpt uses dinov2_vits14, while the libero-related ckpts use dinov2_vitl14. Why is that? What does "Pretrain" signify here? Is it a general-purpose robotic foundation model before being tuned into the libero ckpts? What is the relationship between these versions?
- Since the Pretrain-RT-1-Bridge ckpt already demonstrates strong zero-shot grounding capabilities, I am curious about its potential for direct LIBERO evaluation. Is that possible?
Please excuse any imprecise terminology, as I am still a newcomer to this field and am learning the ropes.