Dear Authors,
Hello, I have been reading the View2Cap paper (https://arxiv.org/abs/2503.23024) with great interest and am currently trying to reproduce some of the experiments. I would greatly appreciate your clarification and assistance on a few points:
Model Checkpoints
Could you kindly share the checkpoints for the LEO-based LLM pre-trained with View2Cap and for the situation grounding module? This would be very helpful for reproducing your results.
Dataset Issues
When using the dataset provided on HuggingFace (https://huggingface.co/datasets/ZzZZCHS/processed_scannet), I noticed some missing files referenced in the viewqa JSON files (obtained from https://cuhko365-my.sharepoint.com/personal/221019046_link_cuhk_edu_cn/_layouts/15/onedrive.aspx?id=%2Fpersonal%2F221019046%5Flink%5Fcuhk%5Fedu%5Fcn%2FDocuments%2FShare%2FView2Cap%2Fannotations&ga=1). For example, the dataset includes color/depth/pose frames at intervals of 20 (i.e., 0, 20, 40, …), but the JSON files sometimes reference indices like 10, 30, 50 (i.e., starting from 10 and increasing by 20). These files are not present after decompressing the dataset. Could you advise on how to handle these missing references?
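For reference, this is roughly how I checked which referenced frames are missing. It is only a minimal sketch: the dataset root, the annotation file name, and the assumption that frame references end in a numeric file name are my own guesses about the layout, not something taken from your release.

```python
import json
import re
from pathlib import Path

# Hypothetical local paths -- adjust to wherever the HuggingFace dump and the
# viewqa annotations from the OneDrive share were extracted.
SCANNET_ROOT = Path("processed_scannet")
VIEWQA_JSON = Path("annotations/viewqa_val.json")  # assumed file name

def referenced_paths(obj):
    """Recursively collect every string in the JSON that looks like a frame path."""
    if isinstance(obj, dict):
        for v in obj.values():
            yield from referenced_paths(v)
    elif isinstance(obj, list):
        for v in obj:
            yield from referenced_paths(v)
    elif isinstance(obj, str) and re.search(r"\d+\.(jpg|png|txt)$", obj):
        yield obj

with open(VIEWQA_JSON) as f:
    data = json.load(f)

# Report referenced files that do not exist after decompressing the dataset.
missing = sorted({p for p in referenced_paths(data) if not (SCANNET_ROOT / p).exists()})
print(f"{len(missing)} referenced files are missing, e.g.:")
for p in missing[:10]:
    print("  ", p)
```

With this check, the missing entries I see are exactly the indices starting from 10 and increasing by 20 mentioned above.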
Single vs. Multi Data References
In the viewqa benchmark, the item_id field sometimes refers to paths in both the single data (color) and the multi data (posed_images). Are these intended to be the same images? If so, given the missing-file issue mentioned above (point 2), I am also unable to access some of the required multi-view images. Would it be possible to obtain the original source of these missing images?
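For completeness, this is the kind of spot check I ran to compare a single-data color frame with its posed_images counterpart for one item_id. The concrete paths below are hypothetical placeholders for one scene and frame; I am not certain this is the intended correspondence, which is exactly what I would like to confirm.

```python
import hashlib
from pathlib import Path

# Hypothetical example paths for one item_id -- both names are assumptions
# about the directory layout, not the actual structure of the release.
single_path = Path("processed_scannet/color/scene0000_00/40.jpg")
multi_path = Path("processed_scannet/posed_images/scene0000_00/40.jpg")

def sha256(path: Path) -> str:
    """Hash the raw file bytes so identical images compare equal."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

if single_path.exists() and multi_path.exists():
    same = sha256(single_path) == sha256(multi_path)
    print("identical content" if same else "different content")
else:
    print("one or both files are missing:",
          [str(p) for p in (single_path, multi_path) if not p.exists()])
```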
Thank you very much for your time and for making your work available to the community. I would be very grateful for any guidance or resources you could provide.
Best regards,
Daehyeon Choi