I tried shuffling the object tokens with a fixed order for some tasks, and the result is interesting.
bboxes = np.asarray(bboxes)
I added these lines:
cropped_imgs = [cropped_imgs[i] for i in [0,2,1]]
bboxes = [bboxes[i] for i in [0,2,1]]
The robot tries to pick up the distractor instead of the dragged object.
I didn't make any changes to the prompt.
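For anyone who wants to reproduce this, here is a minimal standalone sketch of the reordering. The helper name `reorder_objects` and the placeholder crops and boxes are my own illustration, not code from the repo; the only thing taken from the change above is the permutation `[0, 2, 1]` applied identically to both lists.

```python
import numpy as np


def reorder_objects(cropped_imgs, bboxes, order):
    """Apply the same fixed permutation to the crops and their bounding boxes.

    Hypothetical helper for illustration only; it is not part of the repo.
    """
    if len(cropped_imgs) != len(bboxes):
        raise ValueError("cropped_imgs and bboxes must have the same length")
    if sorted(order) != list(range(len(bboxes))):
        raise ValueError("order must be a permutation of the object indices")
    # Index both sequences with the same order so each crop stays paired
    # with its own bounding box.
    shuffled_imgs = [cropped_imgs[i] for i in order]
    shuffled_bboxes = np.asarray(bboxes)[list(order)]
    return shuffled_imgs, shuffled_bboxes


# Placeholder data (assumed): three objects, strings standing in for image crops.
cropped_imgs = ["crop_a", "crop_b", "crop_c"]
bboxes = [[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]]

new_imgs, new_bboxes = reorder_objects(cropped_imgs, bboxes, [0, 2, 1])
```

Since the crops and boxes are permuted together, each object token still describes the same object; only its position in the sequence changes.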