Thank you for sharing your impressive research results.
I have a question regarding the training data for atomic identifiers in the Flickr dataset.
While reviewing the training command arguments, I noticed that t2id_shards is specified, but i2id_shards is not.
Does this mean that training with atomic identifiers did not use the learning-to-memorize data, whose inputs contain both an image and a textual instruction?