Vinyals et al. introduce the mini-ImageNet dataset as a reduced version of regular ImageNet, with:
- images downsampled to 84x84
- the number of classes reduced to 100
- the number of images per class reduced to 600

Some authors (I actually only know about Larochelle doing it) have adopted this as a standard of sorts for one-shot learning tasks. Should we move to it? One strong reason to do so is to enable better comparisons with the one-shot learning literature: the CNN model should not have seen the classes you are performing one-shot learning on (see note 1), so we would probably have to retrain our models if we want to compare to those papers. As previously discussed, this is quite useless as of now since we are not doing one-shot learning, but it is a direction I would be interested in.
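
For concreteness, here is a minimal sketch of what that reduction could look like; the directory layout, paths, and the random class selection are my assumptions, not the official split from Vinyals et al.:

```python
# Hypothetical mini-ImageNet-style reduction: pick 100 classes, 600 images
# each, downsample to 84x84. Paths and layout are assumptions.
import os
import random
from PIL import Image

IMAGENET_DIR = "/data/imagenet/train"  # assumed: one subfolder per class
OUTPUT_DIR = "/data/mini-imagenet"
NUM_CLASSES = 100
IMAGES_PER_CLASS = 600
IMAGE_SIZE = (84, 84)

random.seed(0)
all_classes = sorted(os.listdir(IMAGENET_DIR))
chosen_classes = random.sample(all_classes, NUM_CLASSES)

for cls in chosen_classes:
    src_dir = os.path.join(IMAGENET_DIR, cls)
    dst_dir = os.path.join(OUTPUT_DIR, cls)
    os.makedirs(dst_dir, exist_ok=True)
    # Assumes each class folder holds at least IMAGES_PER_CLASS images.
    images = random.sample(sorted(os.listdir(src_dir)), IMAGES_PER_CLASS)
    for name in images:
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        img.resize(IMAGE_SIZE, Image.BILINEAR).save(os.path.join(dst_dir, name))
```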
Notes
1: this is interesting to me, as it feels like the similarity between the classes the base model was trained on and the classes we evaluate on could be a major driver of the final one-shot accuracy. What about the following experiment (a rough sketch of the distance computation is after this list):
- for each evaluation class, compute the average L2 distance between its CNN features and those of the base-training classes.
- check whether the one-shot model does much better on the closest classes, no matter which one-shot model is used. The point being that the CNN features are still doing most of the heavy lifting. Or maybe some one-shot models contradict this assumption, which would be kind of interesting!
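
A rough sketch of that distance probe, assuming the CNN features have already been extracted (the feature dimension and the random stand-in arrays below are placeholders, not real data):

```python
# Class-similarity probe: average L2 distance from a held-out class's
# CNN features to the base-training features. Feature extraction itself
# is not shown; inputs are assumed to be (n_images, feature_dim) arrays.
import numpy as np
from scipy.spatial.distance import cdist

def mean_l2_to_base(heldout_features, base_features):
    """Average L2 distance from each held-out image to every base image."""
    dists = cdist(heldout_features, base_features)  # (n_heldout, n_base)
    return dists.mean()

# Toy usage with random stand-ins for real CNN features (dim 512 assumed):
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 512))    # features of base-training images
heldout = rng.normal(size=(600, 512))  # features of one held-out class
print(mean_l2_to_base(heldout, base))
```

One would then check whether per-class one-shot accuracy correlates with this score across the held-out classes.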
A general note: one-shot models do not have the same level of "software maturity" as other deep models. Implementations are probably hard to find, or we would have to move to Lua to compare against them (the one-shot LSTM work by Larochelle, for example, is in regular Torch).