Although the dev split of the MASSIVE dataset may not be used for model training, it can be used for hyperparameter tuning. Tuning hyperparameters with an effective hyperparameter-search algorithm is, to some extent, equivalent to training on the dev split, especially for contestants with many GPUs. So for the full-dataset competition, contestants with more GPUs can effectively use more training data (the dev split); for the zero-shot competition, they can indirectly exploit non-English labelled data (the dev split). This may be unfair to individual contestants who lack GPU resources compared with those backed by a lab or company, who usually have plenty. It might be better, and fairer for everyone, to merge the train and dev splits into a single train split and let each contestant carve out their own dev split for hyperparameter tuning.
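The proposed merge-and-resplit scheme could be as simple as the sketch below. This is only an illustration of the idea, not official competition tooling: the function name, the 10% dev fraction, and the fixed seed are all arbitrary choices a contestant would make themselves.

```python
import random

def make_dev_split(examples, dev_fraction=0.1, seed=42):
    """Partition a merged train set into a contestant-chosen train/dev pair.

    `examples` is any list of labelled examples from the merged split;
    `dev_fraction` and `seed` are per-contestant choices (hypothetical
    defaults here). Returns (train, dev) with no overlap.
    """
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]

# Example: 1000 merged examples -> 900 train / 100 dev
merged = list(range(1000))
train, dev = make_dev_split(merged)
```

Because each contestant controls the seed and the fraction, nobody gains an edge from an officially sanctioned dev split; heavy hyperparameter search only consumes data the contestant chose to set aside.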