Some model hyperparameters can be encoded as parameters of the hypernetwork itself, and thus explored during hypernet training. For example, the number of hidden channels can be encoded using a channel/dropout mask that is a parameter of $f_{\theta}$. Users will still need a validation set to select the best-performing functions after training the IEN, but they avoid running multiple training sessions with distinct hyperparameter settings. This improves ease of use and may be more efficient (we should evaluate this).
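A minimal sketch of the channel-mask idea, in NumPy: a per-channel mask logit is trained jointly with the layer weights, so the effective hidden width is explored during training rather than fixed up front. All names, shapes, and the sigmoid parameterization here are illustrative assumptions, not details from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_hidden_layer(x, W, b, mask_logits):
    """Hidden layer whose effective width is governed by a learnable mask.

    sigmoid(mask_logits) gives a soft channel mask in (0, 1); channels whose
    mask is driven toward zero are effectively pruned, so hidden width becomes
    a trainable quantity instead of a fixed hyperparameter.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))   # soft channel mask in (0, 1)
    hidden = np.maximum(x @ W + b, 0.0)         # ReLU hidden activations
    return hidden * mask                        # mask scales each channel

# Illustrative shapes: 8 inputs, 32 hidden channels.
W = rng.normal(size=(8, 32)) * 0.1
b = np.zeros(32)
mask_logits = np.zeros(32)                      # trained jointly with W and b
out = masked_hidden_layer(rng.normal(size=(4, 8)), W, b, mask_logits)
```

Driving a mask logit strongly negative effectively removes that channel, which is how different widths would be compared on the validation set after training.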
Other approaches we might consider:
- Skip connections or attention mechanisms could select the effective number of layers.
We might also treat this as an entirely stochastic component that enriches the variety of generated networks: along with sampling the stochastic seed $z$, we can randomly sample Bernoulli variables that select different hyperparameters, e.g. whether to use batch normalization. This is an attractive avenue because almost any hyperparameter could be encoded this way. However, we would probably need to condition $z$ on the hyperparameter selection: if batch normalization is enabled, we should only sample seeds that generate batchnorm-relevant weights $\theta$.
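One way the Bernoulli sampling and the conditioning of $z$ could look, as a sketch: a coordinate of $z$ is reserved to carry the architectural choice, so the hypernetwork only ever sees seeds consistent with the selected hyperparameter. The conditioning scheme, function name, and dimensions are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_seed_and_hparams(z_dim=16):
    """Sample a Bernoulli hyperparameter choice, then a seed z conditioned on it.

    The single choice here is whether to use batch normalization; conditioning
    is sketched by overwriting one coordinate of z with the choice, so seeds
    that reach the hypernetwork always encode the selected architecture.
    """
    use_batchnorm = bool(rng.random() < 0.5)   # Bernoulli(0.5) hyperparameter
    z = rng.normal(size=z_dim)                 # stochastic seed
    z[0] = 1.0 if use_batchnorm else -1.0      # condition z on the choice
    return use_batchnorm, z

use_bn, z = sample_seed_and_hparams()
```

In a real setup the conditioning would likely be richer (e.g. an embedding of the hyperparameter vector concatenated to $z$), but the constraint is the same: the seed distribution must agree with the sampled hyperparameters so that only batchnorm-relevant weights $\theta$ are generated when batchnorm is selected.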