Some model hyperparameters can be encoded as parameters of the hypernetwork itself, and thus explored during hypernet training. For example, the number of hidden channels can be encoded using a channel/dropout mask that is a parameter of $f_{\theta}$. Users will still need a validation set to select the best-performing functions after training the IEN, but they avoid running multiple training sessions with distinct hyperparameter settings. This improves ease of use and may be more efficient (we should evaluate this).
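A minimal sketch of the channel-mask idea, in NumPy: a per-channel mask logit is trained jointly with the layer weights, so the effective hidden width is explored during training rather than fixed up front. All names, shapes, and the sigmoid parameterization here are illustrative assumptions, not details from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_hidden_layer(x, W, b, mask_logits):
    """Hidden layer whose effective width is governed by a learnable mask.

    sigmoid(mask_logits) gives a soft channel mask in (0, 1); channels whose
    mask is driven toward zero are effectively pruned, so hidden width becomes
    a trainable quantity instead of a fixed hyperparameter.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))   # soft channel mask in (0, 1)
    hidden = np.maximum(x @ W + b, 0.0)         # ReLU hidden activations
    return hidden * mask                        # mask scales each channel

# Illustrative shapes: 8 inputs, 32 hidden channels.
W = rng.normal(size=(8, 32)) * 0.1
b = np.zeros(32)
mask_logits = np.zeros(32)                      # trained jointly with W and b
out = masked_hidden_layer(rng.normal(size=(4, 8)), W, b, mask_logits)
```

Driving a mask logit strongly negative effectively removes that channel, which is how different widths would be compared on the validation set after training.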
Other approaches we might consider:
- Skip connections or attention mechanisms could select the effective number of layers.
We might also treat this as an entirely stochastic component that enriches the variety of generated networks: along with sampling the stochastic seed $z$, we can randomly sample Bernoulli variables that select different hyperparameters, e.g. whether to use batch normalization. This is an attractive avenue because almost any hyperparameter could be encoded this way. However, we would probably need to condition $z$ on the hyperparameter selection: if batch normalization is enabled, we should only sample seeds that generate batchnorm-relevant weights $\theta$.
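One way the Bernoulli sampling and the conditioning of $z$ could look, as a sketch: a coordinate of $z$ is reserved to carry the architectural choice, so the hypernetwork only ever sees seeds consistent with the selected hyperparameter. The conditioning scheme, function name, and dimensions are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_seed_and_hparams(z_dim=16):
    """Sample a Bernoulli hyperparameter choice, then a seed z conditioned on it.

    The single choice here is whether to use batch normalization; conditioning
    is sketched by overwriting one coordinate of z with the choice, so seeds
    that reach the hypernetwork always encode the selected architecture.
    """
    use_batchnorm = bool(rng.random() < 0.5)   # Bernoulli(0.5) hyperparameter
    z = rng.normal(size=z_dim)                 # stochastic seed
    z[0] = 1.0 if use_batchnorm else -1.0      # condition z on the choice
    return use_batchnorm, z

use_bn, z = sample_seed_and_hparams()
```

In a real setup the conditioning would likely be richer (e.g. an embedding of the hyperparameter vector concatenated to $z$), but the constraint is the same: the seed distribution must agree with the sampled hyperparameters so that only batchnorm-relevant weights $\theta$ are generated when batchnorm is selected.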