I found that constructing masks as in the original MADE paper (https://arxiv.org/abs/1502.03509), where hidden nodes are assigned ids at random, works better than assigning ids deterministically, as in

    in_degrees = torch.arange(in_features) % in_flow_features

The reason is likely simple: if the number of hidden units is not an exact multiple of the number of input features, the modulo assignment gives some input features more hidden nodes than others.
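To make the comparison concrete, here is a minimal sketch of both assignment schemes. It uses plain Python lists rather than torch so it runs without dependencies. The function names (`deterministic_degrees`, `random_degrees`, `hidden_mask`) and the example sizes are hypothetical and not taken from the repository. The random scheme follows my reading of the MADE paper, with 1-based input ids and hidden ids sampled uniformly from 1 to D - 1.

```python
import random
from collections import Counter


def deterministic_degrees(n_units, n_degrees):
    # Same arithmetic as torch.arange(n_units) % n_degrees.
    return [i % n_degrees for i in range(n_units)]


def random_degrees(n_units, n_inputs, rng):
    # MADE-style assignment (assumed convention: inputs carry ids 1..D,
    # hidden units draw an id uniformly from 1..D-1, both ends inclusive).
    return [rng.randint(1, n_inputs - 1) for _ in range(n_units)]


def hidden_mask(out_degrees, in_degrees):
    # Unit k may connect to unit j only if its id is >= the id of j.
    return [[1.0 if o >= i else 0.0 for i in in_degrees] for o in out_degrees]


# 10 hidden units over 4 ids: 10 is not a multiple of 4, so the
# modulo scheme gives ids 0 and 1 one more unit than ids 2 and 3.
deterministic_counts = dict(sorted(Counter(deterministic_degrees(10, 4)).items()))
print(deterministic_counts)  # {0: 3, 1: 3, 2: 2, 3: 2}

# Random ids for the same 10 hidden units with D = 4 inputs.
rand_deg = random_degrees(10, 4, random.Random(0))
mask = hidden_mask(rand_deg, [1, 2, 3, 4])  # 10 rows, 4 columns
```

The random draw is not balanced either, so this only illustrates the two schemes. It does not by itself explain the difference in results.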
The deterministic assignment quoted above is from pytorch-flows/flows.py, line 19 at commit 5520ebe.