I've investigated the efficiency of the current repo, especially with the `tutorials/examples/train_hypergrid_simple.py` example, and I have already pushed some changes in #376.
There are still some inefficiencies compared to the lightest implementation of HyperGrid (with TB & DB), as follows:
1. When we use a learnable PB, we call PF and PB independently, i.e., we require two NN forward passes for each step. While we can share parameters between PF and PB, we cannot call them in one forward pass in the current implementation.
2. `DiscreteEnv` enforces that its state class `DiscreteStates` possesses forward & backward masks and updates them during `Env._step`. However, this can be inefficient because having those masks makes `States.__getitem__` (e.g., slicing) slower, since it requires three tensor slicings (`states.tensor` and the two masks), but in many cases we don't need the sliced masks.
3. `torch.distributions.Categorical` is inherently slower than `probs.multinomial(1)`: its initialization takes some time, and sampling from it also seems slower than multinomial.
Item 1 needs to be handled together with the related issue #315, since we may need to store backward probabilities in the `Trajectories` object (currently it can only store the forward probabilities).
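As a rough illustration of what storing backward probabilities might look like, here is a minimal, hypothetical sketch (plain Python, not the actual torchgfn `Trajectories` API; all names are illustrative) of a container holding forward log-probabilities alongside an optional backward set:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch (not the torchgfn API): a trajectories container that
# can record backward log-probabilities next to the forward ones, so that a
# shared PF/PB forward pass could populate both in a single step.
@dataclass
class Trajectories:
    states: List[int] = field(default_factory=list)
    log_pfs: List[float] = field(default_factory=list)  # forward log-probs
    log_pbs: Optional[List[float]] = None               # backward log-probs, optional

    def add_step(self, state: int, log_pf: float, log_pb: Optional[float] = None) -> None:
        self.states.append(state)
        self.log_pfs.append(log_pf)
        if log_pb is not None:
            if self.log_pbs is None:
                self.log_pbs = []
            self.log_pbs.append(log_pb)
```

The `log_pbs` field defaults to `None` so trajectories sampled without a learnable PB pay no extra storage cost.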
Item 2 can be resolved by making those masks optional and allowing the use of `@property` for masking instead, just as we do for GraphEnvs.
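A minimal sketch of the optional-mask idea, using plain Python stand-ins (the class, the toy 1-D grid, and the two-action rule are all illustrative, not the actual torchgfn implementation): masks become a lazily computed `@property`, so slicing touches only the state tensor:

```python
# Hypothetical sketch: instead of storing mask tensors that Env._step must
# keep in sync and __getitem__ must slice, compute masks on demand.
class DiscreteStates:
    def __init__(self, tensor, height):
        self.tensor = tensor  # toy 1-D grid positions, standing in for a torch tensor
        self.height = height

    @property
    def forward_masks(self):
        # Toy rule: "increment" (action 0) is allowed below the grid edge,
        # "exit" (action 1) is always allowed. Computed only when requested.
        return [[s < self.height - 1, True] for s in self.tensor]

    def __getitem__(self, index):
        # One slice instead of three (tensor plus two mask tensors).
        return DiscreteStates(self.tensor[index], self.height)
```

Callers that never inspect masks after slicing (the common case noted above) then pay for one slicing operation instead of three.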
Item 3 can be resolved by defining a custom distribution without all the sanity checks in `Categorical`, though it may be better to keep them for reliability.
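To illustrate the trade-off, here is a hypothetical minimal categorical sampler in plain Python (a torch-free stand-in, not a drop-in replacement for `torch.distributions.Categorical`) that skips validation entirely, which is exactly the kind of speed-for-safety exchange described above:

```python
import bisect
import random
from itertools import accumulate

# Hypothetical sketch of a validation-free categorical sampler.
class FastCategorical:
    def __init__(self, probs):
        # No normalization, NaN, or non-negativity checks: the caller is
        # trusted to pass valid probabilities (this is where the speed comes
        # from, and also why Categorical's sanity checks aid reliability).
        self.cum = list(accumulate(probs))

    def sample(self):
        # Inverse-CDF sampling, the same effect as probs.multinomial(1).
        return bisect.bisect(self.cum, random.random() * self.cum[-1])
```

Passing a malformed `probs` (negative entries, all zeros) here fails silently or crashes, whereas `Categorical` would raise a clear error, which is the reliability argument for keeping the checks.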