Very nice work!
Would it be possible to share the exact hyperparameters used for the ablations on the Rainbow environment? I am trying to reproduce these results using the smaller, default Transformer size (3 layers, 128 dim, 8 heads). However, I find that most problems are solved at exactly 64 nodes expanded. (Interestingly, there also seems to be a jump at 64 nodes expanded in Figure 4 of the paper.)
Here is what my current results look like:

Thanks so much!