Hi,
Thanks for the great work on NUDGE.
One thing confuses me. From the code, I see that in the GetOut environment the top-1 clauses found by beam search are the same as the ones provided by humans (I guess that is human_assisted). This means the two models have the same rules (C), yet their GetOut performance in Figure 3 of your paper looks different.
Why is this? Please correct me if I have the wrong understanding.
By the way, if we only search the top-1 ref., does that mean the weight matrix has little influence? In other words, would the performance be similar if we provided a fixed identity matrix instead of training W? I may be misunderstanding this as well.
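To make the second question concrete, here is a minimal sketch of what I have in mind. It assumes each row of W is normalized with a softmax over the candidate clauses; that is my assumption and I have not verified it against the NUDGE code. The `softmax` helper and all numbers are made up for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of raw weights (hypothetical helper).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Case 1: a weight row that covers only ONE candidate clause.
# Whatever raw value training produces, the normalized weight is 1.0,
# so a trained row and a fixed "identity" row would be indistinguishable.
single = [softmax([w]) for w in (-3.0, 0.0, 5.0)]

# Case 2: a weight row that covers SEVERAL clauses (made-up raw weights).
# Here the trained row can differ from a one-hot identity row.
learned = softmax([2.0, 0.5, -1.0])
identity_like = [1.0, 0.0, 0.0]

print(single)
print(learned, identity_like)
```

If the first case is what happens with top-1 search, I would expect training W to make no difference; if it is the second case, I can see why the weights would still matter.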
Thank you very much.