This repository was archived by the owner on Nov 1, 2024. It is now read-only.
I am not clear about what we use as input to the value network in the training and test phases.
In the training phase we use both public beliefs as inputs. In poker, for example, we use a range for each agent. These ranges are vectors whose entries are mostly non-zero in most cases.
In the test phase, however, we know our exact infostate. In poker, for example, our range is all zeros except for a 1 at our actual hand, while our opponent's range is still a vector with mostly non-zero entries.
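To make the two cases concrete, here is a minimal sketch of the inputs as I understand them. It assumes Texas hold'em with 1326 two-card hands, random placeholder ranges, and a value network input that is just the two ranges concatenated. The helper names (`random_range`, `one_hot_range`, `value_net_input`) are hypothetical and not taken from the repository.

```python
import numpy as np

NUM_HANDS = 1326  # assumption: two-card combinations in Texas hold'em (52 choose 2)


def random_range(rng, num_hands=NUM_HANDS):
    """Dense probability vector over hands (placeholder for a real range)."""
    weights = rng.random(num_hands) + 1e-9  # small offset keeps every entry non-zero
    return weights / weights.sum()


def one_hot_range(hand_index, num_hands=NUM_HANDS):
    """Range that puts all probability on one known hand."""
    r = np.zeros(num_hands)
    r[hand_index] = 1.0
    return r


def value_net_input(hero_range, opponent_range):
    """Assumed input layout: hero range followed by opponent range."""
    return np.concatenate([hero_range, opponent_range])


rng = np.random.default_rng(0)

# Training phase as described above: both ranges are dense.
train_input = value_net_input(random_range(rng), random_range(rng))

# Test phase as described above: hero range is one-hot, opponent range is dense.
# Hand index 42 is an arbitrary illustrative choice.
test_input = value_net_input(one_hot_range(42), random_range(rng))

print(np.count_nonzero(train_input[:NUM_HANDS]))  # non-zero hero entries in training
print(np.count_nonzero(test_input[:NUM_HANDS]))   # non-zero hero entries at test time
```

The first half of `train_input` is dense while the first half of `test_input` has a single non-zero entry, which is the mismatch I am asking about.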
So my question is:
Is it OK to train on inputs that are filled with non-zeros but test on inputs where half of the entries (our range) are zeros?
Or should we instead sample hole cards at each training iteration, so that the hero range is all zeros except one entry and the opponent range is a full distribution over all possible hands?
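The second option could look roughly like the sketch below. This is only my guess at what such sampling would be, not how the repository does it. It assumes 1326 hold'em hands, uniform placeholder ranges, and a concatenated input layout, and it ignores card removal between the sampled hand and the opponent range. The function `sample_training_input` is a hypothetical name.

```python
import numpy as np

NUM_HANDS = 1326  # assumption: two-card combinations in Texas hold'em


def sample_training_input(hero_range, opponent_range, rng):
    """Sample one hero hand from hero_range and build a one-hot hero input.

    Hypothetical helper: card removal against the opponent range is ignored.
    """
    hand = int(rng.choice(len(hero_range), p=hero_range))
    hero_one_hot = np.zeros_like(hero_range)
    hero_one_hot[hand] = 1.0
    return hand, np.concatenate([hero_one_hot, opponent_range])


rng = np.random.default_rng(0)

# Uniform ranges used purely as placeholders for real beliefs.
hero_range = np.full(NUM_HANDS, 1.0 / NUM_HANDS)
opponent_range = np.full(NUM_HANDS, 1.0 / NUM_HANDS)

hand, x = sample_training_input(hero_range, opponent_range, rng)
print(hand, x.shape)
```

With this variant the training inputs would have the same shape and sparsity pattern as the test-time inputs, at the cost of the network never seeing a dense hero range.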