This repository was archived by the owner on Nov 1, 2024. It is now read-only.
Hi Anton,
I am not clear about how chance nodes work in subgame solving at the end of a betting round.
Any clarifications would be very helpful. Thank you.
The paper states that ReBeL only solves to the end of a betting round. In that case, the leaf nodes are either terminal nodes or chance nodes, and a value network is used to get the value of a chance node. But what should the active player of a chance node be? The paper encodes the active player with a binary flag, which doesn't seem to cover the chance player.
The paper states that CFR only solves to the end of a betting round, but the subgame seems to extend to the start of the next betting round. That leaves a gap between the end of one betting round and the start of the next, when the board cards are dealt. How does ReBeL model this? Does it treat the chance node and its children as a separate tree, so that the value of the chance node is the average of the values of all its children, each queried from the value network? (This refers to the paragraph in the paper that begins "Our agent always solves to the end of the current betting round regardless...")
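A minimal Python sketch of the averaging described in the question above, written only to pin down what is being asked; it is not confirmed to be how ReBeL works. The function `chance_node_values` and the `value_net` callable are hypothetical: `value_net(deal)` is assumed to return one value per private hand for the child public belief state reached after `deal`. Each hand's chance-node value is the plain mean over deals that share no card with that hand (card removal).

```python
from typing import Callable, FrozenSet, List, Sequence

Cards = FrozenSet[str]


def chance_node_values(
    hands: Sequence[Cards],
    possible_deals: Sequence[Cards],
    value_net: Callable[[Cards], Sequence[float]],
) -> List[float]:
    """Estimate each private hand's value at a chance node.

    Every child (one per possible board deal) is evaluated with value_net,
    and a hand's value is the unweighted mean over the deals that do not
    share a card with that hand.
    """
    totals = [0.0] * len(hands)
    counts = [0] * len(hands)
    for deal in possible_deals:
        # Hypothetical value network: one value per hand at the child state.
        child_values = value_net(deal)
        for i, hand in enumerate(hands):
            if hand & deal:
                # Card removal: this deal is impossible if we hold one of its cards.
                continue
            totals[i] += child_values[i]
            counts[i] += 1
    # A hand that collides with every deal gets 0.0 (cannot happen in real poker).
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    hands = [frozenset({"As"}), frozenset({"Kd"})]
    deals = [frozenset({"As"}), frozenset({"Qh"}), frozenset({"2c"})]
    # Toy stand-in for a value network.
    toy_net = lambda deal: [1.0, 3.0] if "Qh" in deal else [2.0, 5.0]
    print(chance_node_values(hands, deals, toy_net))  # [1.5, 4.333333333333333]
```

Weighting the compatible deals uniformly is a simplification that ignores the opponent's cards; a faithful version would weight each deal by its probability under the current beliefs about both players' hands.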