You and your opponent each start with 10 bargaining chips and play 5 matches. In each match the higher bet wins; the player who wins the most matches is ultimately declared the winner of the game.
Play the game here: betting game
Here is an example of the game:
- Match 1: your bet is 3 and your opponent's bet is 2. You win.
- Match 2: your bet is 0 and your opponent's bet is 2. You lose.
- Match 3: your bet is 4 and your opponent's bet is 4. The match is tied.
- Match 4: your bet is 2 and your opponent's bet is 1. You win.
- Match 5: your bet is 1 and your opponent's bet is 1. The match is tied.
Since you won 2 matches and your opponent won only 1, you win the game. The Big Small Game environment is implemented in BSG.py.
The model is implemented in dqn.py.
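The match logic described above can be sketched as follows. This is a minimal illustration of the rules, not the actual code in BSG.py; the function name and signature are hypothetical.

```python
def play_game(bets_a, bets_b, chips=10, n_matches=5):
    """Score a full game of the Big Small Game.

    Each match, the higher bet wins; equal bets tie the match.
    Whoever wins the most matches wins the game.
    Returns (wins_a, wins_b).
    """
    assert len(bets_a) == len(bets_b) == n_matches
    assert sum(bets_a) <= chips and sum(bets_b) <= chips
    wins_a = wins_b = 0
    for a, b in zip(bets_a, bets_b):
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
        # equal bets: the match is tied, neither player scores
    return wins_a, wins_b
```

Running it on the worked example above ([3, 0, 4, 2, 1] versus [2, 2, 4, 1, 1]) yields a 2-to-1 win for the first player, matching the narration.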
The model was trained with reinforcement learning, starting with no knowledge of the game's rules. A deep Q-network implemented in Keras was used. During each match, the model's output for any bet larger than the remaining chips was assigned a value of -1. The winner of a game received a reward of 1, and the loser a reward of -1. Because this reward is sparse and only given at the end of the game, temporal-difference (TD) learning was used to propagate it back to the earlier decisions that led to the result. Epsilon-greedy exploration with a decreasing epsilon was used to encourage random decisions early in training.
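The epsilon-greedy policy with decay can be sketched as below. This is a hypothetical illustration, not the code in dqn.py: the decay schedule (exponential, with assumed `start`, `end`, and `rate` parameters) and the masking of over-budget bets are assumptions; the actual training instead assigns those bets a value of -1.

```python
import random

def decay_epsilon(episode, start=1.0, end=0.05, rate=0.999):
    """Exponentially decaying exploration rate: mostly random bets
    early in training, mostly greedy bets later. Parameters assumed."""
    return max(end, start * rate ** episode)

def epsilon_greedy_bet(q_values, remaining_chips, epsilon):
    """Choose a bet from the network's Q-values.

    Only bets up to the remaining chips are legal; these are the bets
    the training procedure does not penalize with -1, so only they are
    considered here.
    """
    legal = range(remaining_chips + 1)            # bets 0..remaining_chips
    if random.random() < epsilon:
        return random.choice(list(legal))         # explore: random legal bet
    return max(legal, key=lambda a: q_values[a])  # exploit: best legal bet
```

With epsilon at 0 the choice is purely greedy over the legal bets, so a high Q-value on an unaffordable bet is never selected.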
Model input: A list of five integers: [my remaining chips, opponent's remaining chips, my wins, opponent's wins, number of finished matches]
Model output: A list of n_action (e.g. 11) real numbers, the estimated Q-values of betting the corresponding number of chips in the next match.
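The input/output interface can be demonstrated with a stand-in network. The random linear layer below only illustrates the shapes (5 state numbers in, 11 Q-values out); it is not the trained Keras model in dqn.py.

```python
import numpy as np

N_STATE, N_ACTIONS = 5, 11   # 5 state numbers in, bets 0..10 out

rng = np.random.default_rng(0)
# Random weights standing in for the trained network; shapes only.
W = rng.normal(size=(N_STATE, N_ACTIONS))

def q_values(state):
    """state = [my chips, opp chips, my wins, opp wins, matches done]."""
    return np.asarray(state, dtype=float) @ W   # shape (N_ACTIONS,)

qs = q_values([10, 10, 0, 0, 0])
best_bet = int(np.argmax(qs))   # the output index doubles as the bet size
```

Since the action index itself is the bet size, reading off the greedy action is just an argmax over the output vector.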
The training is composed of two processes:
- trainer_self.py: the model learned the rules and developed winning strategies through self-play, repeated 50000 times.
- trainer_rand.py: the model played against a random-bet agent, randseq.py, which randomly splits the bargaining chips into 5 portions. The game was repeated 50000 times.
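A random-bet opponent in the spirit of randseq.py can be sketched as below. The exact splitting rule used by randseq.py is not specified, so this uniform cut-point scheme is an assumption.

```python
import random

def random_bets(chips=10, n_matches=5):
    """Split the chip budget into n_matches random non-negative bets.

    Draws n_matches - 1 random cut points in [0, chips] and takes the
    gaps between them, so the bets always sum to the full budget.
    """
    cuts = sorted(random.randint(0, chips) for _ in range(n_matches - 1))
    bounds = [0] + cuts + [chips]
    return [bounds[i + 1] - bounds[i] for i in range(n_matches)]
```

Any scheme works here as long as the portions are non-negative and respect the chip budget; this one always spends all 10 chips.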
An AI was trained to play the 5-match game with 10 bargaining chips. It eventually outperforms the random-bet agent, achieving a win rate greater than 50%. The plot below shows the win rate's progression over the course of training. The fluctuations in the plot can be attributed to the relatively small batch_size = 5.
