python3 tictactoe.py train [Learning Rate] [Discount Rate] [Epsilon Decay] [Epoch Amount] [Filename] [Mutability Enabled]
where
Learning Rate: The rate at which each table value is updated through each iteration of the Bellman equation.
Discount Rate: The amount of "influence" a future state's Q value has on a previous state.
Epsilon Decay: The rate at which the chance of selecting a random move, rather than the move with the best Q value, decreases.
Epoch Amount: The number of game sessions the agent will play through.
Note that because three separate training sessions are executed,
the true number of epochs is three times the given epoch amount.
Filename: The destination file in which the trained table will be saved.
Mutability Enabled: Whether or not mutability of the look-up table is allowed (True or False).
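As a rough illustration of how the learning rate, discount rate, and epsilon decay interact, here is a minimal sketch of a tabular Q-learning step. The function names (q_update, choose_action, decay_epsilon), the linear decay schedule, and the list-based Q values are assumptions made for this example and may not match what tictactoe.py actually does.

```python
import random

def q_update(q_old, reward, max_next_q, learning_rate, discount_rate):
    """Apply one Bellman (Q-learning) update to a single table entry."""
    # The target mixes the immediate reward with the discounted best
    # Q value reachable from the next state.
    target = reward + discount_rate * max_next_q
    # The learning rate controls how far the old value moves toward the target.
    return q_old + learning_rate * (target - q_old)

def choose_action(q_values, legal_moves, epsilon, rng=random):
    """Epsilon-greedy choice: random legal move with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    # Otherwise pick the legal move with the highest Q value.
    return max(legal_moves, key=lambda move: q_values[move])

def decay_epsilon(epsilon, epsilon_decay, floor=0.0):
    """Reduce epsilon after an epoch (linear schedule is an assumption)."""
    return max(floor, epsilon - epsilon_decay)

if __name__ == "__main__":
    # Example with the parameters listed for the "unbeatable" table.
    print(q_update(0.0, 1.0, 0.0, learning_rate=0.1, discount_rate=0.9))
    print(decay_epsilon(1.0, 0.00001))
```

With a small learning rate such as 0.1, each update moves a table entry only a tenth of the way toward its target, which is one reason a large epoch amount is needed.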
python3 tictactoe.py play [Filename] [Player Number]
where
Filename: The source file of the trained agent's look-up table data.
Player Number: The player number that the user will play as (1 or 2).
A provided file named "unbeatable" holds a look-up table trained according to the following parameters:
Learning Rate = 0.1
Discount Rate = 0.9
Epsilon Decay = 0.00001
Epoch Amount = 200000
Filename = unbeatable
Mutability Enabled = True
To play against the unbeatable agent, use the following command:
python3 tictactoe.py play unbeatable 1
During training, the look-up table is updated with a new Q value every time the Bellman equation is applied. For a game in which the agent makes n moves, the Bellman equation is applied n times, so the table is updated n times as well. In the immutable paradigm, a new copy of the table must be made each time a value is updated. However, for a 3x3 board with 3 possible states per cell (X, O, or Empty), the agent's look-up table contains 9 * 3^9 = 177,147 values, so maintaining immutability severely impacts the efficiency of training. Because of this, a mutable version of the Bellman update was added. Training takes ~0.21 ms per epoch with the mutable version, compared with ~362 ms per epoch with the immutable version, which makes the immutable version roughly 1,700 times slower.
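The cost difference between the two paradigms can be sketched as follows. The list-of-lists table layout and the function names update_mutable and update_immutable are hypothetical, chosen only for illustration; the project's own data structure may differ.

```python
import copy

NUM_CELLS = 9        # 3x3 board
STATES_PER_CELL = 3  # X, O, or Empty

# Table size from the text: one Q value per (board, cell) pair.
num_boards = STATES_PER_CELL ** NUM_CELLS  # 19683
num_q_values = NUM_CELLS * num_boards      # 177147

def update_mutable(table, state, action, new_q):
    """Overwrite a single entry in place: constant work per update."""
    table[state][action] = new_q
    return table

def update_immutable(table, state, action, new_q):
    """Leave the old table untouched by copying every entry first."""
    # The deep copy touches all Q values, so each update costs time
    # proportional to the full table size.
    new_table = copy.deepcopy(table)
    new_table[state][action] = new_q
    return new_table

if __name__ == "__main__":
    # Hypothetical layout: one row of 9 Q values per board index.
    table = [[0.0] * NUM_CELLS for _ in range(num_boards)]
    updated = update_immutable(table, state=0, action=4, new_q=0.5)
    print(num_q_values, table[0][4], updated[0][4])
```

The immutable update copies all 177,147 values in order to change one of them, which is consistent with the large per-epoch gap reported above.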