Tic-Tac-Toe Bot

Program Usage

Train Agent:

python3 tictactoe.py train [Learning Rate] [Discount Rate] [Epsilon Decay] [Epoch Amount] [Filename] [Mutability Enabled]

where

Learning Rate: How strongly each table value is adjusted toward its new estimate on each application of the Bellman equation.

Discount Rate: The amount of "influence" a future state's Q value has on a previous state.

Epsilon Decay: The rate at which the chance of selecting a random move, rather than the move with the highest Q value, decreases.

Epoch Amount: The number of game sessions the agent will play through. Note that since three different training sessions are executed, the true number of epochs is the given epoch amount multiplied by 3.

Filename: The destination file to which the trained table will be saved.

Mutability Enabled: Whether or not mutability of the look-up table is allowed (True or False).
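
As a rough illustration of how the learning rate, discount rate, and epsilon decay interact, here is a minimal sketch of a tabular Q-learning update with epsilon-greedy move selection. It is not taken from tictactoe.py: the function names, the dict of Q values keyed by move, and the linear decay schedule are all assumptions made for illustration.

```python
import random

def q_update(q_values, action, reward, next_max_q, learning_rate, discount_rate):
    """Return the new Q value for `action` (standard Q-learning form of the Bellman equation)."""
    old = q_values[action]
    # The discount rate scales how much the best future Q value counts.
    target = reward + discount_rate * next_max_q
    # The learning rate scales how far the old value moves toward the target.
    return old + learning_rate * (target - old)

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: a random move with probability epsilon, else the best-known move."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)

def decay(epsilon, epsilon_decay):
    """Assumed linear schedule: subtract a fixed amount per epoch, never below 0."""
    return max(0.0, epsilon - epsilon_decay)
```

With epsilon at 0, `choose_action` always returns the move with the highest Q value; with epsilon at 1, every move is chosen at random.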

Play Against Agent:

python3 tictactoe.py play [Filename] [Player Number]

where

Filename: The source file of the trained agent's look-up table data.

Player Number: The player number that the user will play as (1 or 2).

Unbeatable Agent:

A provided file named "unbeatable" holds a look-up table trained according to the following parameters:

    Learning Rate = 0.1 
    Discount Rate = 0.9 
    Epsilon Decay = 0.00001 
    Epoch Amount = 200000 
    Filename = unbeatable 
    Mutability Enabled = true

To play against the unbeatable agent, use the following command:

python3 tictactoe.py play unbeatable 1

Mutability:

During training, the look-up table must be updated with a new Q value every time the Bellman equation is applied. For a game in which the agent makes n moves, the Bellman equation is applied n times, so the table is updated n times as well. In the immutable paradigm, the whole table must be copied each time a value is updated. For a 3x3 board with 3 possible states per cell (X, O, or Empty), the agent's look-up table contains 9 * 3^9 = 177,147 values, so maintaining immutability severely impacts the efficiency of training. Because of this, a mutable version of the Bellman update was added. Training takes ~0.21 ms per epoch with the mutable version, versus ~362 ms per epoch with the immutable version.
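
The cost difference can be sketched as follows. This is an illustrative sketch, not the code in tictactoe.py: it assumes the table is a flat Python list with one entry per (board state, cell) pair, and the function names are hypothetical.

```python
# 9 possible moves for each of the 3^9 board encodings = 177,147 values.
TABLE_SIZE = 9 * 3 ** 9

def update_immutable(table, index, new_q):
    """Return a new table with one value changed; every entry is copied (O(n))."""
    return table[:index] + [new_q] + table[index + 1:]

def update_mutable(table, index, new_q):
    """Change one value in place (O(1)) and return the same table object."""
    table[index] = new_q
    return table

table = [0.0] * TABLE_SIZE
copied = update_immutable(table, 5, 0.5)  # the original list is left untouched
same = update_mutable(table, 7, 0.25)     # the original list itself is modified
```

The immutable update touches all 177,147 entries to change one of them, while the mutable update touches only the entry being changed.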
