Evaluation of different RL decision methods against the K-armed bandits test

sonic597/rl-testing-ground

RL-testing-ground

Running the code

Requires Matplotlib and NumPy. The number of episodes, the number of arms (k), and the number of iterations can be changed on lines 4, 5, and 6 of the file.
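The script itself is not reproduced here. As a hedged sketch of what those three settings control, the snippet below sets up a k-armed bandit under the common Gaussian testbed assumption: each arm's true value is drawn from N(0, 1) and each pull returns a reward drawn from N(true value, 1). The constant names `NUM_EPISODES`, `K` and `NUM_ITERATIONS`, the `Bandit` class, and the reading of "episodes" as independent bandit problems and "iterations" as pulls per episode are illustrative assumptions, not taken from the repository.

```python
import numpy as np

# Hypothetical names; the real script's variables on lines 4-6 may differ.
NUM_EPISODES = 200     # assumed: independent bandit problems to average over
K = 10                 # number of arms
NUM_ITERATIONS = 1000  # assumed: pulls (time steps) per episode


class Bandit:
    """Hypothetical k-armed Gaussian bandit.

    Each arm's true value is drawn from N(0, 1); pulling an arm
    returns a noisy reward drawn from N(true_value, 1).
    """

    def __init__(self, k, rng):
        self.rng = rng
        self.true_values = rng.normal(0.0, 1.0, size=k)

    def pull(self, arm):
        # Reward is the arm's true value plus unit-variance Gaussian noise.
        return float(self.rng.normal(self.true_values[arm], 1.0))

    def optimal_arm(self):
        # Index of the arm with the highest true value (unknown to the agent).
        return int(np.argmax(self.true_values))


rng = np.random.default_rng(0)
bandit = Bandit(K, rng)
```

A fresh `Bandit` would be created for each of the `NUM_EPISODES` episodes and pulled `NUM_ITERATIONS` times, so results can be averaged across independent problems rather than depending on one lucky draw of arm values.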

Current decision methods

  • Epsilon-greedy. A threshold ε is given: with probability ε the method explores (takes a random action); otherwise it exploits (takes the action with the highest estimated value from the data collected so far).
  • Epsilon decay. Like epsilon-greedy, but ε decays exponentially with experience, so the method explores less as its estimates improve.
  • Upper confidence bound (UCB). Chooses the action with the highest estimated value plus an "optimistic" bonus based on how much experience it has with each arm: the less an arm has been tried (the more uncertain its estimate), the larger the bonus. The bonus grows logarithmically with the total number of steps taken.
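The three methods above can be sketched as interchangeable action-selection functions that share one interface: current value estimates `q`, per-arm pull counts `counts`, the step number `t`, and a NumPy random generator. This is a minimal sketch, not the repository's code. The function names, the default ε = 0.1, the decay schedule ε_t = ε0 · exp(-decay · t), the UCB bonus c · sqrt(ln t / N(a)) with c = 2, and the sample-average value updates are all assumptions chosen to match the descriptions.

```python
import numpy as np


def epsilon_greedy(q, counts, t, rng, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))  # explore: uniformly random arm
    return int(np.argmax(q))              # exploit: highest estimated value


def epsilon_decay(q, counts, t, rng, epsilon0=1.0, decay=0.01):
    """Epsilon-greedy with an assumed exponential schedule for epsilon."""
    epsilon = epsilon0 * np.exp(-decay * t)
    return epsilon_greedy(q, counts, t, rng, epsilon)


def ucb(q, counts, t, rng, c=2.0):
    """Upper confidence bound: estimate plus c * sqrt(ln(t) / N(a)).

    The bonus is larger for arms with few pulls and grows with ln(t).
    """
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])  # pull every arm once to avoid dividing by 0
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(q + bonus))


def run(select, k=10, steps=1000, seed=0):
    """Run one episode on a Gaussian k-armed bandit; return per-step rewards."""
    rng = np.random.default_rng(seed)
    true_values = rng.normal(0.0, 1.0, size=k)  # hidden arm values
    q = np.zeros(k)       # sample-average value estimates
    counts = np.zeros(k)  # number of pulls per arm
    rewards = np.empty(steps)
    for t in range(1, steps + 1):
        arm = select(q, counts, t, rng)
        reward = rng.normal(true_values[arm], 1.0)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean update
        rewards[t - 1] = reward
    return rewards


if __name__ == "__main__":
    methods = {"epsilon-greedy": epsilon_greedy,
               "epsilon decay": epsilon_decay,
               "UCB": ucb}
    for name, method in methods.items():
        # Average reward over 20 independent episodes (seeds 0..19).
        avg = np.mean([run(method, seed=s).mean() for s in range(20)])
        print(f"{name}: {avg:.3f}")
```

Because every strategy has the same signature, swapping methods only means passing a different function to `run`. Per-step rewards averaged across episodes could then be plotted with Matplotlib to compare how quickly each method settles on good arms.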
