Description
Continuation of #5
10k playouts/turn vs Michi-C single threaded: 29.1% (55 games)
Long 10k playouts CLOP self-play run:
Removed MCTS leaf expansion delay for both michi-c and matilda.
Perhaps related to using very different policies for playouts and for the heuristic MC RAVE?
winrate 39.7% (68) (10k playouts/turn, alternate colors, 7.5 komi)
winrate 45.3% (53) (same as above but with komi 5.5)
both using MC only instead of RAVE/criticality etc
winrate 60% (92)
with matilda with same RAVE urgency function
with original matilda RAVE equiv parameter, with heuristic mc rave: 46.5% (71)
with michi equiv parameter, with heuristic mc rave: 23.9% (88)
with original matilda RAVE equiv parameter, without heuristic mc rave: 49.5% (85)
with michi equiv parameter, without heuristic mc rave: 33.5% (182)
self play with black with Michi RAVE quality, single threaded, no expansion delay: 45.3% (75)
RAVE is working worse in matilda than in michi?
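For reference, a minimal sketch of how a RAVE equiv parameter typically enters the urgency of a move. This is the commonly used blending form and is only assumed to resemble what either program does; the function and argument names are hypothetical, and priors are left out. The values 2003 and 3500 are the matilda and michi equiv parameters quoted further down in this issue.

```python
def rave_beta(visits, amaf_visits, rave_equiv):
    """Weight given to the AMAF estimate; shrinks as real visits accumulate."""
    return amaf_visits / (amaf_visits + visits + visits * amaf_visits / rave_equiv)


def rave_urgency(wins, visits, amaf_wins, amaf_visits, rave_equiv):
    """Blend the MC win rate with the AMAF win rate (assumes visits > 0)."""
    mc_rate = wins / visits
    if amaf_visits == 0:
        # No AMAF information yet: fall back to the plain MC estimate.
        return mc_rate
    amaf_rate = amaf_wins / amaf_visits
    beta = rave_beta(visits, amaf_visits, rave_equiv)
    return beta * amaf_rate + (1.0 - beta) * mc_rate


# Same statistics, two equiv parameters: a larger value trusts AMAF for longer.
for equiv in (2003, 3500):
    print(equiv, rave_urgency(wins=40, visits=100, amaf_wins=300, amaf_visits=500, rave_equiv=equiv))
```

With this form, swapping one program's equiv parameter into the other changes how quickly AMAF statistics are discounted, so the two values are only comparable if the AMAF counts themselves are collected the same way.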
AMAF visits in matilda are stored leaf to root (a state is only influenced by transitions that appeared after it); michi stores visits immediately. Furthermore, matilda always replaced later visits and michi didn't.
with michi style of AMAF info and replacing:
without replacing: all terrible
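The difference between the two AMAF bookkeeping schemes can be sketched as below. This is one reading of the note above, not code from either program: `amaf_forward` stands in for the michi style (one map filled as moves are played, first owner of a point kept) and `amaf_backward` for the matilda style (filled leaf to root from the node's depth onward, with replacement). All names and the toy move sequence are hypothetical.

```python
def amaf_forward(moves, replace=False):
    """One map for the whole simulation, filled in playing order."""
    owner = {}
    for color, point in moves:
        if replace or point not in owner:
            owner[point] = color
    return owner


def amaf_backward(moves, depth, replace=True):
    """Map seen by the node at `depth`: only moves from `depth` onward,
    walked from the end of the simulation back towards that node."""
    owner = {}
    for color, point in reversed(moves[depth:]):
        if replace or point not in owner:
            owner[point] = color
    return owner


# Toy simulation in which a1 is played twice (e.g. after a capture).
moves = [("B", "a1"), ("W", "b2"), ("B", "c3"), ("W", "a1")]

print(amaf_forward(moves))                     # every node shares this map
print(amaf_backward(moves, 1))                 # node after B a1 ignores that move
print(amaf_backward(moves, 0, replace=False))  # backward walk without replacing
```

In this toy model the node at depth 1 credits a1 to White under the backward scheme and to Black under the forward one, and turning replacement off in the backward walk flips which of two plays at the same point is kept.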
self play without criticality info being saved: slightly worse
both using MC only instead of RAVE/criticality again: 54% (100)
9x9
Base single threaded without NN vs Michi, both 10k playouts, no expansion delay: 44% (722)
Baseline without RAVE: 54.7% (725)
Baseline with reduced priors+: 48.5% (526) (line2, line3, empty, line1x, line2x, line3x, corner)
Baseline on server to ensure code correctness: 42.5% (865) [1]
[1] with lineNx priors removed: 45.9% (1598) [2]
[2] without line2, line3, empty priors: 45.5% (1812)
[2] without bad play prior: 45.6% (2114)
[2] without line2, line3, empty: 46.9% (3558) [3]
[3] without near_last: 43.2% (3044)
baseline again: 44.8% (4377)
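To judge whether differences of a few points between these runs mean anything, a confidence interval on each win rate helps. The sketch below uses the Wilson score interval at roughly 95% (z = 1.96), assuming the number in parentheses is the game count, that games are independent, and that there are no draws; the function name is hypothetical.

```python
import math


def wilson_interval(win_rate, games, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p, n = win_rate, games
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Win rates and game counts taken from this issue.
for label, rate, games in [
    ("10k vs Michi-C", 0.291, 55),
    ("9x9 base", 0.44, 722),
    ("9x9 server baseline", 0.425, 865),
    ("9x9 baseline again", 0.448, 4377),
]:
    lo, hi = wilson_interval(rate, games)
    print(f"{label}: {lo:.3f} .. {hi:.3f}")
```

Under these assumptions the 55-game result spans roughly 19% to 42%, while the intervals for the 722-game and 865-game baselines overlap, so those two are not clearly different.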
19x19 tests
baseline with conditions as [1]: ~9% (~280)
baseline with both without RAVE: ~3% (~420)
baseline without lineNx priors: ~10.3% (~340) [4]
[4] with nakade in playouts but only after captures: ~9% (~600)
13x13 to be faster
baseline with conditions as [1]: ~19% (~3025) [5]
[5] without lineNx priors:
All cleared with changes to test environment; improved Michi code, bug fixes, all matilda things reverted back, 9x9, no expansion delay, 10k
baseline: ~43% (1150)
with michi using matilda RNG: ~46% (2100) [6]
[6] repeated with a few simplifications, just to make sure: ~40% (150)
[6] with matilda pattern matching: ~59% (1275) [7]
[7] with matilda pattern matching with inverted color to play in pattern matching: ~62% (700)
back with baseline, michi pattern matching, but with mtld using mogo patterns: ~39% (1600)
[6] with 20k playouts: 46.3% (9900)
[6] with 20k playouts (repeated to make sure they were using 20k): ~47% (3300)
13x13
baseline: 20k playouts, no expansion delay, using mtld rng: ~27.5% (2550) [7]
[7] without line1x, line2x, line3x: 32% (2002) [8]
lineNx priors scrapped from master
[8] without line2, line3, empty: ~30% (2400)
both without RAVE (mtld equiv param 2003, michi 3500): ~60% (~3500)
[8] without NN: ~32% (~600)
This seems to be going nowhere. I advise rewriting the MCTS priors and playouts to use the same base routines, as Pachi/Michi/Fuego/MoGo do. Perhaps the problem is that matilda uses very different strategies in these two places.