Investigate how Michi(-C) is stronger with equal number of playouts #2 #95

@gonmf

Continuation of #5

10k playouts/turn vs Michi-C single threaded: 29.1% (55 games)

Long 10k playouts CLOP self-play run:


Removed MCTS leaf expansion delay for both michi-c and matilda.

Perhaps this is related to using very different policies for playouts and for heuristic MC RAVE?

winrate 39.7% (68) (10k playouts/turn, alternate colors, 7.5 komi)
winrate 45.3% (53) (same as above but with komi 5.5)


both using MC only instead of RAVE/criticality etc
winrate 60% (92)

with matilda using the same RAVE urgency function:
with original matilda RAVE equiv parameter, with heuristic MC RAVE: 46.5% (71)
with michi equiv parameter, with heuristic MC RAVE: 23.9% (88)
with original matilda RAVE equiv parameter, without heuristic MC RAVE: 49.5% (85)
with michi equiv parameter, without heuristic MC RAVE: 33.5% (182)
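
For reference, a minimal sketch of how a RAVE equivalence parameter typically enters the urgency of a move. It uses the common blending form found in Pachi/Michi-style engines; whether matilda's urgency function matched this exactly is an assumption, and the function and argument names are hypothetical. The 0.5 returned for nodes without any statistics is also an assumption.

```python
def rave_urgency(wins, visits, amaf_wins, amaf_visits, rave_equiv):
    """Blend the plain MC winrate with the AMAF (RAVE) winrate.

    rave_equiv controls how quickly the AMAF estimate loses weight as
    real visits accumulate: with plentiful AMAF data, both estimates get
    about equal weight when visits == rave_equiv.
    """
    if visits == 0 and amaf_visits == 0:
        return 0.5  # assumption: neutral value when nothing is known
    if amaf_visits == 0:
        return wins / visits
    amaf_rate = amaf_wins / amaf_visits
    if visits == 0:
        return amaf_rate
    mc_rate = wins / visits
    # Weight of the AMAF estimate; shrinks as visits grow relative to rave_equiv.
    beta = amaf_visits / (amaf_visits + visits + visits * amaf_visits / rave_equiv)
    return beta * amaf_rate + (1.0 - beta) * mc_rate


if __name__ == "__main__":
    # The two equiv values quoted later in this issue (mtld 2003, michi 3500).
    for equiv in (2003, 3500):
        print(equiv, rave_urgency(40, 100, 300, 500, equiv))
```

A larger equiv parameter keeps the AMAF estimate influential for longer, so swapping only this value between engines changes the search behavior even when the urgency formula is the same.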


self play with black with Michi RAVE quality, single threaded, no expansion delay: 45.3% (75)

RAVE is working worse in matilda than in michi?


AMAF visits in matilda are stored leaf to root (a state is only influenced by transitions that appeared after it); michi stores visits immediately. Furthermore, matilda always replaced later visits and michi didn't.
with michi style of AMAF info and replacing:
without replacing: all terrible
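
To make the two AMAF bookkeeping styles concrete, here is a small sketch of one possible reading of the difference. It is an illustration only, not code from either program: the function names, the `(point, color)` move representation and the exact replacement rule are assumptions.

```python
def amaf_maps_immediate(moves):
    """Michi-like reading: one shared map filled in playing order.

    moves is a list of (point, color) in the order played (tree descent,
    then playout). The first play at a point is kept, and every node on
    the path sees the same map, including moves played above it.
    """
    amaf = {}
    for point, color in moves:
        amaf.setdefault(point, color)  # keep first play, never replace
    return [dict(amaf) for _ in moves]


def amaf_maps_leaf_to_root(moves, replace=True):
    """Matilda-like reading: walk the sequence backwards.

    The node at depth d only sees moves[d:]. With replace=True an earlier
    play at a point overwrites a later one as the walk moves toward the root.
    """
    amaf = {}
    maps = [None] * len(moves)
    for depth in range(len(moves) - 1, -1, -1):
        point, color = moves[depth]
        if replace or point not in amaf:
            amaf[point] = color
        maps[depth] = dict(amaf)
    return maps


if __name__ == "__main__":
    # Hypothetical sequence in which point "A" is played twice (e.g. after a capture).
    seq = [("A", "b"), ("B", "w"), ("A", "w"), ("C", "b")]
    print(amaf_maps_immediate(seq)[2])
    print(amaf_maps_leaf_to_root(seq)[2])
```

Under this reading the deeper nodes receive different AMAF information in the two schemes, which would be one way for the same RAVE formula to behave differently in the two programs.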


self play without criticality info being saved: slightly worse


both using MC only instead of RAVE/criticality again: 54% (100)



9x9

Base single threaded without NN vs Michi, both 10k playouts, no expansion delay: 44% (722)
Baseline without RAVE: 54.7% (725)
Baseline with reduced priors+: 48.5% (526) (line2, line3, empty, line1x, line2x, line3x, corner)
Baseline on server to ensure code correctness: 42.5% (865) [1]

[1] with lineNx priors removed: 45.9% (1598) [2]

[2] without line2, line3, empty priors: 45.5% (1812)
[2] without bad play prior: 45.6% (2114)
[2] without line2, line3, empty: 46.9% (3558) [3]

[3] without near_last: 43.2% (3044)
baseline again: 44.8% (4377)
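
Since many of the differences above are only a few percentage points, it helps to look at the uncertainty of each winrate. The sketch below computes a Wilson score interval from a winrate and a game count as reported in this issue. It assumes independent games with no draws, and the function name is hypothetical.

```python
import math


def wilson_interval(winrate, games, z=1.96):
    """Wilson score interval for a winrate over `games` independent games.

    z=1.96 corresponds to roughly 95% confidence.
    """
    n = games
    p = winrate
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # (winrate, games) pairs taken from the 9x9 results above.
    for rate, games in [(0.44, 722), (0.547, 725), (0.469, 3558), (0.448, 4377)]:
        lo, hi = wilson_interval(rate, games)
        print(f"{rate:.3f} over {games}: [{lo:.3f}, {hi:.3f}]")
```

Two configurations whose intervals overlap heavily cannot be ranked with confidence at those game counts, which matters most for the short runs of 50 to 100 games.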

19x19 tests

baseline with conditions as [1]: ~9% (~280)
baseline with both without RAVE: ~3% (~420)
baseline without lineNx priors: ~10.3% (~340) [4]
[4] with nakade in playouts but only after captures: ~9% (~600)

13x13 to be faster

baseline with conditions as [1]: ~19% (~3025) [5]
[5] without lineNx priors:

All cleared with changes to test environment; improved Michi code, bug fixes, all matilda things reverted back, 9x9, no expansion delay, 10k

baseline: ~43% (1150)
with michi using matilda RNG: ~46% (2100) [6]
[6] repeated with a few simplifications, just to make sure: ~40% (150)
[6] with matilda pattern matching: ~59% (1275) [7]
[7] with matilda pattern matching with inverted color to play in pattern matching: ~62% (700)
back with baseline, michi pattern matching, but with mtld using mogo patterns: ~39% (1600)

[6] with 20k playouts: 46.3% (9900)
[6] with 20k playouts (repeated to make sure they were using 20k): ~47% (3300)

13x13

baseline: 20k playouts, no expansion delay, using mtld rng: ~27.5% (2550) [7]
[7] without line1x, line2x, line3x: 32% (2002) [8]

lineNx priors scrapped from master

[8] without line2, line3, empty: ~30% (2400)

both without RAVE (mtld equiv param 2003, michi 3500): ~60% (~3500)
[8] without NN: ~32% (~600)

This seems to be going nowhere. I advise rewriting the MCTS priors and playouts to use the same base routines, as Pachi/Michi/Fuego/MoGo do. Perhaps the problem is that matilda uses very different strategies in these two places.
