Here is a short report about my Crazy Zero experiments. Crazy Zero is a generic implementation of AlphaZero. I have already used it for Go, shogi, gomoku, renju, othello, and chess.
The version that played on the Ataxx server (http://server.ataxx.org/) is less than one day old. The network architecture is 10 layers of 64 units. It has 4 inputs: empty, white, black, and (time-since-last-split / 100). The policy has one output for splits and 16 outputs for jumps. I have only one convolutional output channel for the value: I simply apply the logistic function to the sum of the values in this channel. This way the network is purely convolutional, and could work on any board size.
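To make the input and value-head description concrete, here is a minimal sketch in plain Python (no neural network library). The cell symbols, the function names, and the choice to fill the fourth plane with a single constant are my assumptions for illustration, not details of Crazy Zero itself.

```python
import math

# Hypothetical cell symbols, chosen only for this illustration.
EMPTY, WHITE, BLACK = ".", "o", "x"

def encode_inputs(board, time_since_last_split):
    """Build 4 input planes: empty, white, black, time-since-last-split / 100.

    `board` is a list of equal-length rows of cell symbols. Filling the
    fourth plane with one constant value is an assumption.
    """
    planes = [
        [[1.0 if cell == sym else 0.0 for cell in row] for row in board]
        for sym in (EMPTY, WHITE, BLACK)
    ]
    t = time_since_last_split / 100.0
    planes.append([[t for _ in row] for row in board])
    return planes

def value_from_plane(value_plane):
    """Apply the logistic function to the sum of the single value channel."""
    total = sum(sum(row) for row in value_plane)
    # Numerically stable logistic: avoids overflow for large |total|.
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    e = math.exp(total)
    return e / (1.0 + e)
```

Because the value is a sum over whatever squares exist, nothing in `value_from_plane` depends on the board dimensions: a 5x5 or a 7x7 plane goes through the same code.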
I trained on a single machine (i9 + RTX 2080Ti) by iteratively generating self-play games (400 playouts per move) and training a new network (each time from scratch) on the data generated by the previous network.
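The iteration described above can be sketched roughly as follows. The two callables `self_play` and `train_from_scratch` are hypothetical placeholders, and starting from `None` as the first player is an assumption; this only shows the shape of the loop, not the actual Crazy Zero code.

```python
def training_loop(games_per_generation, self_play, train_from_scratch,
                  initial_network=None, playouts=400):
    """Alternate self-play and from-scratch training.

    games_per_generation: number of self-play games to generate at each step.
    self_play(network, n_games, playouts): hypothetical, returns game data.
    train_from_scratch(games): hypothetical, returns a freshly trained network.
    """
    network = initial_network  # assumption: None stands for the first player
    networks = []
    for n_games in games_per_generation:
        # Data comes only from the most recent network.
        games = self_play(network, n_games, playouts)
        # The new network does not reuse the previous network's weights.
        network = train_from_scratch(games)
        networks.append(network)
    return networks
```

Training from scratch each time costs more compute than continuing from the previous weights, but each network then depends only on the data set it was trained on.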
So far, here is the number of games I have generated with each network:
1 2048
2 4096
3 4096
4 4096
5 6144
6 14336
7 14336
8 30720
9 243712
10 126976
And here are the results of matches between each network and its predecessor (768 games per match):
-----------------------------------------
    player  opponent  wins  draws  losses
-----------------------------------------
 1       2         1   738      0      30
 2       3         2   718      0      50
 3       4         3   715      0      53
 4       5         4   613      0     155
 5       6         5   543      1     224
 6       7         6   489      2     277
 7       8         7   458      0     310
 8       9         8   591      0     177
 9      10         9   569      0     199
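To read the match results above as rating gaps, one option is to convert each score into an Elo difference with the usual logistic Elo formula. This conversion is my addition, not something taken from the original measurements, and the function names are only illustrative.

```python
import math

def score_rate(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def elo_difference(wins, draws, losses):
    """Elo gap implied by a match score under the logistic Elo model.

    Undefined for a 100% or 0% score, which would need special handling.
    """
    s = score_rate(wins, draws, losses)
    return 400.0 * math.log10(s / (1.0 - s))
```

For example, `elo_difference(738, 0, 30)` for network 2 against network 1 comes out at roughly 556 Elo. The estimate gets noisy when the score is close to 100%, so the early rows should be read with caution.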
I am looking forward to challenging Ataxx Zero. Maybe this evening.
I am glad I discovered this game. It looks very rich, and at least as interesting as Othello.