Inspired by AlphaGo and AlphaZero, I've started exploring NN methods for game AIs, and
neural nets in general. For starters, I found that there are no good forums for general
discussion, hence this Machine Learning section and this post.
I'm kind of a bottom-up guy; I like to understand the stuff I'm using "all the way down", so instead
of jumping immediately to some standard, well-known AI framework, I've looked at several and
built my own set of Java classes containing just the pieces I want to use. I'm using Hex played
on a 7x7 grid as my target - this is about the smallest and simplest game I can imagine that still
has interesting behavior, and I already have both an alpha-beta and an MCTS-based robot player.
Either
replace the static evaluator in the alpha-beta robot with a neural network. A NN can approximate any
function, right? The existing static evaluator builds an elaborate structure based on chains of stones
and the possible connections between them. It ought to be possible to train a NN to produce
the same outputs, so in principle you could plug the network in instead of the static evaluator. There's
no benefit to such a simple replacement, but if you train the network to emulate the single value output
from the top level of the search, then the NN can replace the whole search. Then, if you have a net trained to
replace an N-ply search, you can use it to train a replacement for a 2*N-ply search. Rinse, repeat.
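For concreteness, here's a minimal sketch of that first alternative - plugging a net in as the static evaluator. "HexBoard" and "Network" are hypothetical stand-ins for whatever board and network classes you already have, not anything from a real library:

// hypothetical stand-ins for the existing board and network classes
interface HexBoard { double[] encode(int player); }    // board position -> input vector
interface Network  { double[] forward(double[] inputs); }

interface Evaluator {
    double evaluate(HexBoard board, int player);        // score from 'player's point of view
}

class NetEvaluator implements Evaluator {
    private final Network net;   // a net trained to mimic the old evaluator, or an N-ply search result
    NetEvaluator(Network net) { this.net = net; }

    public double evaluate(HexBoard board, int player) {
        double[] out = net.forward(board.encode(player));  // forward pass only, no search
        return out[0];                                      // single value output
    }
}

The alpha-beta code never needs to know whether it's calling the hand-written evaluator or the net, which is what makes the "train against an N-ply search, then search 2*N plies on top of it" bootstrapping possible.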
Or
guide the MCTS tree search by using a NN to establish prior probabilities for the candidate moves. After
you complete a MCTS search, you have a set of win probabilities for all the possible moves. You can use
these as prior probabilities to bias the UCT values for the same search; and during the random playout
phase, you can use these prior probabilities to bias the selection of moves (instead of choosing purely at random).
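Here's a rough sketch of that second alternative, assuming flat arrays of per-move statistics at a node. The selection formula is the AlphaGo-style variant of UCT, where the prior scales the exploration term, and the playout sampler just picks moves in proportion to the prior instead of uniformly. The names and the exploration constant c are mine, not from any particular library:

class PriorGuidedMcts {
    // select the child with the best (win rate + prior-weighted exploration) score
    static int selectChild(double[] prior, int[] visits, double[] totalValue, double c) {
        int parentVisits = 0;
        for (int v : visits) parentVisits += v;
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < prior.length; i++) {
            double q = visits[i] == 0 ? 0 : totalValue[i] / visits[i];           // mean result so far
            double u = c * prior[i] * Math.sqrt(parentVisits) / (1 + visits[i]); // exploration, biased by the prior
            if (q + u > bestScore) { bestScore = q + u; best = i; }
        }
        return best;
    }

    // during playouts, sample a move in proportion to the prior instead of uniformly at random
    static int samplePlayoutMove(double[] prior, java.util.Random rng) {
        double sum = 0;
        for (double p : prior) sum += p;
        double r = rng.nextDouble() * sum;
        for (int i = 0; i < prior.length; i++) {
            r -= prior[i];
            if (r <= 0) return i;
        }
        return prior.length - 1;
    }
}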
Some preliminary notes based on experiments aimed in these two directions.
1) The forward phase of evaluating a network is completely straightforward, but the proper shape of
the network, the representation of the inputs, the representation of the outputs, etc. is a complete black art.
Theoretically any input representation that contains the board position could work, and any number
of hidden layers with any size and connectivity might work, but very little can be said in advance about what
will work.
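To make that concrete, here's one possible (and entirely arbitrary) encoding for 7x7 Hex: two planes of 49 cells, one for my stones and one for the opponent's, feeding a single tanh hidden layer and a sigmoid output. Nothing about this shape is known to be right; it's just a sketch of what a forward pass looks like:

class TinyHexNet {
    double[][] hiddenWeights;   // [nHidden][98] : 49 cells x 2 planes
    double[]   hiddenBias;      // [nHidden]
    double[]   outWeights;      // [nHidden]
    double     outBias;

    // myStones / theirStones: 49 entries, each 0 or 1
    double forward(int[] myStones, int[] theirStones) {
        double out = outBias;
        for (int h = 0; h < hiddenWeights.length; h++) {
            double sum = hiddenBias[h];
            for (int i = 0; i < 49; i++) {
                sum += hiddenWeights[h][i]      * myStones[i];
                sum += hiddenWeights[h][49 + i] * theirStones[i];
            }
            out += outWeights[h] * Math.tanh(sum);     // hidden activation
        }
        return 1.0 / (1.0 + Math.exp(-out));           // squash to a 0..1 win estimate
    }
}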
2) The backward/training phase is even more problematic. Aside from the basic idea of changing
the network weights to reduce the error implied by the difference between the desired and observed
results, there is no exact form or strict rule for the algorithm that does so. It's very easy for network
training to start to run backwards and produce worse and worse results. It's very easy for more or bigger
layers to take longer to train and produce worse results. It's very easy for training to plateau at some point,
and at that point there is no information about why.
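To pin down what "changing the weights to reduce the error" means in the simplest case, here's one stochastic-gradient step for a single sigmoid output unit trained on squared error; hidden layers get the same treatment one application of the chain rule further back. This is just the textbook delta rule, not a description of any particular training recipe:

class OutputUnit {
    double[] w;       // one weight per input
    double   bias;

    double forward(double[] x) {
        double sum = bias;
        for (int i = 0; i < w.length; i++) sum += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-sum));             // sigmoid
    }

    // nudge every weight in the direction that reduces (target - output)^2
    void trainStep(double[] x, double target, double learningRate) {
        double out  = forward(x);
        double grad = (target - out) * out * (1 - out);  // error times sigmoid derivative
        for (int i = 0; i < w.length; i++) w[i] += learningRate * grad * x[i];
        bias += learningRate * grad;
    }
}

Even in this tiny case the learning rate is a pure guess, which is a small taste of why the full training process is so touchy.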
some newbie notes about Neural Net AIs
my game site is http://Boardspace.net
Re: some newbie notes about Neural Net AIs
I've been using Hex on a 7x7 board as a test bed. My basic approach is to use
the visit counts at the top level of a MCTS search as the target; make the network
learn what the visit counts should be, and use that prediction to weight the random
playout phase of new searches. I understand this is the basic mechanism used by
AlphaGo.
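In code terms, the target is just the top-level visit counts normalized into a probability distribution, saved alongside the encoded position so it can be replayed as training data later. Something like this sketch (the names are mine):

class TrainingExample {
    final double[] inputs;   // encoded board position
    final double[] target;   // normalized top-level visit counts
    TrainingExample(double[] inputs, double[] target) { this.inputs = inputs; this.target = target; }

    // record the result of one finished MCTS search
    static TrainingExample fromSearch(double[] encodedBoard, int[] visits) {
        double total = 0;
        for (int v : visits) total += v;
        double[] target = new double[visits.length];
        for (int i = 0; i < visits.length; i++)
            target[i] = total > 0 ? visits[i] / total : 1.0 / visits.length;
        return new TrainingExample(encodedBoard, target);
    }
}

The network's predictions of these targets are what get used to weight the playout moves in later searches.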
1) a good visualization is worth its weight in gold. For example, the pattern below
turned out to be characteristic of the probabilities for an empty board. Being able to
see this pattern develop is much more informative than just tables of current error.
[image: learning in progress]
2) test for effectiveness at every step. If you expect progress (i.e. better play), verify it. When you
don't see it, don't move on until you understand why. And beware of small sample sizes when verifying
incremental improvements. It takes a lot of games to verify a 1% improvement in win rate (a rough
estimate is sketched after these notes).
3) while it may be tempting to learn continuously, it's better to save training data and learn by training
on batches of it. There are several good reasons to do it that way. First and foremost, it's
possible to re-use the same old training data with different versions of the network, different versions of the training
algorithm, different representations of the input and output layers, and so on. Generating the training
data is a lot slower than using it to train networks, and networks trained from the same data are apples-to-apples
comparisons.
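On the sample-size point in note 2, a back-of-the-envelope check: the standard error of a win rate p measured over n games is sqrt(p*(1-p)/n), so for a 1% difference between two runs to be about twice its combined standard error you need on the order of 20,000 games per run. A throwaway calculation, assuming win rates near 50%:

class SampleSize {
    // games needed per run for 'improvement' to be visible at roughly 95% confidence
    static long gamesNeeded(double baseRate, double improvement) {
        double variance = 2 * baseRate * (1 - baseRate);   // two independent measurements
        double z = 1.96;                                    // ~95% confidence
        return (long) Math.ceil(variance * z * z / (improvement * improvement));
    }
    // gamesNeeded(0.50, 0.01) is roughly 19,000 games in each condition
}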
my game site is http://Boardspace.net