https://storage.googleapis.com/deepmind ... ing-go.pdf
It involves a few innovative ideas, but the biggest one is incorporating into the MCTS paradigm an evaluation function almost identical to what I had in mind (see my previous thread).
I should mention that their program is estimated to be 1,200 Elo points stronger than CrazyStone, the strongest commercially available opponent. [Is there an emoticon for eyes falling out of their sockets?]
Well, on the one hand someone beat me to it. On the other, I can learn how to do it from them.

From a quick read of their paper, I gather these are the steps of the training process:
(1) Train a "traditional" NN to compute a probability distribution for the next move, similar to what's described in three recent papers (I posted links to them in my previous thread).
(2) Make a copy of that NN and tweak it using reinforcement learning, to reward moves that win games and penalize moves that lose them (the network from (1) was just trying to imitate humans). As opponents they use randomly picked previous versions of their own network; some variety in the opponents is supposed to make their NN more robust, which makes sense.
(3) Generate a position using the first network (I think the exact procedure is: pick a random number of moves to be played; use the NN from (1) to play that many moves; then make one random legal move), run the network from (2) against itself from there to the end of the game, and label the position with the result (see the sketch right after this list). Rinse. Repeat... THIRTY MILLION TIMES!!!
(4) Use the data generated in (3) to train a NN that can be used as an evaluation function.
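To make step (3) concrete, here is a minimal sketch of the data-generation loop as I understand it. Everything in it is a stand-in of my own: the "engine" is a toy (a game is just a growing list of moves that stops after 300 of them, with a coin-flip result), and sl_policy / rl_policy are placeholders for the networks from (1) and (2). This is only meant to show the shape of the procedure, not their actual code.

    import random

    BOARD_POINTS = 19 * 19

    def legal_moves(board):            # stand-in: any point not already "played"
        return [m for m in range(BOARD_POINTS) if m not in board]

    def game_over(board):              # stand-in: games simply stop after 300 moves
        return len(board) >= 300 or not legal_moves(board)

    def game_result(board):            # stand-in: coin-flip winner, +1 or -1
        return random.choice([+1, -1])

    def sl_policy(board):              # placeholder for the network from (1)
        return random.choice(legal_moves(board))

    def rl_policy(board):              # placeholder for the network from (2)
        return random.choice(legal_moves(board))

    def generate_training_example(max_prefix=200):
        board = []                     # a "position" is just the list of moves played

        # Play a random-length prefix with the supervised policy (1).
        for _ in range(random.randrange(max_prefix)):
            if game_over(board):
                return None            # discard games that end too early
            board = board + [sl_policy(board)]

        # One uniformly random legal move.
        if game_over(board):
            return None
        board = board + [random.choice(legal_moves(board))]
        position = list(board)         # this is the position that gets labelled

        # The reinforcement-learning policy (2) plays the game out against itself.
        while not game_over(board):
            board = board + [rl_policy(board)]

        return position, game_result(board)

    # Step (4) then trains the evaluation NN on (position, result) pairs like these.
    examples = [ex for ex in (generate_training_example() for _ in range(10)) if ex]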
The network from (1) is used to create the initial probability distributions for moves in their MCTS tree. The network from (4) is used at the leaves of the MCTS tree instead of running playouts to completion. Well, not really: they do both and take the average, which apparently plays much stronger than either one on its own. This might indicate that their evaluation function still has some serious weaknesses that are ameliorated by mixing its answer with the results of random playouts.
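Here is how I picture that mixing at the leaves. Again a hedged sketch rather than their implementation: value_net and fast_rollout_result are hypothetical stand-ins for the network from (4) and a cheap playout to the end of the game, and mix=0.5 is the plain average they describe.

    def evaluate_leaf(position, value_net, fast_rollout_result, mix=0.5):
        """Blend a learned evaluation with one playout result at an MCTS leaf.

        mix = 0.0 -> trust the value network alone
        mix = 1.0 -> classic playout-only MCTS
        mix = 0.5 -> the plain average described in the paper
        """
        v = value_net(position)            # network from (4): predicted outcome in [-1, 1]
        z = fast_rollout_result(position)  # finish the game cheaply, get +1 or -1
        return (1.0 - mix) * v + mix * z

With mix as a knob you could in principle slide between pure value-network evaluation and classic playout-only MCTS; the plain average in the middle is what seems to work best for them.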
So perhaps their evaluation function is still not good enough to make alpha-beta search a good alternative to MCTS. If they end up publishing the trained networks from (1) and (4), I will certainly try them myself. Networks like (1) have already been made available by other researchers, so there is a chance this may happen. If not, I may try to reproduce what they did, but with limited resources ["...thirty million..." still echoing in my head].