https://www.deepmind.com/blog/article/A ... t-learning
Direct link to the paper: https://storage.googleapis.com/deepmind ... matted.pdf
The most interesting contribution seems to be their league system for reaching a Nash equilibrium. Their off-policy learning method is also very interesting. Both give me some inspiration for my mahjong reinforcement-learning experiments.