Hi everyone! I am trying to program my own version of AlphaZero for chess. However, when you start training, all moves are essentially random, which makes it highly unlikely that a game will reach a terminal state. So how do you update the values in the MCTS nodes if your policies and values are still just random?
Is this a common problem that is simply solved with more computing power and more iterations? Has anyone tackled a similar problem, or does anyone know how to continue?
Any help is appreciated, thank you!
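For context on the MCTS part of the question: in AlphaZero-style search, a simulation does not need to reach the end of the game. It descends the tree until it reaches an unexpanded leaf, asks the network for a value estimate there (or uses the exact result if the leaf is a terminal position), and backs that number up the path. With an untrained network the estimates are close to noise, but the update rule is the same. Below is a minimal sketch of that backup step, assuming values in [-1, 1] and a two-player game with alternating moves; the `Node` class and `backup` function are hypothetical names, not taken from any particular codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # policy-head probability (near-uniform for an untrained net)
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node (unused in this sketch)

    def q(self) -> float:
        # Mean backed-up value; 0.0 for a node that has never been visited.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backup(search_path: list[Node], leaf_value: float) -> None:
    """Propagate one simulation's result from the leaf back to the root.

    leaf_value is the estimate for the player to move at the leaf (the last
    node in search_path). Each node stores values from the perspective of the
    player who made the move into it, so the sign flips at every ply.
    """
    value = -leaf_value
    for node in reversed(search_path):
        node.visit_count += 1
        node.value_sum += value
        value = -value


# Usage: one simulation root -> child -> leaf, where the network says 0.5.
root, child, leaf = Node(1.0), Node(0.5), Node(0.5)
backup([root, child, leaf], 0.5)
```

Because every simulation ends at a leaf rather than at checkmate, the tree statistics are always updated; what the early iterations lack is an accurate value head, and that improves only as self-play games supply real outcomes to train on.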
How does AlphaZero start learning if moves are random and games don't finish
Re: How does AlphaZero start learning if moves are random and games don't finish
In my experience, this is not a problem for chess. Even if you play the initial games completely at random, a few of them will end in checkmate. That should be enough to drive progress. After the first few iterations, the games will start to look more normal.
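The mechanism behind this answer can be illustrated on a smaller game. The sketch below plays one game with uniformly random moves and labels every position with the final result from the perspective of the player to move, which is how AlphaZero-style self-play turns finished games into value targets. Tic-tac-toe stands in for chess only so the example stays self-contained, and `max_plies` is a hypothetical cap: a game cut off there is scored as a draw (target 0). Every decisive game, however sloppy, gives each of its positions a target of +1 or -1.

```python
import random

# The eight winning lines on a 3x3 board indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: list[int]) -> int:
    """Return +1 or -1 for the side with three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def random_self_play(rng: random.Random, max_plies: int = 9):
    """Play one uniformly random game.

    Returns a list of ((board, player_to_move), value_target) pairs.
    """
    board = [0] * 9
    player = 1  # +1 moves first
    history = []
    for _ in range(max_plies):
        empty = [i for i, v in enumerate(board) if v == 0]
        if winner(board) != 0 or not empty:
            break  # terminal position reached
        history.append((tuple(board), player))
        board[rng.choice(empty)] = player
        player = -player
    result = winner(board)  # 0 for a real draw or a game cut off at max_plies
    # Value target = final result from the point of view of the side to move.
    return [((pos, to_move), result * to_move) for pos, to_move in history]


data = random_self_play(random.Random(0))
```

In chess the decisive fraction of random games is much smaller than in tic-tac-toe, but the labeling works the same way: drawn or capped games contribute targets of 0, and the occasional random checkmate contributes the non-zero signal the value head first learns from.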