Inside the mind of a superhuman Go model: How does Leela Zero read ladders?


Post by Rémi Coulom

https://www.lesswrong.com/posts/FF8i6SL ... ela-zero-2
We did some interpretability on Leela Zero, a superhuman Go model. With a technique similar to the logit lens, we found that the residual structure of Leela Zero induces a preferred basis throughout the network, giving rise to persistent, interpretable channels. By directly analyzing the weights of the policy and value heads, we found that the model stores information related to the probability of the pass move along the top edge of the board, and information related to the board value in checkerboard patterns. We also took a deep dive into a specific Go technique, the ladder, and identified a very small subset of model components that are causally responsible for the model’s judgement of ladders.
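For readers who have not met the logit lens, here is a minimal toy sketch of the general idea: take the activation after each residual block and push it straight through the final output head, to see what the network would "say" at that depth. This is not Leela Zero's architecture and not the authors' code. The dense random weights, the sizes, and the function name `lens_over_blocks` are all assumptions made up for illustration; the real network is a convolutional residual tower.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def lens_over_blocks(x, block_weights, head_weights):
    """Read out every intermediate activation with the final head (toy logit lens)."""
    readouts = []
    for w in block_weights:
        # Toy residual block: the skip connection adds the block output to x,
        # so every block writes into the same feature space.
        x = relu(x + w @ relu(x))
        # Apply the *final* head to this intermediate activation.
        readouts.append(softmax(head_weights @ x))
    return readouts

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_features, n_moves, n_blocks = 16, 5, 4
blocks = [0.1 * rng.standard_normal((n_features, n_features)) for _ in range(n_blocks)]
head = rng.standard_normal((n_moves, n_features))
x0 = rng.standard_normal(n_features)

readouts = lens_over_blocks(x0, blocks, head)
for i, p in enumerate(readouts):
    print(f"block {i}: most likely move index = {int(p.argmax())}")
```

Because the skip connections keep all blocks writing into one shared space, the same head can be applied at any depth, which is the property the abstract refers to as a preferred basis.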