Vision Transformers for Computer Go

Rémi Coulom
Posts: 219
Joined: Tue Feb 12, 2008 8:31 pm

Vision Transformers for Computer Go

Post by Rémi Coulom »

https://arxiv.org/abs/2309.12675
Amani Sagri, Tristan Cazenave, Jérôme Arjonilla, Abdallah Saffidine wrote:
Motivated by the success of transformers in various fields, such as language understanding and image analysis, this investigation explores their application in the context of the game of Go. In particular, our study focuses on the analysis of the Transformer in Vision. Through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate, we have been able to highlight the substantial role that transformers can play in the game of Go. This study was carried out by comparing them to the usual Residual Networks.
The abstract does not give any results. The conclusion says:
EfficientFormer’s architecture showcases remarkable parameter efficiency, especially when compared to the Residual Network architecture, particularly in larger networks. This translates into superior performance on CPU, making it the preferred choice in this domain. Interestingly, when it comes to GPU utilization, both architectures perform at a similar level, especially for the largest networks in our experimentation.

Moreover, it is worth highlighting that the EfficientFormer architecture we explored for Go is not limited to this particular game; it exhibits versatility and applicability to a wide range of other games and domains.
zakki
Posts: 7
Joined: Fri Jan 13, 2023 5:22 pm

Re: Vision Transformers for Computer Go

Post by zakki »

I couldn't understand the model architecture in 3.4 Adaptation for the game of Go.
However, in the context of Go, the input board’s dimensions were fixed at 19 × 19, and it was imperative to preserve this size throughout the training process to avoid losing critical information.
Does this mean that they used only pooling, 1x1 convolutions and GELU in the early stages of the network, and that spatial information is handled by pooling alone?
If so, it looks like the shift operation (Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions).
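For what it's worth, here is a minimal PyTorch sketch of what such a stage could look like under that reading: pooling as the only spatial operation, 1x1 convolutions and GELU for channel mixing, plus a zero-parameter shift mixer in the spirit of the Shift paper for comparison. All module names and sizes are my own guesses, not taken from the paper.

import torch
import torch.nn as nn

class PoolMixerBlock(nn.Module):
    """Hypothetical early-stage block: the only spatial operation is average
    pooling; channel mixing uses 1x1 convolutions and GELU (illustrative,
    not the paper's actual block)."""
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        self.pool = nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False)
        self.norm1 = nn.GroupNorm(1, channels)
        self.norm2 = nn.GroupNorm(1, channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    def forward(self, x):
        y = self.norm1(x)
        x = x + (self.pool(y) - y)        # PoolFormer-style token mixing
        x = x + self.mlp(self.norm2(x))   # channel MLP: 1x1 conv + GELU only
        return x

def shift_mix(x):
    """Zero-FLOP, zero-parameter spatial mixing in the spirit of the Shift
    paper: each quarter of the channels moves one step in a different
    direction (torch.roll wraps around instead of zero-padding)."""
    c = x.shape[1] // 4
    return torch.cat([
        torch.roll(x[:, :c], 1, dims=2),            # down
        torch.roll(x[:, c:2 * c], -1, dims=2),      # up
        torch.roll(x[:, 2 * c:3 * c], 1, dims=3),   # right
        torch.roll(x[:, 3 * c:], -1, dims=3),       # left
    ], dim=1)

x = torch.randn(1, 32, 19, 19)          # 32 feature planes on a 19x19 board
print(PoolMixerBlock(32)(x).shape)      # torch.Size([1, 32, 19, 19])
print(shift_mix(x).shape)               # torch.Size([1, 32, 19, 19])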
Tristan.Cazenave
Posts: 1
Joined: Tue Nov 07, 2023 4:49 pm

Re: Vision Transformers for Computer Go

Post by Tristan.Cazenave »

It is only for the residual network that we keep 19x19 planes for the blocks. The EfficientFormer only uses a 19x19 plane for the policy head.
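Roughly, the contrast looks like the sketch below; the layer sizes and the interpolation step are illustrative placeholders rather than the exact layers from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual-network style: every block keeps the full 19x19 planes."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class PolicyHead(nn.Module):
    """Illustrative EfficientFormer-style head: the trunk output is a token
    sequence on a coarser grid, and only here is a 19x19 plane produced.
    The interpolation is a placeholder for however the real head rebuilds
    the board plane."""
    def __init__(self, embed_dim: int = 448, board: int = 19):
        super().__init__()
        self.board = board
        self.proj = nn.Conv2d(embed_dim, 1, kernel_size=1)  # one policy plane

    def forward(self, tokens):                  # tokens: [B, N, embed_dim]
        b, n, c = tokens.shape
        side = int(n ** 0.5)                    # assume a square token grid
        fmap = tokens.transpose(1, 2).reshape(b, c, side, side)
        fmap = F.interpolate(fmap, size=(self.board, self.board),
                             mode="bilinear", align_corners=False)
        return self.proj(fmap).flatten(1)       # [B, 361] move logits

x = torch.randn(2, 256, 19, 19)
print(ResBlock()(x).shape)          # torch.Size([2, 256, 19, 19]) -- unchanged
tokens = torch.randn(2, 25, 448)    # e.g. a 5x5 grid of trunk tokens
print(PolicyHead()(tokens).shape)   # torch.Size([2, 361])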
zakki
Posts: 7
Joined: Fri Jan 13, 2023 5:22 pm

Re: Vision Transformers for Computer Go

Post by zakki »

But they wrote "before feeding it into the transformer" in the previous sentence of the same paragraph. I think that "the transformer" refers to the 3D part of the EfficientFormer, and that they use 19x19 planes in the 4D part of the EfficientFormer.
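In EfficientFormer terms, the 4D blocks work on [B, C, H, W] feature maps and the 3D blocks (the attention part) work on flattened [B, N, C] token sequences, so "before feeding it into the transformer" would be the reshape below. Whether the grid is still 19x19 at that point is exactly what I'm unsure about, and the channel count is invented.

import torch

# 4D stage output: a feature map on the board grid (channel count invented).
fmap = torch.randn(1, 448, 19, 19)          # [B, C, H, W]

# The 4D -> 3D handover "before feeding it into the transformer":
tokens = fmap.flatten(2).transpose(1, 2)    # [B, H*W, C] = [1, 361, 448]
print(tokens.shape)

# The 3D attention blocks would then run on `tokens`; only the policy head
# maps the result back onto a 19x19 plane.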