In 2017 DeepMind published AlphaGo Zero, the latest breakthrough in its line of Go-playing artificial intelligence, generating as much public excitement as its predecessor AlphaGo Master. What makes this newly evolved system such a big deal, and how does it work so effectively?
Let’s delve into how AlphaGo was able to dominate the world’s best Go player.
The rationale behind both AlphaGos is actually surprisingly simple. We need only think about how we play any kind of strategic game, be it poker, chess, or tic-tac-toe: we talk to ourselves.
We think things like, “If I play this move, my opponent might do this… I could move here, but then I’ll lose...” This thinking process resembles an algorithm called Monte Carlo Tree Search (MCTS), which builds a tree of possible game moves and searches it for the best sequence. In its exhaustive form, instead of considering only the most likely opponent move, the tree covers all of them, played out until an end state is reached.
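The “Monte Carlo” in the name refers to estimating how good a position is by playing many random games from it and averaging the results. Here is a minimal sketch of that idea, using the simple take-away game Nim as a toy stand-in for Go (the game choice and the function names `rollout` and `mc_value` are ours, not AlphaGo’s):

```python
import random

def legal_moves(stones):
    """In this Nim variant you may remove 1-3 stones, never more than remain."""
    return range(1, min(3, stones) + 1)

def rollout(stones):
    """Play random moves to the end of the game. Return True if the player
    to move at the start of the rollout takes the last stone (and so wins)."""
    starter_to_move = True  # is the starting player the one about to move?
    while True:
        stones -= random.choice(list(legal_moves(stones)))
        if stones == 0:
            return starter_to_move
        starter_to_move = not starter_to_move

def mc_value(stones, n_sims=1000):
    """Monte Carlo estimate: the fraction of random playouts won
    by the side to move."""
    return sum(rollout(stones) for _ in range(n_sims)) / n_sims
```

With one stone left the player to move always wins, so `mc_value(1)` is exactly 1.0; for larger positions the estimate sharpens as `n_sims` grows. Full MCTS builds on this by using such rollout statistics to decide which branches of the tree deserve further exploration.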
Following this framework, the most naive way to create a game-playing AI would be to exhaustively search the tree after each opponent’s move and output the action with the highest likelihood of winning. But the problem with this approach is that it may take forever to build this tree, and even after we manage to build the tree it’ll take forever to exhaustively search it. Take the 3x3 game of tic-tac-toe as an example:
There are up to 9! (362,880) move sequences in this tree. Think about that: for a game as simple as tic-tac-toe, a naive artificial intelligence would have to examine hundreds of thousands of tree nodes before making a move.
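Figures like this are easy to verify by brute force. The short script below (our own illustration, not part of AlphaGo) enumerates every distinct tic-tac-toe game; the total comes out below 9! because many games end before the board fills up:

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board=(None,) * 9, player='X'):
    """Count every distinct game: a leaf is reached when a player
    completes a line or the board is full."""
    if winner(board) is not None or all(cell is not None for cell in board):
        return 1
    other = 'O' if player == 'X' else 'X'
    return sum(count_games(board[:i] + (player,) + board[i + 1:], other)
               for i in range(9) if board[i] is None)

print(count_games())  # → 255168 distinct games
```

A quarter of a million games for a 3x3 board — and this exhaustive enumeration finishes in a second or two precisely because tic-tac-toe is tiny; the same approach is hopeless for Go.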
What would it be like for an AI to do this for Go, which is played on a 19x19 grid? Here’s just a portion of the Go tree monstrosity:
The main goal in building a strategic game-playing AI has always been straightforward: to reduce the tree size as much as possible — but this is also the hardest part. Deep Blue, the AI that beat the then world chess champion, trimmed its search tree with the help of heuristics handcrafted by human experts. AlphaGo achieved this purely algorithmically, which is what made the achievement so exciting.
So how do we reduce the tree? The answer is pretty straightforward: by cutting off branches to reduce its width and depth.
AlphaGo uses reinforcement learning to generate two networks: one that predicts the opponent’s move given the current game state, and another that evaluates how likely it is to win from that state. With the help of these two networks, AlphaGo dramatically shortens its ‘internal monologue’ by pruning branches.
It can say to itself,
“The opponent will most likely do this, so I won’t consider the rest” (removing the branches).
“If I achieve this game state, I can most likely win from there” (truncating the depth).
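Those two pruning rules can be sketched as a depth-limited search. Below is a minimal illustration under stated assumptions: we reuse Nim as the toy game, and `toy_policy` and `toy_value` are hypothetical stand-ins for AlphaGo’s trained networks (a uniform move distribution and a crude win-probability heuristic), not its real architecture:

```python
def legal_moves(stones):
    return range(1, min(3, stones) + 1)

def toy_policy(stones):
    """Stand-in for the move-prediction network: a probability per legal
    move. Here simply uniform; AlphaGo's network is learned."""
    moves = list(legal_moves(stones))
    return {m: 1 / len(moves) for m in moves}

def toy_value(stones):
    """Stand-in for the state-evaluation network: estimated win probability
    for the player to move. In Nim, multiples of 4 are losing positions."""
    return 0.1 if stones % 4 == 0 else 0.9

def pruned_search(stones, depth, top_k=2, cutoff=0.85):
    if stones == 0:
        return 0.0  # the opponent took the last stone: we lost
    v = toy_value(stones)
    if depth == 0 or v >= cutoff:
        return v    # trust the evaluation here: truncate the depth
    probs = toy_policy(stones)
    best = sorted(probs, key=probs.get, reverse=True)[:top_k]  # prune width
    # Negamax convention: our win probability is 1 minus the opponent's.
    return max(1 - pruned_search(stones - m, depth - 1, top_k, cutoff)
               for m in best)
```

The two rules appear as the `[:top_k]` slice (only the most plausible branches survive) and the `v >= cutoff` early return (no need to search below a state the evaluator is already confident about).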
One mystery remains: the origin of the networks. The action prediction network is generated through observing a human player’s moves, similar to how chess students might memorise a master player’s moves.
The state evaluation network, on the other hand, is built on self-play datasets generated using the action prediction network. Imagine a human player who plays countless games until almost every possible game state has been reached (Go has roughly 10^170 legal positions). The player then stops and evaluates how likely they are to win from each game state.
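The core of that idea — estimate a state’s value from the outcomes of self-play games that passed through it — can be sketched in a few lines. Again we use Nim as a toy stand-in, with random self-play instead of a trained policy; `self_play_game` and `estimate_values` are our illustrative names:

```python
import random
from collections import defaultdict

def self_play_game(start=7):
    """Play one game of random self-play. Return the (stones, player)
    positions visited and the index of the winning player (0 or 1)."""
    stones, player, visited = start, 0, []
    while True:
        visited.append((stones, player))
        stones -= random.choice(range(1, min(3, stones) + 1))
        if stones == 0:
            return visited, player  # this player took the last stone
        player = 1 - player

def estimate_values(n_games=5000, start=7):
    """For each visited state, the fraction of self-play games passing
    through it that the player to move eventually won."""
    wins, visits = defaultdict(int), defaultdict(int)
    for _ in range(n_games):
        visited, game_winner = self_play_game(start)
        for state in visited:
            visits[state] += 1
            wins[state] += (state[1] == game_winner)
    return {state: wins[state] / visits[state] for state in visits}
```

Note how obviously winning states surface on their own: whoever moves with one stone left always wins, so those states get an estimated value of exactly 1.0, with no human labels involved. This is the sense in which self-play replaces the human evaluator.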
The sheer scale of these Monte Carlo tree structures demonstrates just how impressive it is that AlphaGo was able to dominate the world’s best human Go players. And DeepMind has already improved upon the technology, with the launch of AlphaGo Zero in 2017 showing that AI is certainly set to make waves in the next few years.