AlphaGo Master vs AlphaGo Zero - The Power of Reinforcement Learning

July 20, 2018

Just when we think artificial intelligence has proved its superiority over human intellect, it strikes again. Google DeepMind’s AlphaGo computer programs use artificial intelligence to challenge themselves, and humans, at the ancient game of Go. And while the 2015 version of AlphaGo was already strong enough to beat a professional human Go player, the recent iterations of the program just keep getting better.

To play Go, AlphaGo Master used the Monte Carlo Tree Search algorithm, guided by neural networks trained with reinforcement learning, to prune as many branches of the search tree as possible, limit how deep the search had to go, and eventually settle on the move with the highest estimated probability of winning. So how does its successor, AlphaGo Zero, differ?
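To make the idea of steering a search toward promising branches concrete, here is a minimal sketch of the UCB1 selection rule found in textbook Monte Carlo Tree Search. It is a generic illustration, not AlphaGo's actual selection rule (which also uses move probabilities from its neural networks). The `Node` class, the function names and the exploration constant `c` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in the search tree (hypothetical structure)."""
    visits: int = 0                                # simulations that passed through here
    wins: float = 0.0                              # total reward from those simulations
    children: dict = field(default_factory=dict)   # move -> Node


def ucb1(parent: Node, child: Node, c: float = 1.4) -> float:
    """Score a child: average win rate plus an exploration bonus."""
    if child.visits == 0:
        return math.inf                            # always try unvisited moves first
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: Node):
    """Return the (move, node) pair with the highest UCB1 score."""
    return max(parent.children.items(), key=lambda kv: ucb1(parent, kv[1]))


def best_move(root: Node):
    """Once the search budget is spent, play the most-visited move."""
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Branches that keep losing get a low score and are rarely revisited, which is how the search effectively prunes itself without examining every line of play.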

The latest updates present in AlphaGo Zero left researchers in awe. One of the major changes is the removal of its ability to learn from human strategy. AlphaGo using a playbook of human Go games to memorise moves? Gone. From now on, it’s just an AI playing against itself and generating games for later training. In other words, it completely disregards hundreds of years of human exploration in this field, and starts with 100% randomness to find strategies.
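The self-play idea can be sketched on a game far smaller than Go. The example below is an assumption-laden toy, not DeepMind's pipeline: it uses a simple Nim variant (take 1 to 3 stones, whoever takes the last stone wins), a uniformly random policy, and hypothetical function names. It only shows how playing against yourself produces labelled training examples without any human games.

```python
import random


def self_play_game(rng: random.Random, start_stones: int = 10):
    """Play one game of toy Nim with a uniformly random policy.

    Returns a list of (stones_before_move, move, outcome) examples, where
    outcome is +1 if the player who made that move went on to win, else -1.
    """
    stones, player = start_stones, 0
    history = []                                  # (player, stones, move)
    winner = None
    while stones > 0:
        move = rng.randint(1, min(3, stones))     # random legal move
        history.append((player, stones, move))
        stones -= move
        winner = player                           # the last player to move wins
        player = 1 - player
    return [(s, m, 1 if p == winner else -1) for p, s, m in history]


def generate_dataset(num_games: int, seed: int = 0):
    """Collect training examples from many self-play games."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_games):
        data.extend(self_play_game(rng))
    return data


if __name__ == "__main__":
    examples = generate_dataset(1000)
    print(len(examples), "training examples, e.g.", examples[:3])
```

In a real system, a network would be trained on these examples and the improved network would then play the next round of games. Here the policy stays random to keep the sketch short.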

As it turns out, the newer version of AlphaGo reached a super-human level of performance after a mere 70 hours of training, and had overtaken its predecessor, AlphaGo Master, within 40 days. Hundreds of years of human Go knowledge were surpassed by artificial intelligence in less than three days.

Another major structural change in AlphaGo Zero was getting rid of Monte Carlo rollouts. The search tree itself is still there, but the fast simulated play-outs once used to score each position are gone, and for the better.

Rolling out a game to the end is no small feat. The deeper the search goes and the more branches it considers, the more accurate its prediction of the right move, but the longer each move takes to compute. Ditching rollouts is essentially a trade-off between accuracy and time. By dropping them and relying on a single trained RL (reinforcement learning) network to evaluate positions inside the tree search, AlphaGo Zero is able to run more games in a given amount of time, generating more training data and producing higher-quality networks, all with jaw-dropping speed.
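The accuracy-versus-time trade-off of rollouts can be illustrated with a deliberately simplified model. Assume, purely for illustration, that a position has a fixed "true" win probability and that each rollout is an independent simulated game that is won with that probability. The function name and the value 0.6 are hypothetical.

```python
import random


def rollout_estimate(true_win_prob: float, num_rollouts: int, rng: random.Random) -> float:
    """Estimate a position's value by averaging simulated play-out results."""
    wins = sum(rng.random() < true_win_prob for _ in range(num_rollouts))
    return wins / num_rollouts


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 10_000):
        estimate = rollout_estimate(0.6, n, rng)
        print(f"{n:>6} rollouts -> estimate {estimate:.3f} "
              f"(error {abs(estimate - 0.6):.3f})")
```

The cost grows linearly with the number of rollouts, while the typical error only shrinks with its square root. That is the price a search pays for every position it scores this way, and the price a single learned evaluation avoids.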

But how well can it play? We know that AlphaGo Master already defeated the world’s top human player, Ke Jie—and AlphaGo Zero’s Elo rating is 327 units higher than Master’s. In other words, it’s pretty damn good.
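To put that 327-point gap in perspective, it can be converted into an expected result with the standard Elo formula. This is a quick sketch that assumes the ratings follow the conventional 400-point logistic Elo scale; the function name `elo_expected_score` is our own.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score of the higher-rated player under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    p = elo_expected_score(327)
    print(f"Expected score with a 327-point edge: {p:.3f}")
```

Under that assumption, a 327-point edge works out to an expected score of roughly 0.87, meaning Zero would be expected to win about 87 games out of 100 against Master.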

Reinforcement learning is a machine learning technique that doesn’t depend on large, pre-collected datasets; the agent generates its own training data by interacting with its environment. With AlphaGo Zero, DeepMind pushed RL’s independence from data further by starting with 100% randomness. In the real world, not every training scenario conveniently comes with a large volume of data, which makes reinforcement learning an extraordinarily promising tool.