Ever since AlphaGo’s astounding victory over Lee Sedol in their 2016 Go match, the world’s attention has been drawn to artificial intelligence and reinforcement learning. The win signalled that machine learning was no longer simply about classifying big data, but was making progress in the realm of true intelligence.
Reinforcement learning (RL) introduces the concept of an agent, and addresses the problem of how this subjective entity can make the most rewarding decisions in a known or unknown environment. It can be seen as sitting between supervised and unsupervised learning: like supervised learning, it provides a kind of label for its inputs, except that these labels (rewards) are sparse and delayed in time.
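To make “sparse and delayed” concrete, here is a minimal sketch of the standard discounted return, G_t = r_t + γ·G_{t+1}, which spreads a single late reward back over the earlier steps that led to it. The discount factor γ = 0.9 and the four-step episode are assumptions for illustration; the article does not fix either.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t (gamma is an assumed value)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all of the rewards that come after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A hypothetical sparse episode: nothing happens until the final step pays out 1.0.
episode = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode))
```

Because the only reward arrives at the last step, earlier steps receive geometrically smaller credit (about 0.729 for the first step), which hints at why assigning credit is harder in RL than with the dense labels of supervised learning.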
In life we aren’t given labels for every possible behaviour; we learn lessons by exploring strategies on our own. RL therefore offers the problem setting closest to the way the human brain learns, which explains the excitement its progress has generated among machine learning scholars.
So far there have been two main approaches to a reinforcement learning problem: one built on Markov decision processes and one built on evolutionary computation.
Markov decision processes (MDPs) are a mathematical framework that models the world as a sequence of states, each with a value. Inside this world, a rational agent makes decisions by weighing the rewards that different actions lead to.
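When the model of the world is known, one standard way to compute the state values is value iteration, which repeatedly applies the Bellman optimality backup V(s) ← max_a [r(s, a) + γ·V(s′)]. The minimal sketch below assumes a hypothetical four-state corridor with deterministic moves, a reward of 1 for entering the goal, and a discount factor γ = 0.9; none of these details come from the article.

```python
# Hypothetical 4-state corridor: states 0..3, where state 3 is a terminal goal.
N_STATES = 4
ACTIONS = (-1, +1)  # move left, move right
GOAL = 3
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_value(values, state, action):
    """Immediate reward plus the discounted value of the state the action leads to."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state keeps value 0
            # Bellman optimality backup: keep the value of the best action.
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """In every non-terminal state, pick the action with the highest backed-up value."""
    return {s: max(ACTIONS, key=lambda a: q_value(values, s, a))
            for s in range(N_STATES) if s != GOAL}


values = value_iteration()
print(values)
print(greedy_policy(values))
```

The values shrink by a factor of γ for each extra step between a state and the goal, and the agent’s “weighing of rewards” reduces to choosing the action with the largest backed-up value.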
If the state values are unknown, the agent can start by interacting with the world and observing the consequences of its actions; with enough experience, it can then exploit that knowledge to make optimal decisions. This approach builds on rules drawn from observing human intelligence in behavioural psychology experiments, rather than modelling, on a grander scale, the rules that cultivated human intelligence in the first place. That grander modelling is where evolutionary computation comes from.
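Learning values by trial and error can be sketched with tabular Q-learning and ε-greedy exploration, a standard algorithm that the article itself does not name. In this minimal sketch the agent only observes (next state, reward, done) from a hypothetical four-state corridor; it explores at random with probability ε and otherwise exploits its current estimates. The corridor and every parameter value are assumptions for illustration.

```python
import random

# Hypothetical 4-state corridor: entering state 3 ends the episode with reward 1.
N_STATES = 4
ACTIONS = (-1, +1)  # move left, move right
GOAL = 3


def step(state, action):
    """The environment: the learner below only ever sees the values this returns."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes, otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Temporal-difference update towards reward + discounted best future estimate.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """The exploit-only policy: the best-looking action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES) if s != GOAL}


q = q_learning()
print(greedy_policy(q))
```

After enough episodes the greedy policy should point right (+1) in every state, towards the goal. The balance between exploring and exploiting is controlled here by the single parameter ε.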
Evolutionary computation is a family of algorithms that apply the idea of evolution to computation, using it as a search technique to find the fittest solution. More specifically, it borrows concepts from Darwin’s theory of evolution, such as mutation, crossover, and fitness, and has the computer perform natural selection in search of an optimal solution. In other words, it does not attempt to build intelligence from scratch; it is satisfied as long as the search produces quasi-intelligent results.
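The loop of fitness, selection, crossover and mutation can be sketched as a minimal genetic algorithm, one common member of the evolutionary computation family. The bit-string genomes, the “OneMax” fitness function (count the 1-bits), tournament selection, elitism and all parameter values are assumptions chosen for illustration, not details from the article.

```python
import random

GENOME_LENGTH = 20  # assumed genome size


def fitness(genome):
    # Hypothetical "OneMax" objective: count the 1-bits; the fittest genome is all ones.
    return sum(genome)


def tournament(population, rng, k=3):
    # Selection: the fittest of k randomly drawn individuals becomes a parent.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # One-point crossover: splice a prefix of one parent onto the suffix of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        # Elitism: carry the best genome over unchanged so progress is never lost.
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(fitness(best), history[:5])
```

Elitism is a design choice: copying the best genome forward guarantees the best fitness never decreases from one generation to the next, at the cost of some population diversity. Replacing `fitness` with a domain-specific score is how the same loop would be pointed at a real problem.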
RL algorithms can be powerful tools across a range of tasks and may contribute to the realisation of general AI, which in turn could greatly benefit the education industry through adaptive learning experiences, student path prediction, and unbiased grading systems.