The dominant approach to building more advanced artificial intelligences is simply to scale up their computing power, but AI firm DeepMind says we are reaching a point of diminishing returns
8 December 2021
DeepMind says that teaching machines to realistically mimic human language is more complex than simply throwing increasing amounts of computing power at the problem, despite that being the predominant strategy in the field.
In recent years, most progress in building artificial intelligences (AIs) has come from increasing their size and training them with ever more data on the biggest computers available. But this makes the AIs expensive, unwieldy and hungry for resources. A recent system created by Microsoft and Nvidia required more than a month of supercomputer access and almost 4500 high-power graphics cards to train, at a cost of millions of dollars.
In a bid to find alternatives, AI firm DeepMind has created a model that can look up information in a vast database, in much the same way that a human would use a search engine. This avoids the need for all of its knowledge to be baked in during training. Researchers at the company claim this strategy can create models that rival state-of-the-art tools while being much less complex.
Language AIs seemed to take a big leap last year with the release of GPT-3, a model developed by US firm OpenAI that surprised researchers with its ability to generate fluent streams of text. Since then, models have grown ever bigger: GPT-3 used 175 billion parameters for its neural network, while Microsoft and Nvidia’s recent system, the Megatron-Turing Natural Language Generation model, has 530 billion parameters.
But there are limits to scale – Megatron managed to push performance benchmarks only slightly higher than GPT-3 despite its huge step up in parameters. On one benchmark, where an AI is required to predict the last word of sentences, GPT-3 had an accuracy of up to 86.4 per cent, while Megatron reached 87.2 per cent.
Researchers at DeepMind initially investigated the effects of scale on similar systems by creating six language models, ranging from 44 million parameters to 280 billion. They then evaluated the models’ abilities on a group of 152 diverse tasks and discovered that scale led to improved ability. The largest model beat GPT-3 in around 82 per cent of tests. In a common benchmark reading comprehension test, it scored 71.6, which is higher than GPT-3’s 46.8 and Megatron’s 47.9.
But the DeepMind team found that while there were significant gains from scale in some areas, others, such as logical and mathematical reasoning, saw much less benefit. The company now says that scale alone isn’t how it intends to reach its goal of creating a realistic language model that can understand complex logical statements, and has released a model called Retrieval-Enhanced Transformer (RETRO) that researches information rather than memorising it.
RETRO has 7 billion parameters, 25 times fewer than GPT-3, but can access an external database of around 2 trillion pieces of information. DeepMind claims that the smaller model takes less time, energy and computing power to train but can still rival the performance of GPT-3.
In a test against a standard language model with a similar number of parameters but without the ability to look up information, RETRO scored 45.5 in a benchmark test on accurately answering natural language questions, while the control model scored just 30.4.
“Being able to look things up on the fly from a large knowledge base can often be useful instead of having to memorise everything,” says Jack Rae at DeepMind. “The objective is just trying to emulate human behaviour from what it can see on the internet.”
This approach also has other benefits. While AI models are typically black boxes whose inner workings are a mystery, it is possible to see which pieces of external data RETRO refers to. This can allow citation and some basic explanation as to how it arrived at particular results.
It also allows the model to be updated more easily by simply adding to the external data; for instance, a traditional model trained in 2020 may respond to a question about who won Wimbledon by saying “Simona Halep”, but RETRO would be able to scour new documents and know that “Ashleigh Barty” was a more contemporaneous answer.
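To make the look-up idea concrete, here is a minimal Python sketch of a retrieval store. It is an illustration only and not DeepMind's implementation: the class name ToyRetrievalStore, the document labels and the simple word-overlap scoring are all assumptions made for this example, and RETRO's actual retrieval mechanism is not described in this article. The sketch shows two properties mentioned above: every answer can be traced to a labelled source, and adding a newer document changes the result without retraining anything.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetrievalStore:
    """Hypothetical stand-in for an external database a model can consult."""

    def __init__(self):
        self.docs = []  # each entry: (doc_id, text, word counts)

    def add(self, doc_id, text):
        """Add a document; no retraining step is involved."""
        self.docs.append((doc_id, text, Counter(tokenize(text))))

    def lookup(self, query, k=1):
        """Return up to k (doc_id, text) pairs ranked by word overlap."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.docs]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


store = ToyRetrievalStore()
store.add("news-2019", "Simona Halep won the Wimbledon women's singles title in 2019")
before = store.lookup("who won Wimbledon in 2021")

# Updating the store is just appending a newer document.
store.add("news-2021", "Ashleigh Barty won the Wimbledon women's singles title in 2021")
after = store.lookup("who won Wimbledon in 2021")

print(before[0][0])  # label of the best source before the update
print(after[0][0])   # label of the best source after the update
```

The returned label plays the role of a citation: it records which piece of external data the answer came from. A real system would replace the word-overlap score with a learned similarity measure and search a far larger collection.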
Samuel Bowman at New York University says that the ideas behind RETRO aren’t necessarily novel, but are important because of DeepMind’s influence in the field of AI. “There’s still a lot we don’t know about how to safely and productively manage models at current scales, and that’s probably going to get harder with scale in many ways, even as it gets easier in some.”
One concern is that the high cost of large-scale AI could leave it the preserve of large corporations. “It seems considerate of them to not try to push the limits here, since that could reinforce an arms-race dynamic,” says Bowman.