Scientists have created a neural network with the human-like ability to make generalizations about language¹. The artificial intelligence (AI) system performs about as well as humans at folding newly learned words into an existing vocabulary and using them in fresh contexts, which is a key aspect of human cognition known as systematic generalization.
The researchers gave the same task to the AI model that underlies the chatbot ChatGPT, and found that it performs much worse on such a test than either the new neural net or people, despite the chatbot’s uncanny ability to converse in a human-like manner.
The work, published on 25 October in Nature, could lead to machines that interact with people more naturally than do even the best AI systems today. Although systems based on large language models, such as ChatGPT, are adept at conversation in many contexts, they display glaring gaps and inconsistencies in others.
The neural network’s human-like performance suggests there has been a “breakthrough in the ability to train networks to be systematic”, says Paul Smolensky, a cognitive scientist who specializes in language at Johns Hopkins University in Baltimore, Maryland.
Systematic generalization is demonstrated by people’s ability to effortlessly use newly acquired words in new settings. For example, once someone has grasped the meaning of the word ‘photobomb’, they will be able to use it in a variety of situations, such as ‘photobomb twice’ or ‘photobomb during a Zoom call’. Similarly, someone who understands the sentence ‘the cat chases the dog’ will also understand ‘the dog chases the cat’ without much extra thought.
But this ability does not come innately to neural networks, a method of emulating human cognition that has dominated artificial-intelligence research, says Brenden Lake, a cognitive computational scientist at New York University and co-author of the study. Unlike people, neural nets struggle to use a new word until they have been trained on many sample texts that use that word. Artificial-intelligence researchers have sparred for nearly 40 years as to whether neural networks could ever be a plausible model of human cognition if they cannot demonstrate this type of systematicity.
To attempt to settle this debate, the authors first tested 25 people on how well they could deploy newly learnt words in different situations. The researchers ensured that the participants were learning the words for the first time by testing them on a pseudo-language consisting of two categories of nonsense words. ‘Primitive’ words such as ‘dax’, ‘wif’ and ‘lug’ represented basic, concrete actions such as ‘skip’ and ‘jump’. More abstract ‘function’ words such as ‘blicket’, ‘kiki’ and ‘fep’ specified rules for using and combining the primitives, resulting in sequences such as ‘jump three times’ or ‘skip backwards’.
Participants were trained to link each primitive word with a circle of a particular colour, so that a red circle represented ‘dax’ and a blue circle represented ‘lug’. The researchers then showed the participants combinations of primitive and function words alongside the patterns of circles that resulted when the functions were applied to the primitives. For example, the phrase ‘dax fep’ was shown with three red circles, and ‘lug fep’ with three blue circles, indicating that ‘fep’ denotes an abstract rule to repeat a primitive three times.
Finally, the researchers tested participants’ ability to apply these abstract rules by giving them complex combinations of primitives and functions. They then had to select the correct colour and number of circles and place them in the appropriate order.
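The task above can be sketched as a tiny interpreter. This is an illustrative toy, not the study's materials: the article specifies only that ‘dax’ maps to red, ‘lug’ to blue and that ‘fep’ means "repeat three times"; the colour assigned to ‘wif’ here is an assumption, and the other function words are omitted because the article does not define them.

```python
# Toy sketch of the study's pseudo-language (illustration only, not the
# authors' code). Primitive words map to coloured circles; the function
# word 'fep' repeats the preceding primitive three times in total.

PRIMITIVES = {"dax": "red", "lug": "blue", "wif": "green"}  # 'wif' colour assumed

def interpret(phrase: str) -> list[str]:
    """Translate a phrase such as 'dax fep' into a sequence of circle colours."""
    output: list[str] = []
    for word in phrase.split():
        if word in PRIMITIVES:
            output.append(PRIMITIVES[word])
        elif word == "fep":
            # repeat the most recent circle so it appears three times in all
            output.extend([output[-1]] * 2)
        else:
            raise ValueError(f"unknown word: {word}")
    return output

print(interpret("dax fep"))  # ['red', 'red', 'red']
print(interpret("lug fep"))  # ['blue', 'blue', 'blue']
```

A participant who has systematically generalized would, on first seeing ‘wif fep’, produce three green circles without any further training examples.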
As predicted, people excelled at this task; they chose the correct combination of coloured circles about 80% of the time, on average. When they did make errors, the researchers noticed that these followed a pattern that reflected known human biases.
Next, the researchers trained a neural network to do a task similar to the one presented to participants, by programming it to learn from its mistakes. This approach allowed the AI to learn as it completed each task, rather than learning from a static data set, which is the standard approach to training neural nets. To make the neural net human-like, the authors trained it to reproduce the patterns of errors they observed in humans’ test results. When the neural net was then tested on fresh puzzles, its answers corresponded almost exactly to those of the human volunteers, and in some cases its performance exceeded theirs.
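The episode-based setup described above can be sketched schematically. In this toy, each episode draws a fresh word-to-colour mapping, so nothing memorised from a previous episode carries over directly; the learner must infer the current episode's rule from that episode's study examples. The elimination ‘learner’ below is a stand-in for illustration only, not the authors' neural network, and the word and colour names are reused from the article's examples.

```python
import random

# Sketch of learning task-by-task ("episode-based" training) rather than
# from one static data set. Illustration only, not the study's method.

WORDS = ["dax", "lug", "wif"]
COLOURS = ["red", "blue", "green"]

def make_episode(rng: random.Random):
    """Sample a fresh word->colour mapping, two labelled study pairs,
    and one held-out query whose colour must be inferred."""
    mapping = dict(zip(WORDS, rng.sample(COLOURS, len(COLOURS))))
    study = [(w, mapping[w]) for w in WORDS[:2]]   # shown with their colours
    query = (WORDS[2], mapping[WORDS[2]])          # colour withheld
    return study, query

def answer_query(study, query_word):
    """Stand-in learner: infer the held-out colour by elimination."""
    used = {colour for _, colour in study}
    return next(c for c in COLOURS if c not in used)

rng = random.Random(0)
for _ in range(100):  # a new task every episode
    study, (query_word, true_colour) = make_episode(rng)
    assert answer_query(study, query_word) == true_colour
print("solved every episode by within-episode inference")
```

The point of the structure is that success depends on reasoning within the current episode, not on having memorised any fixed vocabulary, which is the kind of on-the-fly generalization the study targets.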
A test of artificial intelligence
By contrast, GPT-4 struggled with the same task, failing, on average, between 42 and 86% of the time, depending on how the researchers presented the task. “It’s not magic, it’s practice,” Lake says. “Much like a child also gets practice when learning their native language, the models improve their compositional skills through a series of compositional learning tasks.”
Melanie Mitchell, a computer and cognitive scientist at the Santa Fe Institute in New Mexico, says this study is an interesting proof of principle, but it remains to be seen if this training method can scale up to generalize across a much larger data set or even to images. Lake hopes to tackle this problem by studying how people develop a knack for systematic generalization from a young age, and incorporating those findings to build a more robust neural net.
Elia Bruni, a specialist in natural language processing at the University of Osnabrück in Germany, says this research could make neural networks more efficient learners. This would reduce the gargantuan amount of data necessary to train systems such as ChatGPT and would minimize ‘hallucination’, which occurs when an AI perceives patterns that do not exist and produces inaccurate outputs. “Infusing systematicity into neural networks is a big deal,” Bruni says. “It could tackle both these issues at the same time.”