T for Transformer
Paul Taylor
Geoffrey Hinton’s recent departure from Google made headline news, although the headline writers had to squeeze in an explanation of who he is so that readers would understand why it mattered. The Guardian went with ‘the Godfather of AI’. In the 1980s, Hinton showed how a ‘backpropagation’ algorithm allowed networks of simple learning units (‘neurons’) to learn complex patterns, since errors detected at the network’s output can be corrected by making adjustments that are propagated backwards through the network.
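As a rough sketch of the idea, and it is only a toy illustration of my own rather than anything of Hinton's, a network with a single hidden layer can learn a pattern (here, the exclusive-or of two inputs) that no single unit could, by repeatedly nudging its weights in proportion to the error carried back from the output:

```python
# Toy sketch of training a tiny network with backpropagation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# XOR: output 1 when exactly one input is 1 - a pattern a single unit cannot learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden -> output weights

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(20_000):
    # forward pass through the two layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # error measured at the output...
    err_out = (out - y) * out * (1 - out)
    # ...propagated backwards to the hidden layer
    err_h = (err_out @ W2.T) * h * (1 - h)

    # adjust each layer's weights in proportion to its share of the error
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ err_h
    b1 -= lr * err_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should move towards [0, 1, 1, 0]
```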
At the time such networks typically had only three or four layers, and although they could analyse datasets with ten or twenty different features, they couldn't begin to tackle real-world data such as images, which are made up of millions of features. It was a remarkable achievement, but perhaps the most important fact about Hinton's career is that he helped keep the field alive over the next thirty years, until the ready availability of data and computing power made it possible to build networks deep enough to analyse images, interpret human speech and, most recently, detect the patterns that give language meaning.
Hinton is the great-great-grandson of George Boole, the Anglo-Irish mathematician who gave his name to the binary logic that underpins traditional computation. Hinton studied at Cambridge, Edinburgh and Sussex before moving to the States, then returning to the UK to found the Gatsby Computational Neuroscience Unit at UCL. Although he did his most important work after he left UCL, first for the University of Toronto, then for Google, something of his reputation lingered. Demis Hassabis and Shane Legg, who went on to found DeepMind, met at the Gatsby. Hassabis later claimed that his intention was first 'to solve intelligence, and then use that to solve everything else', the suggestion being that we need AI in order to tackle real and pressing problems, notably climate change.
Google acquired DeepMind in 2014, in a deal that allowed Hassabis a relatively free hand both to pursue research into fundamental questions of intelligence and to apply the results to solving questions of societal importance. The best example of this is probably AlphaFold, an algorithm that can predict a protein's shape, and therefore something of its behaviour, from its amino acid sequence. AlphaFold has transformed our understanding of protein structure, and the open sharing of the vast amount of knowledge it has generated is an extraordinary gift to humanity.
Times have changed, however. Putin’s invasion of Ukraine has pushed up energy and food prices, households have had to cut spending, advertisers are losing revenue and Google, Facebook and others are laying off staff. Microsoft’s investment in OpenAI has led to the creation of new tools such as ChatGPT which seem able to answer more directly many of the questions that we might previously have answered through internet searches. (Ilya Sutskever, the chief scientist at OpenAI, is a former PhD student of Hinton’s.)
Google’s response has been to rethink its approach to AI, merging its in-house Google Brain team with DeepMind and putting Hassabis in charge. The challenge to Google’s dominance may not seem obvious given ChatGPT’s widely reported tendency to hallucinate, and it’s true that it’s a chatbot built on a ‘large language model’ rather than an organised repository of knowledge. You can’t use it to check the train times. It is however already very good at answering a wide range of questions, and getting better very, very fast.
Over the last ten years the proportion of advances in AI that have come from research teams in the big tech companies has been gradually increasing, and they are now utterly dominant. Google Brain has been one of the most important. The T in GPT stands for transformer, a neural network architecture developed at Google Brain that has proved uncannily successful in identifying patterns, to the extent that models built on transformers can generate realistic images and video, meaningful text, and apparently intelligent answers to queries or solutions to problems.
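The mechanism at the transformer's heart is usually called attention: each element of a sequence is re-described as a weighted blend of all the others, with the weights reflecting how relevant they are to one another. The sketch below illustrates only that one operation, my own simplification rather than Google's code, stripped of the learned projections and stacked layers a real model would have:

```python
# Illustrative sketch of the attention operation at the core of a transformer.
import numpy as np

def attention(Q, K, V):
    # similarity of every query to every key, scaled by the feature dimension
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax turns similarities into weights that sum to one
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each position's output is a weighted mixture of all the values
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))      # a toy 'sequence' of 5 tokens, 8 features each
print(attention(x, x, x).shape)  # (5, 8): one blended representation per token
```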
Google's merging of its AI operations is a measure of the scale of disruption it anticipates. It also seems significant that Hassabis has been willing to surrender his autonomy in return for the leadership role, and it's hard not to be pessimistic about the future of AI research aimed at anything other than increasing revenue from online advertising.
Hinton once answered a question about the social consequences of his work by saying that whatever it led to, he would have done it anyway, the joy of discovery being too sweet. It sounds irresponsible, but at the time I read it as a mild riposte to the questioner, a suggestion that society shouldn't rely on individuals with human frailties to protect it. In any event he seems now to take a very different line. He says he has retired from Google to be free to articulate his concerns about the consequences of AI.
The extraordinary fact about all this is how recently he seems to have changed his mind about the dangers of the technology his work has made possible. Things are changing very fast and the consequences of what we are seeing unfold are impossible to predict. Hinton, in his recent interviews, has highlighted concerns about ‘deep fakes’, about the impact of the economic dislocation caused by the automation of white-collar work, and, most frighteningly, about the power of software agents more intelligent than we are.
A lot of people are resistant to the idea that these large language models, or LLMs, really are intelligent. I'm not sure that debate is particularly enlightening. LLMs are clearly incredibly powerful and able to tackle a surprising range of tasks, some of which, if a person did them, we would say demonstrated intelligence. The real game-changer is that these tasks include a wide variety of things that LLMs weren't programmed, or explicitly trained, to do, which makes the claim of artificial general intelligence, or AGI, seem defensible.
Daniel Kahneman makes a distinction, when describing human intelligence, between ‘thinking fast’ and ‘thinking slow’. The former is unconscious and effortless, the latter requires conscious effort and is cognitively demanding. LLMs seem to be astonishingly good at thinking fast, so much so that they can do things we would have to do by thinking slow. But they don’t really do thinking slow. They don’t work through what they are going to say before they start saying it, so they can’t backtrack and start again. They don’t have an internal scratchpad on which to work out solutions. They can solve some logic or mathematical problems, but their inability to plan their responses means you have to guide them to the solution when setting out the problem.
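A toy sketch of what that means in practice, using an invented stand-in for the model rather than a real LLM: each word is picked on the basis of the words already produced, and once emitted it is never revised.

```python
# Toy sketch of word-by-word generation with no planning and no backtracking.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_word_probabilities(prefix):
    # Stand-in for an LLM: invents a probability for each word in a tiny
    # vocabulary, based only on the text generated so far.
    rng = random.Random(len(" ".join(prefix)))
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return {word: w / total for word, w in zip(VOCAB, weights)}

prefix = ["the"]
for _ in range(6):
    probs = next_word_probabilities(prefix)
    word = max(probs, key=probs.get)  # commit to the likeliest next word
    prefix.append(word)               # once emitted, it is never revised

print(" ".join(prefix))
```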
Programmers have recently begun to write software connecting LLMs to the internet, creating software agents that use the power of LLMs to interact with the online world, creating and achieving goals. At the moment these seem rather trivial – there is one called BabyAGI – but the mere idea of them might be enough to lead someone who has spent their entire working life making these things possible to wonder whether they had done the right thing.