https://www.ft.com/content/37bb01af-ee46-4483-982f-ef3921436a50

Their paper paved the way for the rise of large language models. But all have since left the Silicon Valley giant

The eight research scientists working at Google who revolutionised AI by creating the ‘transformer’ model © FT montage

Like many breakthroughs in scientific discovery, the one that spurred an artificial intelligence revolution came from a moment of serendipity.

In early 2017, two Google research scientists, Ashish Vaswani and Jakob Uszkoreit, were in a hallway on the search giant’s Mountain View campus, discussing a new idea for how to improve machine translation, the AI technology behind Google Translate.

The AI researchers had been working with another colleague, Illia Polosukhin, on a concept they called “self-attention” that could radically speed up and augment how computers understand language.

Polosukhin, a science fiction fan from Kharkiv in Ukraine, believed self-attention was a bit like the alien language in the film Arrival, which had recently been released. The extraterrestrials’ fictional language did not contain linear sequences of words. Instead, they generated entire sentences using a single symbol that represented an idea or a concept, which human linguists had to decode as a whole.

The cutting-edge AI translation methods at the time involved scanning each word in a sentence and translating it in turn, in a sequential process. The idea of self-attention was to read an entire sentence at once, analysing all its parts and not just individual words. You could then capture better context, and generate a translation in parallel.
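For readers who want to see the mechanism, the core of self-attention can be sketched in a few lines of Python. The sketch below is purely illustrative, not the authors’ code: it shows scaled dot-product attention, the building block later described in the paper, with made-up matrices standing in for the learned weights, and it leaves out the multiple attention heads, positional encodings and feed-forward layers that a full transformer adds.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X holds one embedding vector per word in the sentence (n_words x d_model).
        # Wq, Wk, Wv are learned projection matrices; here they are illustrative placeholders.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv              # project every word at once
        scores = Q @ K.T / np.sqrt(K.shape[-1])       # each word scores its relevance to every other word
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over each row
        return weights @ V                            # each output blends context from the whole sentence

    # Example: a five-word "sentence" with 16-dimensional embeddings
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 16))
    Wq = Wk = Wv = rng.normal(size=(16, 16))
    print(self_attention(X, Wq, Wk, Wv).shape)        # (5, 16): one context-aware vector per word

Because the whole sentence is processed in one set of matrix operations rather than word by word, the computation can be parallelised, which is what made the approach so much faster than the sequential methods it replaced.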

The three Google scientists surmised this would be much faster and more accurate than existing methods. They started playing around with some early prototypes on English-German translations, and found it worked.

During their chat in the hallway, Uszkoreit and Vaswani were overheard by Noam Shazeer, a Google veteran who had joined the company back in 2000 when Google had roughly 200 employees.

Shazeer, who had helped build the “Did You Mean?” spellcheck function for Google Search, among several other AI innovations, was frustrated by existing language-generating methods, and looking for fresh ideas.

So when he heard his colleagues talking about this idea of “self-attention”, he decided to jump in and help. “I said, I’m with you . . . let’s do it, this is going to make life much, much better for all AI researchers,” Shazeer says.

The chance conversation kicked off a months-long collaboration in 2017 that eventually produced an architecture for processing language, known simply as the “transformer”. The eight research scientists who played a part in its creation described it in a short paper with a snappy title: “Attention Is All You Need.”

One of the authors, Llion Jones, who grew up in a tiny Welsh village, says the title was a nod to the Beatles song “All You Need Is Love”. The paper was first published in June 2017, and it kick-started an entirely new era of artificial intelligence: the rise of generative AI.

Today, the transformer underpins most cutting-edge applications of AI in development. Not only is it embedded in Google Search and Translate, for which it was originally invented, but it also powers all large language models, including those behind ChatGPT and Bard. It drives autocomplete on our mobile keyboards, and speech recognition by smart speakers.

Its real power, however, comes from the fact that it works in areas far beyond language. It can generate anything with repeating motifs or patterns, from images with tools such as Dall-E, Midjourney and Stable Diffusion, to computer code with generators like GitHub Copilot, or even DNA.

Vaswani, who grew up in Oman in an Indian family, has a particular interest in music and wondered if the transformer could be used to generate it. He was amazed to discover it could generate classical piano music as well as the state-of-the-art AI models of the time could.