8 Google employees invented modern AI. Here’s the inside story

The last two weeks before the deadline were hectic. Though some of the team officially still had desks in Building 1945, they mostly worked out of Building 1965 because its micro-kitchen had a better espresso machine. “People weren’t sleeping,” says Gomez, who, as an intern, lived in a constant frenzy of debugging while also producing the visualizations and diagrams for the paper. Ablation is common in projects like this: taking things out to see whether what remains is still enough to work.

“There was every possible combination of tricks and modules: which helps, which doesn’t. Let’s rip it out. Let’s replace it with this,” says Gomez. “Why is the model behaving in this counterintuitive way? Oh, it’s because we forgot to do the masking properly. Does it work yet? Okay, on to the next one. All of the components of what we now call the Transformer were the output of this very rapid, iterative trial and error.” The ablations, aided by Shazeer’s implementations, produced “something minimalistic,” says Jones. “Noam is a wizard.”
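The masking Gomez mentions refers to the causal mask in a Transformer decoder’s self-attention: without it, each position can “see” future tokens, which silently corrupts training. A minimal sketch of the idea in Python with NumPy (the function name and shapes are illustrative, not the team’s actual code):

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask, single head.

    q, k, v: arrays of shape (seq_len, d). Each position may attend
    only to itself and to earlier positions.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (seq_len, seq_len) similarity scores
    # Causal mask: block every position j > i with -inf before softmax,
    # so its softmax weight becomes exactly zero.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Forgetting the `np.where` line is exactly the kind of bug that produces the “counterintuitive” behavior described above: the model appears to do well because it is peeking at tokens it is supposed to predict.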

Vaswani recalls collapsing on an office couch one night while the team was writing the paper. As he stared at the curtains separating the couch from the rest of the room, he was struck by the pattern on the fabric, which looked to him like synapses and neurons. Gomez was there, and Vaswani told him that what they were working on would transcend machine translation. “Ultimately, like with the human brain, you need to unify all these modalities—speech, audio, vision—under a single architecture,” he says. “I had a strong hunch we were onto something more general.”

In Google’s higher ranks, however, the work was seen as just another interesting AI project. When I asked several of the Transformer’s creators whether their bosses ever summoned them for updates on the project, the answer was: not so much. But “we realized it was potentially a pretty big deal,” says Uszkoreit. “And it led us to actually craft a sentence toward the end of the paper where we comment on future work.”

That sentence foreshadowed what might come next: the application of transformer models to essentially all forms of human expression. “We are excited about the future of attention-based models,” the authors wrote. “We plan to extend the Transformer to problems involving input and output modalities beyond text” and to investigate “images, audio and video.”

A few nights before the deadline, Uszkoreit realized they needed a title. Jones noted that the team had landed on a rejection of the accepted best practices, most notably LSTMs, in favor of one technique: attention. The Beatles, Jones recalled, had a song called “All You Need Is Love.” Why not call the paper “Attention Is All You Need”?

The Beatles?

“I’m British,” says Jones. “It literally took five seconds of thought. I didn’t think they would use it.”

They kept collecting results from the experiments right up until the deadline. “The English-French numbers came in, like, five minutes before we submitted the paper,” says Parmar. “I was sitting in the micro-kitchen in Building 1965, getting in that last number.” With barely two minutes to spare, they sent off the paper.
