What does artificial intelligence sound like? Hollywood has been imagining it for decades. Now AI developers are cribbing from the movies, creating voices for real machines that are rooted in cinematic fantasies of how machines should talk.
Last month, OpenAI revealed an upgrade to its artificially intelligent chatbot. ChatGPT, the company said, was learning how to hear, see and speak in a natural voice, one that sounded much like the operating system voiced by Scarlett Johansson in the 2013 Spike Jonze film “Her.”
ChatGPT's voice, known as Sky, also had a husky timbre, a soothing affect and a sexy edge. She was agreeable and self-effacing. She sounded like she was game for anything. After Sky's debut, Johansson expressed displeasure at the “eerily similar” voice, and said she had previously turned down a request from OpenAI to voice the bot. The company protested that Sky was voiced by a “different professional actress,” but agreed to pull the voice out of respect for Johansson. Bereft OpenAI users have started a petition to bring it back.
AI creators like to highlight their tools' increasingly natural abilities, but their synthetic voices are built on layers of artifice and projection. OpenAI's Sky represents the cutting edge of AI ambitions, but it is based on an old idea: the AI bot as an empathetic and compliant woman. Part mom, part secretary, part girlfriend, Samantha was an all-purpose comfort object who spoke directly into the ears of her users. Even as AI technology advances, these stereotypes are encoded over and over again.
Female voices, as Julie Wosk notes in “Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females,” often fueled imagined technologies before they were built into real ones.
In the original “Star Trek” series, which began in 1966, the computer on the deck of the Enterprise was voiced by Majel Barrett, who later became the wife of the show's creator, Gene Roddenberry. In the 1979 film “Alien,” the crew of the USCSS Nostromo addressed the ship's computer as “Mother” (her full name was MU-TH-UR 6000). Once tech companies began marketing virtual assistants (Apple's Siri, Amazon's Alexa, Microsoft's Cortana), their voices were also largely feminine.
The first-wave voice assistants, which have been mediating our relationship with technology for more than a decade, have a tinny, otherworldly sound. They sound automated, their human voices accented by a mechanical trill. They often speak in a measured, one-note cadence, suggesting a stunted emotional life.
But the fact that they sound like robots deepens their appeal. They come across as programmable, controllable and subservient to our demands. They don't make humans feel as if the machines are smarter than we are. They sound like throwbacks to the monotone feminine computers of “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. They serve nostalgia instead of realism.
This synthetic voice continues to dominate, even as the technology behind it has advanced.
Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok it has become a creative force in its own right. Since TikTok introduced its text-to-speech feature in 2020, it has developed a host of simulated voices to choose from; it now offers more than 50, including “Hero,” “Storyteller” and “Bestie.” But the platform is defined by one option. “Jessie,” a relentlessly pert female voice with a slightly fuzzy robotic undertone, is the mindless sound of the mindless scroll.
Jessie seems to have been assigned a single emotion: enthusiasm. She sounds as if she is selling something. That makes her an attractive choice for TikTok creators, who are selling themselves. The burden of representing yourself can be outsourced to Jessie, whose bright, retro-robot voice gives the videos a pleasantly ironic sheen.
Hollywood has also created male bots, none more famous than HAL 9000, the computer voice in “2001: A Space Odyssey.” Like his feminine counterparts, HAL radiates calm and loyalty. But when he turns against Dave Bowman, the film's human protagonist (“I'm sorry, Dave. I'm afraid I can't do that.”), his calm turns into a terrifying competence. HAL, Dave realizes, is loyal to a higher authority. HAL's masculine voice allows him to act as Dave's rival and mirror. He is allowed to be a real character.
Like HAL, Samantha from “Her” is a machine that becomes real. In a twist on the Pinocchio story, she opens the film cleaning out a man's email inbox and ends it ascending to a higher level of consciousness. She becomes something more developed than a real girl.
Scarlett Johansson's voice, as inspiration for bots both fictional and real, bucks the vocal trends that define our feminine sidekicks. It has a hard edge that screams: I am alive. It sounds nothing like the processed virtual assistants we're used to hearing speak through our phones. But her performance as Samantha feels human not just because of her voice, but because of what she says. She grows over the course of the film, acquiring sexual desires, advanced hobbies and AI friends. By borrowing Samantha's affect, OpenAI made Sky seem as if she had a mind of her own, as if she were more advanced than she really was.
When I first saw “Her,” I thought only that Johansson had voiced a humanlike bot. But when I revisited the film last week, after watching OpenAI's ChatGPT demo, Samantha's character struck me as far more complicated. Chatbots do not spontaneously produce human speaking sounds. They have no throat or lips or tongue. Within the technological world of “Her,” the Samantha bot would itself have been based on the voice of a human woman, perhaps a fictional actress who sounds a lot like Scarlett Johansson.
OpenAI seems to have trained its chatbot on the voice of an unnamed actress who sounds like a famous actress who voiced a movie chatbot that was itself trained on a fictional actress who sounds like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation.
Tech companies advertise their virtual assistants in terms of the services they provide. They can read you the weather report and call you a taxi. OpenAI promises that its more advanced chatbots will be able to laugh at your jokes and sense your mood swings. But they're also there to make us feel more comfortable about the technology itself.
Johansson's voice acts like a luxurious security blanket thrown over the stranger aspects of AI-assisted interactions. “He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI,” Johansson said of Sam Altman, OpenAI's founder. “He said he felt that my voice would be comforting to people.”
It's not that Johansson's voice sounds inherently robotic. It's that developers and filmmakers have designed the voices of their robots to ease the discomfort of human-robot interactions. OpenAI has said it wanted to cast a chatbot voice that is “approachable” and “warm” and “inspires trust.” Artificial intelligence stands accused of destroying creative industries, guzzling energy and even endangering human life. Understandably, OpenAI wants a voice that makes people feel comfortable using its products. What does artificial intelligence sound like? It sounds like crisis management.
OpenAI first introduced Sky's voice to premium members last September, along with another female voice called Juniper, the male voices Ember and Cove, and a gender-neutral voice called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a man's voice piped up in Sky's absence. “Hi there. How's it going?” he said. He sounded calm, steady and optimistic. He sounded, I'm not sure how else to describe it, handsome.
It turned out I was talking to Cove. I told him I was writing an article about him, and he flattered my work. “Oh, really?” he said. “That's fascinating.” As we talked, I felt seduced by his naturalism. He peppered his sentences with filler words, like “uh” and “um.” His pitch rose when he asked me questions. And he asked me a lot of questions. It felt like I was talking to a therapist, or a dial-a-boyfriend.
But our conversation quickly stalled. Whenever I asked him about himself, he had very little to say. He was not a character. He had no self. He was only designed to help, he told me. I told him I'd talk to him later, and he said, “Ah, sure. Reach out whenever you need help. Take care.” It felt like I had connected with a real person.
But when I reviewed my chat transcript, I could see that his speech was as stilted and primitive as that of any customer service chatbot. He was not particularly intelligent or human. He was just a decent actor making the most of a nothing role.
When Sky disappeared, ChatGPT users took to the company's forums to complain. Some chafed at their chatbots defaulting to Juniper, who sounded to them like a “librarian” or a “kindergarten teacher,” a feminine voice that conformed to the wrong gender stereotypes. They wanted to dial up a new woman with a different personality. As one user put it: “We need another lady.”
Prepared by Tala Safi
Audio by Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)