Inside Suno AI, the startup building a ChatGPT for music

“I’m a soul trapped in this cycle.” The voice that sings these lyrics is raw and plaintive, steeped in blue notes. An acoustic guitar chugs behind it, punctuating the vocal phrases with tasteful runs. But there’s no human behind the sound, no hands on that guitar. In fact, there’s no guitar at all. In the space of 15 seconds, this authentic, dynamic blues song was generated by the state-of-the-art AI model of a startup called Suno. All it took to summon it from the void was a simple text prompt: “Solo acoustic Mississippi Delta blues about a sad AI.” To be more precise, the song is the work of two AI models: Suno’s model creates all the music itself, while calling on OpenAI’s ChatGPT to create the lyrics and even a title: “Soul of the Machine.”

Online, Suno’s creations are starting to generate reactions like “How real is this?” As this particular track plays over Sonos speakers in a conference room at Suno’s temporary headquarters, steps away from Harvard’s campus in Cambridge, Massachusetts, even some of the people behind the technology are a little uneasy. There are mutters of “holy shit” and “oh, boy,” as well as some nervous laughter. It’s mid-February, and we’ve been playing with their newest model, V3, which is only a few weeks away from its public release. In this case, it took only three attempts to achieve this startling result. The first two were decent, but a simple tweak to my prompt (co-founder Keenan Freyberg suggested adding the word “Mississippi”) resulted in something a little more uncanny.

Over the past year alone, generative AI has made great strides in generating credible text, images (via services like Midjourney), and even video, most notably with OpenAI’s new Sora tool. But audio, and music in particular, has lagged behind. Suno seems to be cracking the code for AI music, and its founders’ ambitions are almost limitless: they envision a world in which music-making is democratized without limit. The most vocal of the co-founders, Mikey Shulman, a boyish 37-year-old with a Harvard Ph.D. in physics, imagines a billion people around the world paying 10 bucks a month to create songs with Suno. Given that music listeners currently far outnumber music creators, he argues, Suno is poised to redress this perceived imbalance.

Most AI-generated art so far is, at best, kitsch, à la the hyperrealistic sci-fi junk, heavy on the form-fitting spacesuits, that many Midjourney users aspire to create. But “Soul of the Machine” feels like something different: the most powerful and disturbing AI creation I’ve ever encountered in any medium. Its existence feels like a rift in reality, at once terrifying and vaguely unholy, and I keep thinking of the Arthur C. Clarke quote that seems made for the generative-AI era: “Any sufficiently advanced technology is indistinguishable from magic.” A few weeks after returning from Cambridge, I sent the song to Living Colour guitarist Vernon Reid, who has been outspoken about the dangers and possibilities of AI music. He notes his “surprise, shock, horror” at the song’s “disturbing reality.” “The long-held dystopian ideal of severing difficult, dirty, unwanted and despicable humanity from its creative output is at hand,” he writes, pointing to the fraught nature of an AI singing the blues, “an African American idiom, deeply tied to historical human trauma, and slavery.”

Suno is barely two years old. Co-founders Shulman, Freyberg, Georg Kucsko, and Martin Camacho, all machine learning experts, worked together until 2022 at another Cambridge company, Kensho Technologies, which focused on finding AI solutions to complex business problems. Shulman and Camacho are both musicians who used to jam together during their Kensho days. At Kensho, the four worked on transcription technology for capturing the earnings calls of public companies, a difficult task given the poor audio quality, abundant jargon, and mix of different accents.

Along the way, Shulman and his colleagues fell in love with the unexplored possibilities of AI audio. In AI research, he says, “audio typically lags far behind images and text. There’s a lot we learn from the text community and how these models work and how they scale.”

Those same interests could have taken Suno’s founders to a very different place. Although they always intended to end up with a music product, their early brainstorming included the idea of a hearing aid and the possibility of finding malfunctioning machinery through audio analysis. Instead, their first release was a text-to-speech program called Bark. When they surveyed Bark’s early users, it became clear that what those users really wanted was a music generator. “So we started doing some preliminary experiments, and they looked promising,” Shulman says.

Suno uses the same general approach as large language models like ChatGPT, which break down human language into discrete chunks called tokens, absorb its millions of usages, styles, and structures, and then reproduce it on demand. But audio, especially music, is almost unbelievably complex, which is why, just last year, AI-music experts told Rolling Stone that it might take years to arrive at a service as capable as Suno. “Audio is not a discrete thing like words,” Shulman says. “It’s a wave. It’s a continuous signal.” The sampling rate for high-quality audio is typically 44.1 kHz or 48 kHz, which means “48,000 tokens per second,” he adds. “That’s a big problem, right? And so you need to figure out how to smooth it down to something more reasonable.” How, though? “A lot of work, a lot of research, a lot of other tricks and models and things like that. . . . I don’t think we’re anywhere close to being done.” Ultimately, Suno wants to explore alternatives to the text-to-music interface, including more advanced and intuitive inputs; one idea is to create songs based on users’ own singing.
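To put Shulman’s “48,000 tokens per second” in perspective, here is a rough back-of-the-envelope sketch in Python. It is not Suno’s method, which the company does not describe. The two-minute song length and the target rate of 50 tokens per second are hypothetical figures chosen purely for illustration.

```python
# Back-of-the-envelope arithmetic for the "48,000 tokens per second" remark.
# This is an illustration only, not a description of how Suno's model works.

SAMPLE_RATES_HZ = {"44.1 kHz (CD standard)": 44_100, "48 kHz": 48_000}

# Hypothetical values, not figures from the article.
SONG_SECONDS = 120
TARGET_TOKENS_PER_SECOND = 50


def raw_tokens(sample_rate_hz: int, seconds: float) -> int:
    """Token count if every audio sample were treated as one token."""
    return int(sample_rate_hz * seconds)


def compression_factor(sample_rate_hz: int, target_tokens_per_second: int) -> float:
    """How many raw samples each token would have to summarize."""
    return sample_rate_hz / target_tokens_per_second


if __name__ == "__main__":
    for name, rate in SAMPLE_RATES_HZ.items():
        total = raw_tokens(rate, SONG_SECONDS)
        factor = compression_factor(rate, TARGET_TOKENS_PER_SECOND)
        print(f"{name}: {total:,} raw tokens for {SONG_SECONDS} s, "
              f"{factor:,.0f} samples per token at the target rate")
```

Under these assumptions, a single two-minute song at 48 kHz would run to several million raw tokens, which gives a sense of why some form of compression is needed before a language-model-style approach becomes practical.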

OpenAI has faced numerous lawsuits over ChatGPT’s use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno’s founders declined to give details of what data they’re feeding into their model, other than the fact that its ability to reproduce convincing human voices comes in part because it’s learning from speech recordings as well as music. “Bare speech will help you learn features of the human voice that are difficult,” Shulman says.

One of Suno’s early investors is Antonio Rodriguez, a partner at venture capital firm Matrix. Rodriguez had funded only one previous music venture, music-data firm EchoNest, which Spotify bought to beef up its algorithm. With Suno, Rodriguez got involved before it was clear what the product would be. “I supported the team,” says Rodriguez, who exudes the confidence of a man who has made more than his share of successful bets. “I knew the team, and I especially knew Mikey, and so I would have backed him to do almost anything that was legal. He’s that creative.”

Rodriguez is investing in Suno with the full knowledge that music labels and publishers could sue, which he sees as “the risk we had to underwrite when we invested in the company, because we’re the big wallet that would get sued right after these guys. . . . Frankly, if we had deals with labels when this company started, I probably wouldn’t have invested in it. They needed to make this product without the constraints.” (A spokesman for Universal Music Group, which has taken an aggressive stance on AI, did not return a request for comment.)

Suno says it’s in talks with major labels, and claims to respect artists and intellectual property: its tool won’t let you request a specific artist’s style in your prompts, and it does not use the voices of real artists. Many of Suno’s employees are musicians. The office has pianos and guitars on hand, and framed photos of classical musicians line the walls. The founders show none of the overt hostility to the music business that characterized, say, Napster before the lawsuits that destroyed it. “That doesn’t mean we won’t get sued, by the way,” Rodriguez adds. “It just means we’re not going to have a fuck-the-police kind of attitude.”

Rodriguez sees Suno as a radically capable and easy-to-use musical tool, and believes it can make music-making accessible to everyone the way camera phones and Instagram democratized photography. The idea, he says, is to once again “move the bar on the number of people who are allowed to be creators of things on the Internet as opposed to consumers of things.” He and the founders venture to suggest that Suno could attract a larger user base than Spotify. If that prospect is hard to grasp, that’s a good thing, Rodriguez says: It just means that it’s “seemingly stupid” in exactly the way that attracts him as an investor. “All of our great companies have a pool of great talent,” he says, “and then something that just seems stupid until it becomes so obvious that it’s not stupid.”

Even before the advent of Suno, musicians, producers, and songwriters were concerned about AI’s business-shaking potential. “Music, as played by humans of unusual circumstances . . . who have suffered and struggled to advance their art, will have to resist the wholesale automation of the art so dearly bought, which they have struggled to achieve,” Reid writes. But Suno’s founders claim there’s nothing to fear, using the metaphor that people still read even though they are able to write. “The way we think about it is we’re trying to get a billion people much more engaged with music than they are now,” Shulman says. “If people are more into music, more focused on creating, developing more distinct tastes, that’s obviously good for artists. The vision we have for the future of music is one where it’s artist-friendly. We’re not trying to replace artists.”

While Suno is focused solely on reaching music fans who want to create songs for fun, it could still cause significant disruption along the way. In the short term, the segment of the market for human creators that seems most directly at risk is a lucrative one: songs made for commercials and even TV shows. Lucas Keller, founder of management firm Milk & Honey, notes that the market for popular songs is unlikely to be affected. “But in terms of the rest of it, yes, it can definitely hurt their business,” he says. “I think ultimately, it allows a lot of ad agencies, movie studios, networks, etc. to not have to go license stuff.”

In the absence of stricter laws against AI-generated content, there’s also the possibility of a world where users of models like Suno’s flood streaming services with their robo-creations by the millions. “Spotify may one day say, ‘You can’t do that,’” Shulman says, noting that until now Suno users have been more interested in texting their songs to just a few friends.

Suno currently has only 12 or so employees, but it plans to expand, with a much larger permanent headquarters under construction on the top floor of the same building as its current temporary office. As we tour the still-unfinished floor, Shulman shows off an area that will become a full-fledged recording studio. Given what Suno can do, why do they need it? “It’s mostly a listening room,” he admits. “We want a good acoustic environment. But we all enjoy making music too, without AI.”

Suno’s biggest potential rival so far seems to be Google’s Dream Track, which has obtained licenses that allow users to create their own songs using famous voices like Charlie Puth’s through a similar prompt-based interface. But Dream Track has been released only to a small test user base, and the samples released so far aren’t as impressive as Suno’s, despite the famous voices attached. “I don’t think making new Billy Joel songs is the way people want to interact with music in the future with the help of AI,” Shulman says. “If I think about how we want people to make music in five years, it’s something that doesn’t exist. It’s something that’s in their head.”
