Why the OpenAI Superalignment Team in Charge of AI Safety Fell Apart


For months, OpenAI has been losing employees who care deeply about making sure AI is safe. Now, the company is positively bleeding them.

Ilya Sutskever and Jan Leike announced their departures from OpenAI, the maker of ChatGPT, on Tuesday. They were the co-leaders of the company's superalignment team – the team tasked with ensuring that AI stays aligned with the goals of its creators rather than acting unpredictably and harming humanity.

They're not the only ones who have left. Since last November – when OpenAI's board tried to fire CEO Sam Altman, only to see him quickly claw his way back into power – at least five more of the company's most safety-conscious employees have either resigned or been pushed out.

What is going on here?

If you've been following the story on social media, you might think that OpenAI secretly made a huge technological breakthrough. The meme “What did Ilya see?” captures the speculation that former chief scientist Sutskever left because he saw something terrifying, such as an AI system that could destroy humanity.

But the real answer has less to do with pessimism about the technology and more to do with pessimism about humans – and one human in particular: Altman. According to sources familiar with the company, safety-minded employees have lost confidence in him.

“It's a process of disintegration of trust, like dominoes falling one by one,” a person with knowledge of the company told me on condition of anonymity.

Many employees are not willing to talk about this publicly. That's partly because OpenAI is known for having departing workers sign offboarding agreements with non-disparagement provisions. If you refuse to sign one, you give up your equity in the company, which means you could potentially lose millions of dollars.

However, one former employee refused to sign an offboarding agreement so that he would be free to criticize the company. Daniel Kokotajlo, who joined OpenAI in 2022 in the hope of steering it toward safe deployment of AI, worked on the governance team – until he resigned last month.

“OpenAI is training ever more powerful AI systems with the goal of eventually surpassing human intelligence across the board. This could be the best thing that has ever happened to humanity, but it could also be the worst if we don't proceed with care,” Kokotajlo told me this week.

OpenAI says it wants to create artificial general intelligence (AGI), a hypothetical system that can perform at a human or superhuman level in many domains.

“I joined with a lot of hope that OpenAI would rise to the occasion and behave more responsibly as it got closer to achieving AGI. It gradually became clear to many of us that this was not going to happen,” Kokotajlo told me. “I gradually lost confidence in OpenAI's leadership and their ability to handle AGI responsibly, so I quit.”

And Leike, in a thread on X posted Friday explaining why he stepped down as co-leader of the superalignment team, painted a very similar picture. “I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point,” he wrote.

OpenAI did not respond to a request for comment in time for publication.

Why did OpenAI's safety-minded employees mistrust Sam Altman?

To understand what happened, we have to rewind to last November. That's when Sutskever, working with the OpenAI board, tried to fire Altman. The board said Altman was “not consistently candid in his communications.” Translation: We don't trust him.

The ouster failed spectacularly. Altman and his ally, company president Greg Brockman, threatened to take OpenAI's top talent to Microsoft – effectively destroying the company – unless Altman was reinstated. Faced with that threat, the board gave in. Altman came back more powerful than ever, with new, more supportive board members and a freer hand to run the company.

Things get weird when you shoot at the king and miss.

Publicly, Sutskever and Altman maintained the appearance of friendship. When Sutskever announced his departure this week, he said he was leaving “to pursue a project that means a lot to me personally.” Two minutes later, Altman posted on X, saying “This is very sad for me. Ilya … is a dear friend.”

Yet Sutskever hasn't been seen at the OpenAI office for nearly six months – not since the attempted coup. He has been remotely co-leading the superalignment team, tasked with making sure a future AGI aligns with humanity's goals rather than going rogue. It's a nice enough aspiration, but one divorced from the day-to-day operations of a company racing to commercialize products under Altman's leadership. And then there was the tweet posted shortly after Altman's reinstatement, which was quickly deleted.

So, despite the public-facing friendship, there is reason to doubt that Sutskever and Altman remained on friendly terms after the former tried to oust the latter.

And Altman's response to being fired revealed something about his character: his threat to effectively shut down OpenAI unless the board rehired him, and his insistence on stacking the board with new members, showed his determination to stay in power and avoid future checks on it. Former colleagues and employees describe him as a manipulator who talks out of both sides of his mouth – someone who claims, for example, that he wants to prioritize safety, but contradicts that in his behavior.

For example, Altman was fundraising with authoritarian regimes like Saudi Arabia to build a new AI chipmaking venture, which would give him a huge supply of the resources needed to create cutting-edge AI. That worried safety-minded employees. If Altman really cared about building and deploying AI in the safest way possible, why was he frantically trying to amass as many chips as possible, which would only accelerate the technology? For that matter, why was he taking the risk of working with regimes that might use AI to supercharge digital surveillance or human rights abuses?

For employees, this all added up to a “lack of confidence that when OpenAI says it's going to do something, or says it values something, that is actually true,” a source with inside knowledge of the company told me.

That gradual erosion of trust came to a head this week.

Leike, co-leader of the superalignment team, didn't bother to play nice. “I resigned,” he posted on X, hours after Sutskever announced his departure. No warm goodbyes. No vote of confidence in the company's leadership.

Other safety-minded former employees responded to Leike's blunt resignation with tweets that included heart emojis. One of them was Leopold Aschenbrenner, a Sutskever ally and superalignment team member who was fired from OpenAI last month. Media reports indicated that he and Pavel Izmailov, another researcher on the same team, were allegedly fired for leaking information. But OpenAI has produced no evidence of a leak. And given the strict confidentiality agreement everyone signs when they first join OpenAI, it would be easy for Altman – a deeply networked Silicon Valley veteran adept at working with the press – to portray sharing even the most harmless information as a “leak” if he wanted to get rid of Sutskever's allies.

The same month that Aschenbrenner and Izmailov were forced out, another safety researcher, Cullen O'Keefe, also left the company.

And two weeks ago, yet another safety researcher, William Saunders, wrote a cryptic post on the EA Forum, an online gathering place for members of the Effective Altruism movement, which has been heavily involved in the cause of AI safety. Saunders summarized the work he had done as part of the superalignment team at OpenAI. Then he wrote: “I resigned from OpenAI on February 15, 2024.” One commenter asked the obvious question: why was Saunders posting this?

“No comment,” Saunders replied. Commenters concluded that he was probably bound by a non-disparagement agreement.

Putting all of this together with my conversations with company insiders, what we get is a picture of at least seven people who tried to push OpenAI toward greater safety from the inside, but who ultimately lost so much faith in its charismatic leader that their positions became untenable.

“I think a lot of people in the company who take safety and social impact seriously think of it as an open question: Is working for a company like OpenAI a good thing to do?” said a person with inside knowledge of the company. “And the answer is 'yes' only to the extent that OpenAI is going to be really thoughtful and responsible about what it's doing.”

With the safety team gone, who will ensure that OpenAI's work is safe?

Since Leike is no longer around to run the superalignment team, OpenAI has replaced him with company co-founder John Schulman.

But the team has been hollowed out. And Schulman already has his hands full with his existing full-time job: ensuring the safety of OpenAI's current products. How much serious, forward-looking safety work can we expect from OpenAI going forward?

Maybe not much.

“The whole point of setting up the superalignment team was that there would actually be a variety of safety issues arising if the company succeeded in creating AGI,” the person with insider knowledge told me. “So, it was a dedicated investment in that future.”

Even when the team was operating at full capacity, that “dedicated investment” was home to a small fraction of OpenAI's researchers and was promised only 20 percent of its computing power – perhaps the most important resource at an AI company. Now, that computing power may be siphoned off to other OpenAI teams, and it's unclear whether there will be much focus on avoiding catastrophic risk from future AI models.

To be clear, this doesn't mean that the products OpenAI is releasing now – like the new version of ChatGPT, dubbed GPT-4o, which can converse with users in a natural voice – are going to destroy humanity. But what's coming down the pike?

“It's important to distinguish between 'Are they currently building and deploying AI systems that are unsafe?' versus 'Are they on track to safely build and deploy AGI or superintelligence?'” the insider said. “I think the answer to the second question is no.”

Leike expressed the same concern in his Friday thread on X. He noted that his team had been struggling to get enough computing power to do its work and was generally “sailing against the wind.”

Most strikingly, Leike said: “I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics. These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.”

When one of the world's leading minds in AI safety says that the world's leading AI company isn't on the right track, we all have reason to be concerned.

