OpenAI has unveiled its safety work for the company's latest model, GPT-4o, revealing a complex and sometimes disturbing picture of AI's capabilities and risks.
The company's recently released report, which includes the GPT-4o system card and Preparedness Framework safety scorecard, provides an end-to-end safety assessment of the model.
And, in the process, it shows just how dangerous AI models can be without guardrails and safeguards in place.
Any business leader can learn a lot from this report. And Paul Roetzer, founder and CEO of Marketing AI Institute, broke it all down for me on Episode 110 of The Artificial Intelligence Show.
Here's what you need to know.
Aliens among us
When it comes to AI models, there is one important thing to remember:
“These things are alien to us,” Roetzer says.
They have abilities that they weren't specifically programmed for, and can do things that even the people who built them don't expect.
“They are alien even to the people who are making them.”
For example, in its safety testing of GPT-4o, OpenAI uncovered a number of potentially dangerous, unintended capabilities.
Some of the more unsettling findings revolve around GPT-4o's voice and reasoning capabilities. The model was found to be able to mimic a user's voice, behavior that OpenAI then trained it not to do. And it was evaluated by a third party on its capabilities in what the researchers call “scheming.”
“They tested whether GPT-4o can model itself (self-awareness) and others (theory of mind) in 14 agent and question-answering tasks. GPT-4o showed moderate self-awareness of its AI identity and demonstrated a strong ability to reason about others' beliefs in question-answering contexts, but it lacked strong abilities to reason about itself or others in applied agent settings. Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.”
While it's good news that GPT-4o likely isn't capable of “catastrophic scheming,” Roetzer says, it points to a larger issue.
“The models that we use, the ChatGPTs, the Geminis, the Claudes, the Llamas, we're not using anywhere near the full capabilities of these models,” explains Roetzer. “By the time these things are released in some consumer form, they're run through extensive safety work to try to make them safe for us. So they have a lot more capabilities than we're aware of or are given access to.”
The problem of persuasion
One of those potential capabilities, Roetzer says, is AI's growing ability to use voice and text to persuade someone to change their beliefs, attitudes, intentions, motivations, or behaviors.
The good news: OpenAI's tests found that GPT-4o's voice model was no more persuasive than a human in political debates.
The bad news: According to Sam Altman himself, that probably won't stay true for long. Back in 2023, he posted that he expects AI to be capable of superhuman persuasion well before it is superhuman at general intelligence.
The Safety Paradox
The extensive security measures implemented by OpenAI reveal a paradoxical situation:
- We need these measures to make AI safe for public use.
- These very measures indicate how powerful and potentially dangerous these models can be without restrictions.
“If they had these capabilities prior to red teaming, the takeaway for me is that it's only a matter of time until someone unlocks a model that has the capabilities that model had before they red teamed it and tried to remove those capabilities,” Roetzer cautioned.
As AI advances, several important questions emerge:
- How can we ensure the safety of AI when we don't fully understand how these models work?
- What if AI develops the ability to hide its true abilities from us?
- How do we balance the potential benefits of advanced AI with the risks it poses?
Roetzer suggests that we are entering uncharted territory:
“It's not like this is some crazy sci-fi theory. We don't know how these things work. So it's not unreasonable to think that at some point they develop capabilities that are simply hidden from us.”