Hackers 'jailbreak' powerful AI models in global effort to uncover flaws

Pliny the Prompter says that it usually takes him 30 minutes to crack the world's most powerful artificial intelligence models.

The pseudonymous hacker has manipulated Meta's Llama 3 into sharing instructions for making napalm. He made Elon Musk's Grok gush about Adolf Hitler. His own hacked version of OpenAI's latest GPT-4o model, dubbed “Godmode GPT,” was banned by the startup after it began advising on illegal activities.

Pliny told the Financial Times that his “jailbreaking” was not nefarious but part of an international effort to expose the flaws of large language models rushed out to the public by tech companies in search of big profits.

“I've been on this warpath to bring awareness to the true capabilities of these models,” said Pliny, a crypto and stock trader who shares his jailbreaks on X. “. . . own right . . . At the end of the day I'm working for [the model owners] for free.”

Pliny is one of dozens of hackers, academic researchers and cybersecurity experts racing to find vulnerabilities in the fledgling LLMs, for example by tricking chatbots with prompts to get around the “guardrails” that AI companies have established in an effort to ensure the safety of their products.

These ethical “white hat” hackers have often found ways to get AI models to create malicious content, spread misinformation, share private data or generate malicious code.

Companies like OpenAI, Meta and Google already use “red teams” of hackers to test their models before they are widely released. But the technology's weaknesses have created a growing market of LLM security startups that develop tools to protect companies planning to use AI models. According to data provider CB Insights, machine learning security startups raised $213 million in 23 deals in 2023, up from $70 million the previous year.

“The jailbreaking scene started about a year ago, and the number of attacks has continued to increase,” said Eran Shimony, principal threat researcher at CyberArk, a cybersecurity group that now offers LLM security. “It's a constant game of cat and mouse, with vendors improving the security of our LLMs, but also with attackers becoming more sophisticated in their tactics.”

The efforts come as global regulators look to step in to curb potential risks surrounding AI models. The European Union has passed the AI Act, which creates new obligations for LLM providers, while the UK and Singapore are among countries considering new laws to regulate the sector.

The California Legislature will vote in August on a bill that would require the state's AI groups, including Meta, Google and OpenAI, to ensure they do not develop models with “a hazardous capability.”

“All [AI models] would satisfy this criterion,” said Pliny.

Meanwhile, manipulated LLMs with names like WormGPT and FraudGPT have been sold by malicious hackers on the dark web for as little as $90 to aid in cyber attacks by writing malware or by helping scammers create automated yet highly personalized phishing campaigns. According to AI security group SlashNext, other variants have emerged, such as EscapeGPT, BadGPT, DarkGPT and Black Hat GPT.

Some hackers use “uncensored” open source models. For others, jailbreaking attacks (getting around the security measures built into existing LLMs) represent a new skill, with criminals often sharing tips in communities on social media platforms like Reddit or Discord.

Approaches range from individual hackers using synonyms of words that are blocked by the model's creators, to more sophisticated attacks that use AI to automate hacking.

Last year, researchers at Carnegie Mellon University and the US Center for AI Safety said they found a way to systematically jailbreak LLMs such as OpenAI's ChatGPT, Google's Gemini and an older version of Anthropic's Claude, “closed” proprietary models that were supposedly less vulnerable to attack. The researchers added that “it is unclear whether such behavior can ever be fully patched by LLM providers”.

Anthropic published research in April on a technique called “many-shot jailbreaking,” in which hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a malicious question that follows the same pattern. This attack is enabled by the fact that models such as those developed by Anthropic now have a large context window, or space in which text can be added.

“Although current state-of-the-art LLMs are powerful, we don't think they pose truly catastrophic risks yet. Future models may,” Anthropic wrote. “This means that now is the time to work to mitigate potential LLM jailbreaks before they can be used on models that could cause serious damage.”

Some AI developers said many of the attacks are fairly benign for now. But others warned about certain types of attacks that could lead to data exfiltration, whereby bad actors could find ways to extract sensitive information, such as the data on which the model was trained.

DeepKeep, an Israeli LLM security group, found ways to force Llama 2, an older Meta AI model that is open source, to leak users' personally identifiable information. Rony Ohayon, chief executive of DeepKeep, said his company is developing specific LLM security tools, such as firewalls, to protect users.

“Releasing open source models widely shares the benefits of AI and helps more researchers identify and address vulnerabilities, so companies can make models more secure,” Meta said in a statement.

It added that it has conducted security stress tests with internal and external experts on its latest Llama 3 model and its chatbot Meta AI.

OpenAI and Google said they are continuously training the models to better defend against exploits and hostile behavior. Anthropic, which experts say has some of the most advanced efforts in AI security, has pushed for more information sharing and research into these types of attacks.

Despite the assurances, any risks will only increase as the models become more integrated with existing technology and devices, experts said. This month, Apple announced that it has partnered with OpenAI to integrate ChatGPT into its devices as part of a new “Apple Intelligence” system.

Ohayon said: “In general, companies are not ready.”
