Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own Claude.
Unveiled Monday, Anthropic's program will make payments to third-party organizations that can “effectively measure advanced capabilities in AI models,” the company said in a blog post. Those interested can submit applications for consideration on a rolling basis.
“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote on its official blog. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”
As we've highlighted before, AI has a benchmarking problem. The most commonly cited benchmarks for AI today do a poor job of capturing how the average person actually uses the systems being tested. There are also questions about whether some benchmarks, particularly those released before the dawn of modern generative AI, even measure what they purport to measure, given their age.
The very high-level, harder-than-it-sounds solution Anthropic is proposing is to create challenging benchmarks focused on AI security and societal implications, via new tools, infrastructure and methods.
The company specifically calls for tests that assess a model's ability to carry out tasks such as conducting cyberattacks, “enhancing” weapons of mass destruction (such as nuclear weapons) and manipulating or deceiving people (e.g. through deepfakes or misinformation). For AI risks pertaining to national security and defense, Anthropic says it's committed to developing an “early warning system” for identifying and assessing threats, though it doesn't reveal in the blog post what such a system might entail.
Anthropic also says it intends for its new program to support research into benchmarks and “end-to-end” tasks that probe AI's potential for aiding scientific study, conversing in multiple languages and mitigating ingrained biases, as well as self-censoring toxicity.
To achieve all this, Anthropic envisions new platforms that allow subject-matter experts to develop their own evaluations, as well as large-scale trials of models involving “thousands” of users. The company says it has hired a full-time coordinator for the program and that it may purchase or expand projects it believes have the potential to scale.
“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic writes in the post, though an Anthropic spokesperson declined to provide further details about those options. “Teams will have the opportunity to interact directly with Anthropic's domain experts from the Frontier Red Team, Fine-Tuning, Trust and Safety and other relevant teams.”
Anthropic's effort to support new AI benchmarks is admirable — assuming, of course, that it has enough money and manpower behind it. But given the company's commercial ambitions in the AI race, it may be hard to fully trust it.
In the blog post, Anthropic is quite transparent about the fact that it wants certain evaluations to align with the AI safety classifications it developed (with some input from third parties such as the nonprofit AI research org METR). That is well within the company's prerogative. But it may also force applicants to the program to accept definitions of “safe” or “dangerous” AI with which they might not agree.
A portion of the AI community is also likely to take issue with Anthropic's references to “catastrophic” and “deceptive” AI risks, such as nuclear weapons risks. Many experts say there is little evidence to suggest AI as we know it will gain world-ending, human-outsmarting capabilities anytime soon, if ever. Claims of imminent “superintelligence” serve only to distract from the pressing AI regulatory issues of the day, such as AI's hallucinatory tendencies, these experts add.
In its post, Anthropic writes that it hopes its program will “serve as a catalyst for progress toward a future where comprehensive AI evaluation is an industry standard.” It's a mission that the many open, corporate-unaffiliated efforts to create better AI benchmarks can get behind. But it remains to be seen whether those efforts are willing to join forces with an AI vendor whose loyalty ultimately lies with its shareholders.