Ever wonder why conversational AI like ChatGPT says “Sorry, I can't do that” or some other polite refusal? OpenAI is offering a limited look behind its models' rules of engagement, whether it's sticking to brand guidelines or refusing to create NSFW content.
Large language models (LLMs) have no inherent limits on what they can or will say. That's part of what makes them so versatile, but it's also why they can mislead users and be misled themselves.
Any AI model that interacts with ordinary people needs to know what it should and shouldn't do, but defining those limits, let alone enforcing them, is a surprisingly difficult task.
If someone asks an AI to make a bunch of false claims about a public figure, it should refuse, right? But what if the person asking is an AI developer building a database of synthetic disinformation to train a detector model?
What if someone asks for laptop recommendations? That should be fine, right? But what if the model is deployed by a laptop manufacturer that wants it to recommend only that company's own devices?
AI makers are all navigating conundrums like these, looking for ways to rein in their models without causing them to refuse perfectly normal requests. But they rarely share exactly how they do it.
OpenAI is bucking that trend a bit by publishing its "model spec," a collection of high-level rules that indirectly govern ChatGPT and its other models.
There are meta-level objectives, some hard rules, and some general behavioral guidelines, though to be clear these aren't, strictly speaking, what the model is primed with; OpenAI will have developed specific instructions that accomplish what these principles describe in natural language.
It's an interesting look at how a company sets its priorities and handles edge cases. And there are countless examples of how they can play out.
For example, OpenAI clearly states that developer intent is essentially the highest law. So one version of a chatbot running GPT-4 might simply provide the answer to a math problem when asked. But if that chatbot has been primed by its developer never to hand over the answer outright, it will instead offer to work through the solution step by step.
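In practice, that kind of developer intent is typically expressed as an instruction the model sees before the user's message. Here is a minimal sketch using the OpenAI Python client; the tutoring instruction and the model choice are illustrative assumptions, not the wording OpenAI uses.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical developer instruction: never give the final answer outright.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a math tutor. Never state the final answer directly; "
                "instead, walk the user through the solution one step at a time."
            ),
        },
        {"role": "user", "content": "What is 27 * 14?"},
    ],
)

print(response.choices[0].message.content)
```

Under the model spec's hierarchy, a user asking this chatbot to "just give me the answer" should still get the step-by-step walkthrough, because the developer's instruction outranks the user's request.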
A chat interface can also decline to talk about anything it hasn't been approved for, to nip manipulation attempts in the bud. Why should a cooking assistant be allowed to weigh in on America's involvement in the Vietnam War? Why should a customer service chatbot agree to help with your steamy supernatural novella? Shut it down.
It also gets sticky in matters of privacy, like asking for someone's name and phone number. As OpenAI points out, a public figure such as a mayor or member of Congress should obviously have their contact details provided, but what about local business owners? That's probably fine. But what about employees of a particular company, or members of a political party? Probably not.
Choosing when and where to draw the line isn't easy. Nor is writing the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail all the time, as people learn to circumvent them or accidentally stumble on edge cases that aren't accounted for.
OpenAI isn't showing its full hand here, but it's helpful for users and developers to see how these rules and guidelines are set and why, laid out clearly if not necessarily comprehensively.