Peter Chen, CEO of the robot software company Covariant, sits in front of a chatbot interface like the one used to interact with ChatGPT. “Show me the tote in front of you,” he types. In response, a video feed appears, showing a robot arm hovering over a bin containing various objects, among them a pair of socks, a tube of chips, and an apple.
The chatbot can not only discuss the things it sees but also manipulate them. When WIRED suggests Chen ask it to grab a piece of fruit, the arm reaches down, gently grasps the apple, and then moves it to another bin nearby.
This hands-on chatbot is a step toward giving robots the kind of general and flexible capabilities demonstrated by programs like ChatGPT. The hope is that AI will eventually overcome the longstanding difficulty of programming robots and let them do more than a narrow set of tasks.
“At this point it’s not at all controversial to say that foundation models are the future of robotics,” says Chen, using a term for large-scale, general-purpose machine learning models developed for a specific domain. The handy chatbot he showed me is powered by a model developed by Covariant called RFM-1, for Robot Foundation Model. Like the models behind ChatGPT, Google’s Gemini, and other chatbots, it has been trained with large amounts of text, but it has also been fed video and hardware control and motion data from tens of millions of examples of robot movements, sourced from labor in the physical world.
Including that additional data produces a model that is fluent not only in language but also in action, and that is able to link the two. RFM-1 can not only chat and control the robot arm, but also generate videos showing robots doing various tasks. When prompted, RFM-1 will show how a robot should grasp an object from a cluttered bin. “It can take in all of these different modalities that matter to robotics, and it can also output any of them,” Chen says. “It’s a little mind-blowing.”
The model has also shown that it can learn to control similar hardware that was not in its training data. With more training, that could mean the same general model could drive a humanoid robot, says Pieter Abbeel, cofounder and chief scientist at Covariant, who has pioneered robot learning. In 2010 he led a project that trained a robot to fold towels, albeit slowly, and he also worked at OpenAI before it stopped doing robot research.
Covariant, which was founded in 2017, currently sells software that uses machine learning to allow robotic arms to pick items from bins in warehouses, but the robots are typically limited to the task they have been trained for. Abbeel says models like RFM-1 could allow robots to adapt to new tasks more fluidly. He compares Covariant’s strategy to how Tesla uses data from cars it has sold to train its self-driving algorithms. “It’s kind of the same thing here that we’re playing out,” he says.
Abbeel and his colleagues at Covariant are far from the only roboticists who hope that the capabilities of the large language models behind ChatGPT and similar programs can revolutionize robotics. Projects such as RFM-1 have shown promising initial results. But how much data might be needed to train models that give robots more general abilities, and how to collect it, is an open question.