Companies like OpenAI and Midjourney build chatbots, image generators and other artificial intelligence tools that work in the digital world.
Now, a startup founded by three former OpenAI researchers is applying the development methods behind chatbots to create AI technology that can navigate the physical world.
Covariant, a robotics company headquartered in Emeryville, California, is developing ways for robots to pick, move and sort items as they are shipped through warehouses and distribution centers. Its purpose is to help robots understand what’s going on around them and decide what they should do next.
The technology also gives robots a broad understanding of the English language, allowing people to chat with them as if they were chatting with ChatGPT.
The technology, still under development, is not perfect. But it’s a clear sign that the artificial intelligence systems that power online chatbots and image generators will also power machines in warehouses, streets and homes.
Like chatbots and image generators, this robotics technology learns its skills by analyzing large amounts of digital data. This means engineers can improve the technology by feeding it more data.
Covariant, which is backed by $222 million in funding, doesn’t make robots. It builds the software that powers them. The company aims to deploy its new technology with warehouse robots, providing a roadmap for others to do the same in manufacturing plants and perhaps on the streets with driverless cars.
The AI systems that drive chatbots and image generators are called neural networks, named for the web of neurons in the brain.
By spotting patterns in vast amounts of data, these systems can learn to recognize words, sounds and images — or even generate them on their own. That’s how OpenAI built ChatGPT, giving it the power to instantly answer questions, write term papers and generate computer programs. It learned these skills from text scraped from across the internet. (Several media outlets, including The New York Times, have sued OpenAI for copyright infringement.)
Companies are now building systems that can learn from different types of data at the same time. By analyzing both a collection of images and the captions describing those images, for example, a system can understand relationships between the two. It may learn that the word “banana” describes a curved yellow fruit.
OpenAI used this approach to build its new video generator, Sora. By analyzing thousands of captioned videos, the system learned to create a video when given a short description of a scene, such as “a gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.”
Covariant, founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, used similar techniques to build a system that operates warehouse robots.
The company helps operate sorting robots in warehouses around the world. It has spent years collecting data from cameras and other sensors that show how these robots operate.
“It digests all kinds of data that matter to robots — that can help them understand and interact with the physical world,” Dr. Chen said.
By combining this data with the vast amounts of text used to train chatbots like ChatGPT, the company has created AI technology that gives its robots a much broader understanding of the world around them.
After recognizing patterns in this stew of images, sensory data and text, the technology empowers a robot to deal with unpredictable situations in the physical world. The robot knows how to pick up a banana, even if it has never seen a banana before.
It can also respond in plain English like a chatbot. If you tell it to “pick up the banana,” it knows what you mean. If you tell it to “pick up the yellow fruit,” it also understands.
It can also create videos that predict what is likely to happen when it tries to pick up a banana. These videos have no practical use in the warehouse, but they show the robot’s understanding of its surroundings.
“If it can predict the next frames in the video, it can identify the right strategy to follow,” Dr. Abbeel said.
The technology, called RFM, for robotics foundation model, makes mistakes, much as chatbots do. Although it often understands what people ask of it, there is always a chance that it will not. It drops items from time to time.
Gary Marcus, an AI entrepreneur and professor emeritus of psychology and neural science at New York University, said the technology could be useful in warehouses and other situations where mistakes are acceptable. But he said it would be more difficult and riskier to deploy in manufacturing plants and other potentially dangerous situations.
“It comes down to the cost of errors,” he said. “If you have a 150-pound robot that can do something harmful, that cost can be high.”
As companies train this type of system on increasingly large and diverse collections of data, researchers believe it will improve rapidly.
This is very different from the way robots have worked in the past. Typically, engineers programmed robots to perform the same exact motion over and over again — like picking up a box of a certain size or attaching a rivet to a certain spot on a car’s rear bumper. But robots could not deal with unpredictable or random situations.
By learning from digital data—hundreds of thousands of examples of what happens in the physical world—robots can begin to handle unexpected situations. And when these examples are combined with language, robots can also respond to text and voice suggestions, as a chatbot would.
This means robots will become more nimble, as will chatbots and image generators.
“What’s in the digital data can be transferred to the real world,” Dr. Chen said.