Peripheral vision enables humans to see shapes that are not in our direct line of sight, albeit with less detail. This ability expands our field of vision and can be helpful in many situations, such as detecting a vehicle approaching our car.
Unlike humans, AI does not have peripheral vision. Equipping computer vision models with this ability could help them detect approaching hazards more effectively, or predict whether a human driver would notice an oncoming object.
Taking a step in this direction, MIT researchers have created an image dataset that they can use to simulate peripheral vision in machine learning models. They found that training models with this dataset improved the models’ ability to detect objects in the visual field, although the models still performed worse than humans.
Their results also revealed that, unlike in humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the models’ performance.
“There’s something fundamental going on here. We tested a lot of different models, and even when we train them, they get a little bit better, but they’re not quite human-like. So the question is: What is missing in these models?” says Vasha DuTell, a postdoc and co-author of a paper detailing this research.
Answering this question could help researchers build machine learning models that see the world more like humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to view.
In addition, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, adds lead author Anne Harrington MEng ’23.
“Modeling peripheral vision, if we can really capture the essence of what is represented in the periphery, can help us understand the features in a visual scene that trigger our eyes to move and gather more information,” she says.
Her co-authors include Mark Hamilton, a graduate student in electrical engineering and computer science; Ayush Tewari, a postdoc; Simon Stent, research manager at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.
“Anytime you have a human interacting with a machine — a car, a robot, a user interface — it’s critical to understand what that person can see. Peripheral vision plays an important role in that understanding,” says Rosenholtz.
Simulation of peripheral vision
Extend your arm in front of you and hold your thumb up — the small area around your thumbnail is seen by your fovea, the small depression in the center of your retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and fidelity the farther it gets from that sharp point of focus.
Many current approaches to model peripheral vision in AI represent this degraded detail by blurring the edges of images, but information loss in the optic nerve and visual cortex is far more complex.
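To make that simple baseline concrete, here is a minimal sketch of eccentricity-dependent blurring in Python, using Pillow and NumPy. The fixed fixation point, the number of blur levels, and the linear falloff are illustrative assumptions; this is the naive approach the researchers moved beyond, not their method.

```python
# A toy version of the naive "blur the periphery" baseline mentioned above.
# Assumptions (not from the paper): a fixed fixation point and a linear
# blur-radius falloff with distance from fixation.
import numpy as np
from PIL import Image, ImageFilter

def blur_periphery(img, fixation=(0.5, 0.5), max_radius=8.0, levels=4):
    """Blend progressively blurred copies of `img` by distance from fixation."""
    img = img.convert("RGB")
    w, h = img.size
    fx, fy = fixation[0] * w, fixation[1] * h
    # Normalized distance of every pixel from the fixation point.
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - fx, ys - fy)
    dist /= dist.max()
    # Pre-compute blur levels; level 0 is the sharp original.
    stack = [np.asarray(img, dtype=np.float32)]
    for i in range(1, levels + 1):
        radius = max_radius * i / levels
        blurred = img.filter(ImageFilter.GaussianBlur(radius))
        stack.append(np.asarray(blurred, dtype=np.float32))
    # Interpolate between adjacent blur levels according to eccentricity.
    level = dist * levels
    lo = np.clip(np.floor(level).astype(int), 0, levels - 1)
    frac = (level - lo)[..., None]
    out = np.empty_like(stack[0])
    for i in range(levels):
        mask = lo == i
        out[mask] = (1 - frac[mask]) * stack[i][mask] + frac[mask] * stack[i + 1][mask]
    return Image.fromarray(out.astype(np.uint8))
```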
For a more accurate approach, the MIT researchers started with a technique used to model peripheral vision in humans. Known as the texture tiling model, this method transforms images to represent a human’s loss of visual information.
They modified this model so that it transforms images in the same way, but flexibly enough that it doesn’t require knowing in advance where the person or AI will point their eyes.
“This allows us to model peripheral vision in a similar way to what is done in human vision research,” says Harrington.
The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, representing the loss of detail that occurs when a human looks farther into the periphery.
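The full texture tiling transform is too involved for a short snippet, but the sketch below shows the general shape of such a dataset-generation pipeline. Local patch scrambling stands in for the texture-statistic transform and is applied uniformly across each image, echoing the fixation-free design described above; every path, parameter, and the scrambling itself are hypothetical stand-ins rather than the researchers' actual procedure.

```python
# Sketch of a fixation-free dataset-generation pipeline. The real work uses
# a modified texture tiling model; here, local patch scrambling stands in
# for the texture-statistic transform. Paths and parameters are hypothetical.
from pathlib import Path
import numpy as np
from PIL import Image

def scramble_patches(arr, patch, rng):
    """Shuffle pixels inside each patch x patch tile: roughly preserves local
    texture statistics while destroying fine spatial detail."""
    out = arr.copy()
    h, w, c = arr.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = out[y:y + patch, x:x + patch].reshape(-1, c)
            rng.shuffle(tile)  # shuffles pixel rows in place
            out[y:y + patch, x:x + patch] = tile.reshape(patch, patch, c)
    return out

def build_dataset(src_dir, dst_dir, coarseness_levels=(4, 8, 16)):
    """One transformed copy per level of simulated peripheral detail loss,
    with coarseness given as a patch size in pixels."""
    rng = np.random.default_rng(0)
    dst = Path(dst_dir)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        arr = np.asarray(Image.open(img_path).convert("RGB"))
        for patch in coarseness_levels:
            out_dir = dst / f"coarseness_{patch}"
            out_dir.mkdir(parents=True, exist_ok=True)
            result = Image.fromarray(scramble_patches(arr, patch, rng))
            result.save(out_dir / img_path.name)
```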
They then used the dataset to train several computer vision models and compared their performance to that of humans on an object detection task.
“We had to be very clever in setting up the experiment so that we could also test it in machine learning models. We didn’t want to retrain the models on a toy task that they weren’t meant to do,” she says.
Peculiar performance
Humans and models were shown pairs of transformed images that were identical, except that one image contained a target object located in the periphery. Then, each participant was asked to pick the image containing the target object.
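As a rough illustration of how a model can be scored on this kind of two-alternative forced-choice task, the sketch below assumes a hypothetical `detector` callable that returns a confidence that the target is present in an image; the comparison rule is an assumption, not the paper's protocol.

```python
# Sketch of the two-alternative forced-choice (2AFC) comparison described
# above: the model "picks" whichever image of a pair it scores as more
# likely to contain the target. `detector` is a hypothetical callable.
from typing import Callable, Iterable, Tuple
from PIL import Image

def evaluate_2afc(
    pairs: Iterable[Tuple[Image.Image, Image.Image]],  # (with_target, without_target)
    detector: Callable[[Image.Image], float],          # confidence target is present
) -> float:
    """Fraction of pairs where the higher score lands on the image that
    actually contains the target (chance level is 0.5)."""
    correct = total = 0
    for with_target, without_target in pairs:
        correct += detector(with_target) > detector(without_target)
        total += 1
    return correct / total
```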
“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We kept needing to use smaller and smaller objects,” Harrington adds.
The researchers found that training models from scratch with their dataset led to the greatest performance boosts, improving the models’ ability to detect and recognize objects. Fine-tuning a model with their dataset, a process that involves tweaking a pretrained model so it can perform a new task, resulted in smaller performance gains.
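The article does not name the architectures involved, but the two regimes it compares differ mainly in the starting weights. The sketch below illustrates that distinction in PyTorch with a ResNet-50 as a stand-in; the class count and hyperparameters are hypothetical.

```python
# Illustrates the two training regimes compared above, using torchvision's
# ResNet-50 as a stand-in (the article does not specify architectures).
# "From scratch" starts from random weights; fine-tuning from pretrained ones.
import torch
from torch import nn
from torchvision import models

NUM_CLASSES = 80  # hypothetical: e.g., COCO-style object categories

def make_model(from_scratch: bool) -> nn.Module:
    weights = None if from_scratch else models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new task head
    return model

def train(model: nn.Module, loader, epochs: int = 1, lr: float = 1e-4):
    """One standard supervised loop; `loader` yields (images, labels)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()

# scratch_model = make_model(from_scratch=True)     # greatest gains in the study
# finetuned_model = make_model(from_scratch=False)  # smaller gains in the study
```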
But in every case, the machines weren’t as good as humans, and they were especially bad at detecting objects in the far periphery. Their performance also did not follow human-like patterns.
“This may suggest that the models are not using context in the same way that humans are when performing these detection tasks. The models’ strategies may be different,” Harrington says.
The researchers plan to continue exploring these differences, with the goal of finding a model that can predict human performance in the visual periphery. This could enable AI systems that alert drivers to hazards they might not see, for example. They also hope to inspire other researchers to conduct additional computer vision studies with their publicly available dataset.
“This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to the limited number of photoreceptors we have, but rather a representation that is optimized for performing tasks of real-world consequence. Despite their progress, neural network models have not matched human performance in this regard, which should motivate more AI research that learns from the neuroscience of human vision. That future research will be aided significantly by the database of images the authors provide to simulate peripheral human vision.”
This work is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.