Updated April 13: Article originally posted April 10.
AI might be the latest buzzword on Android, but Apple’s iPhone isn’t yet seen as an AI-powered smartphone. That’s about to change, and now we know one way Tim Cook and his team plan to catch up.
The details are revealed in a newly released research paper by Cornell University researchers working with Apple. Entitled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”, it describes a multimodal large language model that can understand what is displayed on a screen, specifically mobile user interface elements such as those on an iPhone display.
Thanks to a large supply of training data, the paper indicates that tasks such as recognising icons, finding text, listing widgets, describing on-screen elements, and interacting with the display can all be guided by open-ended instructions.
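To make the idea concrete, here is a minimal sketch of the kind of open-ended, screen-grounded instructions described above, paired with a screenshot. Everything in it is hypothetical: neither the paper nor this article documents a public Ferret-UI API, and the struct, file path, and wording are purely illustrative.

```swift
import Foundation

// Hypothetical sketch only: illustrates the shape of open-ended requests a
// Ferret-UI-style multimodal model would be expected to handle.
struct ScreenQuery {
    let screenshot: URL      // image of the current UI, e.g. an iPhone screen capture
    let instruction: String  // free-form request, not a pre-programmed command
}

let screenshot = URL(fileURLWithPath: "/tmp/home_screen.png")  // placeholder path

let queries = [
    ScreenQuery(screenshot: screenshot, instruction: "Is there a settings icon on this screen, and where is it?"),
    ScreenQuery(screenshot: screenshot, instruction: "Find the text 'Sign in' and report its location."),
    ScreenQuery(screenshot: screenshot, instruction: "List every widget visible on this screen."),
    ScreenQuery(screenshot: screenshot, instruction: "Describe this screen for a user who cannot see it.")
]

// A real system would send each query to the model; here we only print the requests.
for query in queries {
    print("[\(query.screenshot.lastPathComponent)] \(query.instruction)")
}
```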
The original Ferret model was released in October 2023 and was designed to parse pictures and images to recognise what was on show. The upgrade, titled Ferret-UI, would bring many benefits to iPhone users and could easily fit into an improved, AI-powered Siri.
Being able to describe the screen, regardless of the app, opens up a rich avenue for accessibility tools and removes the need for pre-programmed responses and actions. Someone who wants to perform a complex task, or who is looking for an obscure option buried deep in a menu system, could simply ask Siri to open the app and carry it out for them.
Developers could also use Ferret-UI as a testing tool, asking the MLLM to act like a 14-year-old with little experience of social networks and perform a set of tasks, or like a 75-year-old trying to connect with their grandchildren over FaceTime; a sketch of what such a prompt might look like follows.
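The following is a hypothetical sketch of a persona-style test prompt of the kind described above. The wording, persona, and goal are illustrative assumptions; the article does not document any real prompt format or test harness.

```swift
import Foundation

// Hypothetical persona-driven test prompt; nothing here is Apple's API.
let persona = "a 75-year-old who rarely uses a smartphone"
let goal = "start a FaceTime call with their grandchild from the Contacts app"

let testPrompt = """
You are \(persona). You can only act on what is visible in the screenshot provided at each step.
Your goal: \(goal).
At every step, say which on-screen element you would tap and why, or say where you get stuck.
"""

// In a real test harness this prompt would be sent to the model alongside screenshots.
print(testPrompt)
```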
Update: Saturday April 13: With the academic paper pointing to an AI upgrade for Siri, back-end code discovered by Nicholas Alvarez points to new server-side tools for individual iPhones.
The features are labeled “Safari Browsing Assistant” and “Encrypted Visual Search”. Both correlate with capabilities described in the Ferret-UI research, although the findings come with some caveats. Because this is server-side code, the features could easily be changed or removed; they may rely on more prosaic code rather than AI, or they may be placeholders that may or may not appear in future products.
It’s worth noting that visual search has already been referenced in code for visionOS and the Apple Vision Pro headset, but the feature has yet to launch.
While these aren’t strong indicators of Apple’s AI plans on their own, they add to a growing body of evidence about the company’s approach.
Google publicly kicked off the rush for AI-first smartphones on October 4, a little more than three weeks after the iPhone 15 launched. From photo processing to text autocorrection, that head start on AI has allowed Google’s mobile platform to set expectations.
Apple’s Worldwide Developers Conference takes place in June, and it will be the first time Apple can engage with the public to discuss its AI plans and lay the foundation for the launch of the iPhone 16 and iPhone 16 Pro in September.
Until then, we have the academic side of Apple’s AI approach to consider.
Now read why Apple’s approach to AI is disrupting the iPhone 16 and iPhone 16 Plus specs…
Follow me on Twitter or LinkedIn. Check out my website.