While the tech industry has raced to develop generative artificial intelligence, one giant has held back: Apple. The company has yet to introduce so much as an AI-generated emoji, and according to a New York Times report today and earlier reporting by Bloomberg, it is in early talks with Google about incorporating the search company’s Gemini AI model into iPhones.
Yet a research paper quietly posted online last Friday by Apple engineers suggests the company is making significant new investments in AI that are already bearing fruit. It details the development of a new generative AI model called MM1 that is capable of working with text and images. The researchers show it answering questions about photos and displaying the kind of general knowledge skills demonstrated by chatbots like ChatGPT. The model’s name is not explained but could stand for MultiModal 1.
MM1 looks similar in design and sophistication to recent AI models from other tech giants, including Meta’s open-source Llama 2 and Google’s Gemini. Work by Apple’s competitors and by academics shows that models of this type can be used to power capable chatbots or to build “agents” that solve tasks by writing code and taking actions, such as using computer interfaces or websites. That suggests MM1 may yet find its way into Apple’s products.
“The fact that they’re doing this shows that they have the ability to understand how to train and build these models,” says Ruslan Salakhutdinov, a professor at Carnegie Mellon University who led AI research at Apple several years ago. “It takes a certain amount of skill.”
MM1 is a multimodal large language model, or MLLM, meaning it is trained on images as well as text. This allows the model to respond to text prompts and answer complex questions about specific images.
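Apple has not published MM1’s interface, but multimodal models of this kind typically work by encoding an image into embeddings, projecting them into the language model’s input space, and feeding them in alongside the text tokens. The toy PyTorch sketch below illustrates that general pattern; every module name and dimension is an illustrative assumption, not Apple’s actual architecture.

```python
# A minimal sketch of the typical MLLM pattern: encode image patches into
# features, project them into the language model's embedding space, and
# prepend them to the text tokens. All sizes are toy values, not MM1's.
import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        # Stand-in vision encoder: maps flattened 16x16 RGB patches to features.
        self.vision_encoder = nn.Linear(3 * 16 * 16, d_model)
        # "Connector" aligning image features with the text embedding space.
        self.projector = nn.Linear(d_model, d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_patches, text_token_ids):
        img = self.projector(self.vision_encoder(image_patches))  # (B, P, D)
        txt = self.token_embed(text_token_ids)                    # (B, T, D)
        seq = torch.cat([img, txt], dim=1)  # image tokens precede the prompt
        return self.lm_head(self.lm(seq))   # logits over the vocabulary

model = ToyMultimodalLM()
patches = torch.randn(1, 16, 3 * 16 * 16)  # fake image: 16 flattened patches
prompt = torch.randint(0, 1000, (1, 8))    # fake tokenized question
print(model(patches, prompt).shape)        # torch.Size([1, 24, 1000])
```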
An example in Apple’s research paper shows what happened when MM1 was given a photo of a sun-dappled restaurant table with a couple of beer bottles, along with an image of the menu. When asked how much someone would expect to pay for “all the beer on the table,” the model correctly reads off the prices and tallies up the total cost.
When ChatGPT launched in November 2022, it could only take in and generate text, but more recently its creator, OpenAI, and others have extended the underlying large language model technology to work with other kinds of data. When Google launched Gemini, the model that now powers its answer to ChatGPT, last December, the company touted the model’s multimodal nature as the start of an important new direction in AI. “Following the rise of LLMs, MLLMs are emerging as the next frontier in foundation models,” Apple’s paper says.
MM1 is a relatively small model as measured by its number of “parameters,” the internal variables that are adjusted as a model is trained. Kate Saenko, a Boston University professor who specializes in computer vision and machine learning, says this could make it easier for Apple’s engineers to experiment with different training methods and refinements, then scale up when they hit on something promising.
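For a concrete sense of what a parameter count measures, the snippet below tallies the trainable weights of a small stand-in network in PyTorch. The network is purely illustrative; MM1’s weights have not been released.

```python
# "Parameters" are the trainable weights a model adjusts during training.
# Counting them for a tiny stand-in network (not MM1, whose weights are
# not public):
import torch.nn as nn

net = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512))
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")  # 1,050,112
```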
The MM1 paper provides an unusual amount of detail about how the model was trained for a corporate publication, Saenko says. For example, the engineers behind MM1 describe tricks for improving the model’s performance, including increasing the resolution of images and mixing text and image data. Apple is famously secretive, but it has previously shown unusual openness about AI research as it seeks to attract the talent needed to compete in cutting-edge technology.
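The paper’s mention of mixing text and image data hints at weighted sampling over different data sources during pre-training. Here is a minimal sketch of that idea; the source names and weights are assumptions made for illustration, not Apple’s published recipe.

```python
# Hypothetical sketch of mixing training data sources by weight.
# The categories and proportions below are illustrative assumptions.
import random

SOURCES = {
    "captioned_images": 0.45,  # (image, caption) pairs
    "interleaved_docs": 0.45,  # documents with images embedded in text
    "text_only": 0.10,         # plain-text corpora
}

def sample_source(rng=random.Random(0)):  # seeded for reproducibility
    """Pick the data source for the next training batch by weight."""
    return rng.choices(list(SOURCES), weights=list(SOURCES.values()), k=1)[0]

print([sample_source() for _ in range(5)])
```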