If the AI ​​is so amazing, why does ChatGPT melt on this simple image edit task?

Created by ChatGPT and Tiernan Ray/ZDNET.
WhatsApp Group Join Now
Telegram Group Join Now
Instagram Group Join Now

The current state of the art in artificial intelligence (AI) is multimodal models, which can operate not only on text but also on other modalities, such as images, and in some cases, audio and video.

Also: I've tweaked these 5 Chat GPT settings to instantly boost productivity.

For example, it's a trivial task for OpenAI's ChatGPT to draw an image from a text prompt, such as, “Draw me a picture of a napkin making love to a spoon” (above).

With another prompt, ChatGPT can simplify the drawing, producing an image with less detail:

Created by ChatGPT and Tiernan Ray/ZDNET.

However, ChatGPT, and all AI models for that matter, break down when asked to modify a given image. ChatGPT, using the GPT-4o model, is the most interesting failure because the program responds as if it is trying hard.

Besides that: Generative AI fails this normal capacity of human thinking.

Other AI models, from Google's Gemini to Anthropic's Cloud, as well as apps from Perplexity and Microsoft's Copilot (which also run GPT-4), simply refuse to work.

The project started when I photographed two people on the subway sitting together looking at their iPhones. This is a simple line drawing in black and white completed using the Concepts Drawing app on the iPad. Here is the drawing:

Created by Tiernan Ray on an iPad/ZDNET

I uploaded the drawing to the ChatGPT app on iOS and typed into the prompt: “Is there a way to simplify this drawing?”

ChatGPT responded with some reasonable suggestions, demonstrating an ability to find appropriate answers in context. For example, Bott suggests: “Remove minor details: Eliminate minor details such as intricate strands of hair or minor creases in clothing.”

This is a very good suggestion. The output also shows the ability of the model to correctly understand the elements in the image. So, there is something going on inside the model that allows it to identify the content of the image.

Besides that: I asked Gemini and GPT-4 to explain deep learning AI, and Gemini hands down

Then ChatGPT asked: “Would you like me to make a simplified version of the drawing for you?” It was interesting, so I replied, “Yes, please.”

However, the bot's output — as an image in the chat thread — was an exact copy of my image, simply not modified. Decide for yourself:

Created by ChatGPT and Tiernan Ray/ZDNET.

“It doesn't look any different,” I wrote back to ChatGPT, commenting on the output.

But then Chat GPT began to respond very quickly, like the computer HAL 9000 in the 2001 film, when astronaut Dave Bowman ripped apart HAL's circuits: “To make the drawing easier, I increased the line density. Will focus on reducing and increasing the basic forms and more effectively let me improve the process.”

Also: CrowdStrike CTO Says Generative AI Is a New Attack Vector Threatening Enterprises

The result, once again, was no different. When I pointed out this fact, ChatGPT surprised me with its mumbo jumbo.

“It seems the automated methods aren't simplifying the image as effectively as intended,” he spat back at me. “I will manually remove the less important details and keep only the essential elements. Let me refine this approach further.”

I didn't know that a software program could have both 'automatic' and 'manual' modes. I realized that every piece of software is automated. After all, why would you use it?

ChatGPT's 'manual' method simply inverted the image, making the black lines white and the white background black.

Screenshot by Tiernan Ray/ZDNET

Bot noted his success and said: “This version retains only the key shape and essential lines.”

When I protested that the output wasn't simple, just an inversion, the bot told me it would focus on the “necessary lines”, and produce a completely black square, no lines. If I didn't know better, I'd say this output is an example of AI humor.

Also: ChatGPT vs ChatGPT Plus: Is Paid Subscription Still Worth It?

This was followed by several attempts by the model to respond to my cues by editing the image in selected ways, mostly making it stylistically ambiguous, not simple.

At some point, the bot reacted to my protest by producing a completely different line drawing:

Created by ChatGPT and Tiernan Ray/ZDNET.

This nonsense continued until ChatGPT went back to the beginning and produced the same image I had originally uploaded.

Each time, the bot accompanies its output — usually the same version of my original image — with a bunch of technicalities, such as: “The latest image is a simpler version, emphasizing only the basic outlines.” Shows off.”

Screenshot by Tiernan Ray/ZDNET

Other programs didn't make it out of the gate either. Google's Gemini offers suggestions for simplifying a photo, but apologizes that it can't create photos of people. Claude said it couldn't make pictures yet. The Perplexity app said the same.

Microsoft's Copilot awkwardly uploaded my drawing and then cut off the head, allegedly for privacy reasons. (I think it's a good drawing, but it's certainly not realistic enough to be used by facial recognition systems to reveal someone's identity.)

CoPilot then made the same simplification suggestions as ChatGPT, and instead of changing the drawing, created a completely unrelated, completely new line drawing. When I protested, Copilot explained that it couldn't convert images directly.

Also: How to Use ChatGPT to Analyze PDFs for Free

Aside from these non-starters from other models, what can we make of ChatGPT's failure?

This program can provide comprehensive analysis of an image, including its contents. But he has no way of following this analysis. I guess without being able to assemble an image based on high-level concepts, such as the objects in the image, ChatGPT has no way forward.

To test this hypothesis, I changed the prompt to read, “Is there a way to simplify this drawing of two friends on the subway looking at their phones?” I thought this clue provided some semantic clues.

Again, the model returned the same drawing. But when I protested again, the bot produced an entirely new image with some semantic similarity — people on mass transit looking at their phones. The bot picked up the semantic clues but could not apply them in any way to the provided drawing.

I can't explain in deep technical terms what's going on other than to say that ChatGPT can't handle basic types of individual image elements, such as lines. Even when it did, the tool would cut certain lines to perform the suggested simplifications in its text responses.

I would suggest — and this is also true of text editing tasks, such as editing a transcript — that ChatGPT, and GPT-4, do not know how to process individual elements. of anything. This inefficiency explains why ChatGPT is a terrible editor: it doesn't know what is essential and what can be left out of a given object.

Besides that: OpenAI's Stock Investing GPTs tackle this fundamental question about stock investing.

AI models can produce objects that match a target “probability distribution” derived from training examples, but they cannot selectively reduce the elements of an original task necessarily.

Most likely, the target probability distribution for intelligently modified anything Somewhere along the “long tail” of possibilities, the realm where humans excel at finding the unusual and where AI can't go yet, is the kind of thing we think of as creativity.

Apple co-founder Steve Jobs once said that the biggest job of software developers — the “higher-order bit,” as he called it — is the “editing” function, knowing what to leave out and what to keep. Is. Now, ChatGPT has no idea what the high-order bit might be.

WhatsApp Group Join Now
Telegram Group Join Now
Instagram Group Join Now

Leave a Comment