AI Overviews: what happened, and the next steps.


A few weeks ago at Google I/O, we announced that we would be bringing AI Overviews to everyone in the US.

User feedback shows that with AI Overviews, people are more satisfied with their search results, and they're asking longer, more complex questions that they know Google can now help with. They use AI Overviews as a jumping-off point to visit web content, and we see that the clicks to webpages are higher quality: people are more likely to stay on that page, because we've done a better job of finding the right information and helpful webpages for them.

In the past week, people on social media have shared some odd and erroneous overviews (alongside a very large number of faked screenshots). We know people trust Google Search to provide accurate information, and inaccurate results have never been acceptable to us, whether in our rankings or in other Search features. We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and we take it seriously.

Given the attention AI Overviews have received, we wanted to explain what happened and the steps we've taken.

How AI Overviews work

Over the years, we've built search features that make it easier for people to quickly find the information they're looking for. AI Overviews are designed to take this a step further, helping with more complex questions that might previously have taken multiple searches or follow-ups, while prominently including links to learn more.

AI Overviews work very differently from chatbots and other LLM products that people may have tried. They're not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional "search" tasks, like identifying relevant, high-quality results from our index. That's why AI Overviews don't just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to show only information that is backed up by top web results.

This means that AI Overviews generally don't "hallucinate" or make things up in the ways that other LLM products might. When an AI Overview gets it wrong, it's usually for other reasons: misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available. (These are challenges that occur with other Search features too.)
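To make the grounding idea above concrete, here is a minimal, purely illustrative sketch in Python. Everything in it is an assumption for illustration: the toy `top_results` ranking, the tiny `index`, and the `grounded_overview` function are invented stand-ins, not Google's systems. The point it demonstrates is the design principle described above: an overview is produced only when supporting web results exist, and it always carries links to those results.

```python
# Illustrative sketch only: grounding a generated answer in retrieved
# results, so the overview states only information a top result supports.

def top_results(query, index):
    """Toy ranking step: score pages by word overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p["text"].lower().split())), p) for p in index]
    # Keep only pages with at least one matching term, best first.
    return [p for score, p in sorted(scored, key=lambda sp: -sp[0]) if score > 0]

def grounded_overview(query, index, min_sources=1):
    """Produce an overview only when enough supporting pages exist;
    otherwise return None (i.e. decline rather than make something up)."""
    results = top_results(query, index)
    if len(results) < min_sources:
        return None  # data void: no supporting content, so no overview
    return {
        "summary": results[0]["text"],
        "links": [p["url"] for p in results],  # cite the supporting pages
    }

index = [
    {"url": "https://example.com/pizza",
     "text": "Let pizza cheese set by cooling briefly"},
]
print(grounded_overview("why does cheese slide off pizza", index))
print(grounded_overview("how many rocks should I eat", index))  # None
```

The design choice this illustrates is the one the text describes: unlike a free-running chatbot, the answer path refuses to emit anything that the retrieval step cannot back up.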

This approach is highly effective. Overall, our tests show that the accuracy rate for AI Overviews is on par with another popular feature in Search, Featured Snippets, which also uses AI systems to identify and display key information along with links to web content.

About those odd results

In addition to designing AI Overviews to optimize for accuracy, we tested the feature extensively before launch. This included robust red-teaming efforts, evaluations with typical user query patterns, and tests on a proportion of search traffic to see how it performed. But there's nothing quite like having millions of people using the feature with many novel searches. We've also seen nonsensical new searches, seemingly aimed at producing erroneous results.

Separately, a large number of faked screenshots have been shared widely. Some of these faked results have been obvious and silly. Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared. So we'd encourage anyone encountering these screenshots to do a search themselves to check.

But some odd, inaccurate or unhelpful AI Overviews certainly did show up. And while these were generally for queries that people don't commonly do, it highlighted some specific areas that we needed to improve.

One area we identified was our ability to interpret nonsensical queries and satirical content. Let's take a look at an example: "How many rocks should I eat?" Prior to these screenshots going viral, practically no one asked Google that question. You can see that yourself on Google Trends.

There's also not a lot of web content that seriously contemplates that question. This is what is often called a "data void" or "information gap," where there's a limited amount of high-quality content about a topic. However, in this case, there is satirical content on this topic … which also happened to be republished on a geological software provider's website. So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.

In other examples, we saw AI Overviews that featured sarcastic or troll-y content from discussion forums. Forums are often a great source of authentic, first-hand information, but in some cases can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza.

In a small number of cases, we have seen AI Overviews misinterpret language on webpages and present inaccurate information. We worked quickly to address these issues, either through improvements to our algorithms or through established processes to remove responses that don't comply with our policies.

The improvements we've made

As is always the case when we make improvements to Search, we don't simply "fix" queries one by one, but we work on updates that can help broad sets of queries, including new ones that we haven't seen yet.

By looking at examples from the past couple of weeks, we were able to determine patterns where we didn't get it right, and we made more than a dozen technical improvements to our systems. Here's a sample of what we've done so far:

  • We built better detection mechanisms for nonsensical queries that shouldn't show an AI Overview, and limited the inclusion of satire and humor content.
  • We updated our systems to limit the use of user-generated content in responses that could offer misleading advice.
  • We added triggering restrictions for queries where AI Overviews were not proving as helpful.
  • For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections.
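To make the kinds of triggering restrictions listed above concrete, here is a deliberately simplified, hypothetical sketch of a trigger gate. Every heuristic and name in it is an assumption for illustration (`should_trigger_overview`, the crude repetition check, the source-count threshold); Google's actual classifiers are not public and are certainly far more sophisticated.

```python
# Illustrative sketch only (assumed heuristics, not production systems):
# deciding whether an AI overview should trigger for a given query.

def should_trigger_overview(query, num_quality_sources, is_hard_news):
    """Return True only when an overview is likely to help."""
    words = query.lower().split()
    # Crude nonsense check: very short or highly repetitive queries.
    if len(words) < 2 or len(set(words)) < len(words) / 2:
        return False
    if is_hard_news:             # freshness and factuality matter: stay out
        return False
    if num_quality_sources < 2:  # data void: too little supporting content
        return False
    return True

print(should_trigger_overview("election results today", 5, True))     # False
print(should_trigger_overview("how to patch a bike tire", 4, False))  # True
print(should_trigger_overview("aaa aaa aaa aaa", 3, False))           # False
```

The structure mirrors the bullets above: a nonsense filter, a hard-news carve-out, and a minimum-support threshold, each of which independently suppresses the overview.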

In addition to these improvements, we have been vigilant in monitoring feedback and external reports, and taking action on the small number of AI Overviews that violate content policies. This means overviews that contain information that's potentially harmful, obscene, or otherwise violative. We found a content policy violation on fewer than one in every 7 million unique queries on which AI Overviews appeared.

At the scale of the web, with billions of queries coming in every day, there are bound to be some oddities and errors. We've learned a lot over the past 25 years about how to build and maintain a high-quality search experience, including how to learn from these mistakes to make Search better for everyone. We'll keep improving when and how we show AI Overviews and strengthening our protections, including for edge cases, and we're very grateful for the ongoing feedback.

