The buzzy AI search engine is citing spammy, AI-generated blog posts that spread false information.

AI search engine Perplexity claims to be different from other generative AI tools like ChatGPT. Instead of regurgitating information without saying where it came from, it annotates its brief summaries on any topic with footnotes that link to recent and reliable sources of real-time information pulled from the internet. “References are our currency,” CEO Aravind Srinivas told Forbes in April.

But even as the startup has come under fire for republishing journalists' work, including Forbes', without proper attribution, Perplexity has also been found to cite AI-generated blogs as authoritative sources, even though they contain incorrect, outdated and sometimes contradictory information.

According to a study conducted by AI detection platform GPTZero, Perplexity's search engine is surfacing and citing AI-generated posts on a variety of topics including travel, sports, food, technology and politics. The study determined whether a source was produced by AI by running it through GPTZero's AI detection software, which estimates how much of a text was written by AI with a claimed 97% accuracy rate. For the study, sources were only considered AI-generated if GPTZero determined with at least 95% confidence that they were authored by AI. (Forbes ran them through an additional AI detection tool called DetectGPT, which claims a 99% accuracy rate, to confirm GPTZero's diagnosis.)
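In code, the study's filtering rule is simple. The sketch below illustrates only the 95% cutoff described above, not GPTZero's actual pipeline; the URLs and confidence scores are invented, and a real run would obtain the scores from the GPTZero and DetectGPT services before applying the threshold.

```python
# Illustration of the study's 95%-confidence cutoff; the scores here are
# invented stand-ins for what the GPTZero and DetectGPT detectors return.

def flag_ai_sources(scored_sources: list[tuple[str, float]],
                    threshold: float = 0.95) -> list[str]:
    """Return the URLs whose AI-detection confidence meets the cutoff."""
    return [url for url, confidence in scored_sources
            if confidence >= threshold]

# Hypothetical detector output for two cited pages:
scores = [
    ("https://example.com/kyoto-festival-guide", 0.99),   # counted as AI
    ("https://example.com/local-newspaper-story", 0.41),  # treated as human
]
print(flag_ai_sources(scores))  # ['https://example.com/kyoto-festival-guide']
```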

On average, users only need to enter three prompts before encountering an AI-generated source, according to the study, which tested more than 100 prompts.

“Perplexity is only as good as its sources,” said Edward Tian, CEO of GPTZero. “If the sources are AI hallucinations, so is the output.”

Searches such as “cultural festivals in Kyoto, Japan,” “AI's impact on the healthcare industry,” “must-try street food in Bangkok, Thailand” and “promising young tennis players to watch” returned answers that cited AI-generated content. In one example, a search for “cultural festivals in Kyoto, Japan” on Perplexity turned up a summary whose only reference was an AI-generated LinkedIn post. Another travel-related search, about Vietnam's floating markets, returned a Perplexity answer that cited an AI-generated blog containing outdated information, the study found.

“Perplexity is only as good as its sources. If the sources are AI hallucinations, so is the output.”

Edward Tian, co-founder and CEO of GPTZero

Perplexity chief business officer Dmitry Shevelenko said in an emailed statement to Forbes that its system is “not flawless” and that the company continuously improves its search engine by refining how it identifies relevant, high-quality sources. Perplexity vets sources by assigning “trust scores” to various domains and their content, and its algorithms downrank and exclude websites that contain large amounts of spam, he said. For example, posts from Microsoft and Databricks are prioritized over others in search results, Shevelenko said.

“As part of this process, we have developed our own internal algorithms to detect whether content is generated by AI. As with other detectors, these systems are not perfect and need to be continuously improved, especially as AI-generated content becomes more sophisticated,” he said.
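Perplexity has not published how its trust scores or spam filters actually work. As a rough sketch of the kind of domain-level ranking Shevelenko describes, with every score, domain and cutoff invented purely for illustration:

```python
from urllib.parse import urlparse

# Illustrative only: Perplexity has not disclosed its real trust-score
# values or ranking algorithm. All numbers below are invented.
DOMAIN_TRUST = {
    "microsoft.com": 0.95,        # prioritized, per Shevelenko's example
    "databricks.com": 0.93,
    "seo-content-farm.example": 0.08,
}
SPAM_CUTOFF = 0.2                 # assumed: domains below this are excluded

def rank_sources(urls: list[str]) -> list[str]:
    """Sort candidate sources by domain trust, dropping likely spam."""
    def trust(url: str) -> float:
        return DOMAIN_TRUST.get(urlparse(url).netloc, 0.5)  # unknown: neutral
    return sorted((u for u in urls if trust(u) >= SPAM_CUTOFF),
                  key=trust, reverse=True)

print(rank_sources([
    "https://seo-content-farm.example/penicillin-alternatives",
    "https://microsoft.com/en-us/research/blog/some-post",
]))  # the content-farm URL is filtered out entirely
```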

As AI-generated slop floods the internet, it is becoming increasingly difficult to distinguish authentic content from fake. And increasingly, these artificial posts are being mixed into products that rely on web sources, bringing their inconsistencies and errors with them and resulting in a “secondhand hallucination,” Tian said.

“It doesn't take 50 percent of the internet being AI-generated to create this AI echo chamber,” he told Forbes.

In several scenarios, Perplexity relied on AI-generated blog posts, in addition to other seemingly authoritative sources, to provide health information. For example, when Perplexity was asked to provide “some alternatives to penicillin for the treatment of bacterial infections,” it directly referenced an AI-generated blog by a medical clinic calling itself Penn Medicine Becker ENT & Allergy. (According to GPTZero, there's a 100% chance the blog is AI-generated; DetectGPT said there's a 94% chance it's fake.)

Such sources are unreliable in part because they sometimes present conflicting information. The AI-generated blog states that antibiotics like cephalosporins can be used as an alternative to penicillin for people who are allergic to it, but the post contradicts itself a few sentences later, saying that “people allergic to penicillin should avoid cephalosporins.” Such inconsistencies carry over into the responses generated by Perplexity's AI system, Tian said. The chatbot did, however, suggest consulting a specialist for the safest alternative antibiotic.


Have a tip for us? Contact Rashi Shrivastava securely at rshrivastava@forbes.com or rashis.17 on Signal.


Customer service representatives at Penn Medicine Becker ENT and Allergy redirected Forbes to Penn Medicine. But in response to Forbes' questions about why the clinic was using AI to create medical advice blogs, Penn Medicine spokesperson Holly Auer said the specialty physician website is not controlled by Penn Medicine and that “accuracy and editorial integrity are key standards for all web content associated with our brand, and we will investigate this content and take action as necessary.” It is not clear who manages the website.

Shevelenko said the study's examples do not provide a “comprehensive overview” of the sources cited by Perplexity, but he declined to share data about the sources the system cites.

“The reality is that it depends a lot on what kind of questions users are asking and where they are located,” he said. “Someone in Japan asking about the best TV to buy is very different from someone in the U.S. shopping for running shoes.”

Perplexity's struggles to manage authoritative sources of information have led to other stumbles. The billion-dollar startup recently came under scrutiny for allegedly plagiarizing journalistic work from several news outlets, including Forbes, CNBC and Bloomberg. Earlier this month, Forbes found that Perplexity had picked up phrases, key details and custom art from an exclusive Forbes story about Eric Schmidt's secret AI drone project without proper attribution. The company repackaged the multimedia Forbes story into an article, a podcast and a YouTube video, and pushed it aggressively to its users with direct push notifications.

“Perplexity represents the turning point that our AI development now faces … in the hands of people like Srinivas, who has a reputation as being great at the PhD technical stuff and less great at the basic human stuff,” wrote Forbes' chief content officer, Randall Lane. Forbes sent Perplexity a cease-and-desist letter; in response, the company denied the allegations, arguing that facts cannot be plagiarized, and said it had not “'rewritten,' 'redistributed,' 'republished,' or otherwise used inappropriately” Forbes' content.

The GPTZero study noted that a Perplexity search for “Eric Schmidt's AI combat drones,” one of the “pre-suggested” search topics that sit on Perplexity's landing page, also used an AI-written blog post as one of its sources. (GPTZero found the blog had a 98% chance of being AI-generated, while DetectGPT said it was 99% confident.)

“When you use references like that, it's very easy to promote misinformation even if there's no intention to do so.”

Zak Shumaylov, machine learning researcher at the University of Cambridge

A Wired investigation revealed that the startup had used a secret IP address to access and scrape content from Wired and other publications owned by the media company Condé Nast, although their engineers had tried to block Perplexity's web crawler. The search engine also generated false information and attributed fake quotes to real people. Srinivas did not dispute Wired's claims directly, but said that “Wired's questions reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work.”

Shevelenko said the company recognizes the important role publishers play in creating the healthy information ecosystem its products depend on. To that end, Perplexity is launching what it claims is a “first-of-its-kind” revenue-sharing program that will compensate publishers in a limited capacity. It plans to add an advertising layer to its platform that will allow brands to sponsor follow-up or “related” questions on its search and discovery pages. For a given AI-generated answer on which Perplexity earns revenue, publishers cited as sources in that answer will receive a cut. The company did not share what percentage of revenue it plans to share. It is in talks with The Atlantic, among other publishers, about a possible partnership.

Srinivas, who was a researcher at OpenAI before starting Perplexity in 2022, has raised more than $170 million in venture funding, per PitchBook. The company's backers include some of tech's most high-profile names, including Amazon founder Jeff Bezos, Google chief scientist Jeff Dean, former YouTube CEO Susan Wojcicki, OpenAI cofounder Andrej Karpathy and Meta chief scientist Yann LeCun. In recent months, its conversational search chatbot has exploded in popularity, reaching 15 million users, among them billionaires like Nvidia CEO Jensen Huang and Dell founder and CEO Michael Dell.

Perplexity uses a process called retrieval-augmented generation, or “RAG,” which allows an AI system to pull real-time information from external data sources to improve its chatbot's responses. But any decline in the quality of those sources has a direct impact on the answers the AI generates, experts say.
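The general shape of a RAG pipeline, and why source quality flows straight through to the answer, can be sketched in a few lines. This is a generic illustration, not Perplexity's implementation; `search_web` and `llm_complete` are hypothetical stand-ins for a real retriever and language model, with canned return values so the sketch runs.

```python
def search_web(query: str, k: int = 3) -> list[dict]:
    """Hypothetical retriever stub; a real one would query a search index."""
    return [{"url": "https://example.com/kyoto-festivals",
             "text": "Gion Matsuri takes place in July..."}][:k]

def llm_complete(prompt: str) -> str:
    """Hypothetical language-model call; a real one would hit an LLM API."""
    return "Kyoto's best-known festival is Gion Matsuri [1]."

def answer_with_citations(query: str) -> str:
    sources = search_web(query)  # real-time retrieval from the open web
    context = "\n\n".join(
        f"[{i}] {s['url']}\n{s['text']}" for i, s in enumerate(sources, 1)
    )
    # The model is told to ground its summary in the retrieved pages and to
    # cite them as footnotes. If a retrieved page is AI-generated slop, its
    # errors and contradictions flow directly into the generated answer.
    prompt = ("Answer the question using only the numbered sources below, "
              f"citing them as [n].\n\nSources:\n{context}\n\n"
              f"Question: {query}")
    return llm_complete(prompt)

print(answer_with_citations("cultural festivals in Kyoto, Japan"))
```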

If the real-time sources themselves contain biases or errors, any application built on top of such data can eventually experience a phenomenon called model collapse, said Zak Shumaylov, a machine learning researcher at the University of Cambridge, in which an AI model trained on AI-generated data “starts to spread nonsense because there is no information anymore, only bias.”

“When you use references like that, it's very easy to promote misinformation even if there's no intention to do so,” he said.
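Model collapse can be demonstrated with a toy experiment: repeatedly fit a simple distribution to samples drawn from the previous generation's fit rather than from real data. In the sketch below, whose sample sizes and generation counts are arbitrary choices for illustration, the spread of the fitted distribution tends to shrink across generations until little of the original information survives.

```python
import random
import statistics

# Toy model collapse: each "generation" is a Gaussian fit to samples drawn
# from the previous generation's model rather than from real data.
mean, stdev = 0.0, 1.0  # generation 0 stands in for the real distribution
for generation in range(1, 31):
    samples = [random.gauss(mean, stdev) for _ in range(10)]  # small sample
    mean = statistics.fmean(samples)   # refit on model-generated data
    stdev = statistics.stdev(samples)
    if generation % 10 == 0:
        print(f"gen {generation:2d}: mean={mean:+.3f} stdev={stdev:.3f}")
# The fitted stdev tends to drift toward zero: later generations mostly
# repeat themselves, while any accumulated bias in the mean persists.
```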

Reliance on low-quality web sources is a pervasive challenge for AI companies, many of which do not cite sources at all. In May, Google's “AI Overviews,” a feature that uses AI to summarize a topic, generated an array of misleading answers, such as suggesting adding glue to make cheese stick to pizza and claiming that eating rocks can be good for your health. Part of the problem was that the system appeared to be pulling from unvetted sources like discussion forums on Reddit and satirical sites like The Onion. Liz Reid, the head of Google Search, admitted in a blog post that some wrong results appeared because of a lack of quality information on certain topics.

“Perplexity is just one case,” Tian said. “It's a symptom, not the whole problem.”
