The telltale words that could identify generative AI text

If your right hand starts typing “delve,” you might actually be an LLM.

Getty Images

Until now, even AI companies have had trouble coming up with tools that can reliably detect when a piece of text was produced using a large language model. Now, a group of researchers has established a new method for estimating LLM usage across a large collection of scientific texts by measuring which “excess words” began to appear much more frequently during the LLM period (i.e., 2023 and 2024). The findings “suggest that at least 10% of 2024 abstracts were processed with LLMs,” according to the researchers.

In a preprint paper published earlier this month, four researchers from Germany's University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by comparing the number of deaths against the recent past. Taking a similar look at “excess word usage” after LLM writing tools became widely available in late 2022, the researchers found that “the appearance of LLMs led to a sudden increase in the frequency of certain style words” that was “unprecedented in both quality and quantity.”

Delving into words

To measure these word changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking the relative frequency of each word as it appeared each year. They then compared the predicted frequency of these words (based on the pre-2023 trendline) to the actual frequency of these words in the abstracts of 2023 and 2024, when LLMs were widely used.
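The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the researchers' actual code: it assumes that a word's "relative frequency" is the share of abstracts containing it at least once, and that the pre-2023 trendline is an ordinary least-squares line through the yearly frequencies. The function names and the toy numbers are hypothetical.

```python
import re

def relative_frequency(abstracts, word):
    """Share of abstracts that contain `word` at least once (assumed definition)."""
    if not abstracts:
        return 0.0
    word = word.lower()
    hits = sum(1 for text in abstracts
               if word in set(re.findall(r"[a-z]+", text.lower())))
    return hits / len(abstracts)

def expected_frequency(history, target_year):
    """Extrapolate a least-squares line through {year: frequency} to target_year.

    `history` needs at least two distinct years. Negative extrapolations
    are clipped to zero, since a frequency cannot be negative.
    """
    years = list(history)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(history[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (history[y] - mean_y) for y in years)
    slope = sxy / sxx
    return max(0.0, mean_y + slope * (target_year - mean_x))

# Toy, made-up yearly frequencies for one word before the LLM period.
pre_llm = {2019: 0.010, 2020: 0.011, 2021: 0.012, 2022: 0.013}
baseline_2024 = expected_frequency(pre_llm, 2024)
print(f"expected 2024 frequency: {baseline_2024:.4f}")
```

The extrapolated value plays the role of the "no LLM" baseline: whatever the word's actual 2024 frequency turns out to be is then compared against it.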

The results revealed a number of words that were extremely uncommon in scientific abstracts before 2023 but suddenly surged in popularity after LLMs were introduced. For example, the word “delves” appeared 25 times more often in 2024 papers than the pre-LLM trend would predict. The use of words like “showcasing” and “underscore” also increased ninefold. Other, previously common words became notably more common in post-LLM abstracts: “probable” increased in frequency by 4.1 percentage points, “results” by 2.7 percentage points, and “significant” by 2.6 percentage points, for example.
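The two kinds of numbers quoted above (a multiplier for rare words, a percentage-point gap for common ones) correspond to two simple ways of comparing an observed frequency with its expected baseline. The sketch below is an illustration under that reading; the function name and the sample frequencies are invented for the example and are not taken from the paper.

```python
def excess_usage(observed, expected):
    """Return (ratio, gap) between an observed and an expected frequency.

    ratio = observed / expected  -> useful for words that used to be rare
    gap   = observed - expected  -> useful for words that were already common
    """
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return ratio, gap

# A rare word: a tiny baseline makes the ratio large even though the gap is small.
rare_ratio, rare_gap = excess_usage(observed=0.005, expected=0.0002)

# A common word: the ratio barely moves, but the gap is several percentage points.
common_ratio, common_gap = excess_usage(observed=0.441, expected=0.400)

print(f"rare word:   x{rare_ratio:.1f}, {100 * rare_gap:.2f} percentage points")
print(f"common word: x{common_ratio:.2f}, {100 * common_gap:.2f} percentage points")
```

Reporting both measures keeps either kind of word from being overlooked: a ratio alone would hide shifts in common words, and a gap alone would hide the explosive growth of rare ones.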

Some examples of words that saw a significant increase (or decrease) in usage after LLMs were introduced (bottom three words shown for comparison).

These kinds of changes in word usage can occur independently of LLM usage, of course; the natural evolution of language means that words sometimes move in and out of style. However, the researchers found that, in the pre-LLM era, such large and sudden year-over-year increases were only seen for words related to major global health events: “Ebola” in 2015; “Zika” in 2017; and words such as “coronavirus,” “lockdown,” and “pandemic” in the period 2020 to 2022.

In the post-LLM period, though, the researchers found hundreds of words with a sudden, pronounced increase in scientific usage that had no common connection to world events. Indeed, while the excess words during the COVID pandemic were mostly nouns, the researchers found that the words with a post-LLM frequency bump were mostly “style words” such as verbs, adjectives, and adverbs (a small sample: “across, additional, comprehensively, significantly, incrementally, prominently, insightfully, specifically, within”).

This isn't exactly a new discovery; the increasing prevalence of “delve” in scientific papers has been widely noted in the recent past, for example. But previous studies typically relied on comparisons with “ground truth” human writing samples or lists of predefined LLM markers obtained from outside the study. Here, the pre-2023 collection of abstracts serves as its own effective control group to show how word choice has changed overall in the post-LLM era.

A complex interaction

With hundreds of so-called “marker words” identified as significantly more common in the post-LLM era, the telltale signs of LLM usage can sometimes be easy to pick out. Take, for example, this abstract line called out by the researchers for its marker words: “A wide grasp of the complex interactions between […] and […] is important for effective treatment strategies.”

After taking some statistical measures of how marker words appear across individual papers, the researchers estimate that at least 10 percent of the post-2022 papers in the PubMed corpus were written with at least some LLM assistance. The number could be even higher, the researchers say, because their set may be missing LLM-assisted abstracts that don't include any of the marker words they identified.
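One way to see why such an estimate is a floor rather than a precise figure: if a set of marker words shows up in a larger share of abstracts than the pre-LLM trend predicts, then at least that extra share of abstracts must have been touched by an LLM, and any LLM-assisted abstract without a marker word goes uncounted. The sketch below illustrates this reasoning only. It is a simplification, not the researchers' exact statistical procedure, and the function names, marker set, and sample abstracts are hypothetical.

```python
import re

def share_with_any_marker(abstracts, markers):
    """Share of abstracts containing at least one word from `markers`."""
    if not abstracts:
        return 0.0
    marker_set = {m.lower() for m in markers}
    hits = 0
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & marker_set:
            hits += 1
    return hits / len(abstracts)

def llm_lower_bound(observed_share, expected_share):
    """Excess share of marker-bearing abstracts, read as a lower bound.

    Abstracts that used an LLM but contain no marker word are invisible
    here, so the true share can only be equal or higher.
    """
    return max(0.0, observed_share - expected_share)

# Toy corpus and a made-up baseline share from the pre-LLM trend.
sample_2024 = [
    "This study delves into showcasing novel biomarkers.",
    "We measured blood pressure in 40 patients.",
    "Our analysis delves into treatment outcomes.",
    "A randomized trial of two dosing schedules.",
]
observed = share_with_any_marker(sample_2024, {"delves", "showcasing"})
print(f"lower bound: {llm_lower_bound(observed, expected_share=0.25):.2f}")
```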

Before 2023, it took a major global event like the coronavirus pandemic to see a big jump in the use of such words.

Those measured percentages can also vary greatly among different subsets of papers. The researchers found that papers written in countries such as China, South Korea, and Taiwan featured LLM marker words 15 percent of the time, suggesting that “LLMs … can help non-native speakers edit English texts, which may justify their widespread use.” On the other hand, the researchers suggest that native English speakers “may [just] be better at spotting and actively removing unnatural style words from LLM outputs,” thereby hiding their LLM usage from this type of analysis.

Tracking LLM use is important, the researchers note, because “LLMs are notorious for fabricating references, providing false summaries, and making false claims that sound authoritative and persuasive.” But as knowledge of LLMs' telltale marker words begins to spread, human editors may become better at removing those words from generated text before it is shared with the world.

Who knows, maybe future large language models will perform this kind of frequency analysis themselves, reducing the weight of marker words to better disguise their output as human-written. Before long, we may need to call on some Blade Runners to pick out the generative AI text hiding in our midst.
