Structure of Ghostbuster, our state-of-the-art method for AI-generated text detection.
Large language models like ChatGPT write impressively well; so well, in fact, that they've become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. These models are also prone to producing text with factual errors, so wary readers may want to know whether generative AI tools have been used to ghostwrite news articles or other sources before trusting them.
What can teachers and consumers do? Existing tools to detect AI-generated text sometimes perform poorly on data that differs from what they were trained on. In addition, if these models falsely classify real human writing as AI-generated, they can jeopardize students whose genuine work is called into question.
Our recent paper introduces Ghostbuster, a state-of-the-art method for detecting AI-generated text. Ghostbuster works by finding the probability of generating each token in a document under several weaker language models, then combining functions of those probabilities as input to a final classifier. Ghostbuster does not need to know which model was used to generate a document, nor the probability of generating the document under that particular model. This property makes Ghostbuster especially useful for detecting text potentially generated by an unknown or black-box model, such as the popular commercial models ChatGPT and Claude, for which probabilities are not available. We were particularly interested in ensuring that Ghostbuster generalizes well, so we evaluated it across a range of ways that text could be generated, including different domains (using newly collected datasets of essays, news, and stories), language models, and prompts.
Examples of human-authored and AI-generated text from our datasets.
Why this approach?
Many existing AI-generated text detection systems are brittle when classifying different types of text (for example, different writing styles, or different text generation models or prompts). Simpler models that use perplexity alone generally cannot capture more complex features and do particularly poorly on new writing domains. In fact, we found that a perplexity-only baseline was worse than random on some domains, including data from non-native English speakers. Meanwhile, classifiers based on large language models such as RoBERTa easily capture complex features but overfit the training data and generalize poorly: we found that a RoBERTa baseline had catastrophic worst-case generalization performance, sometimes even worse than the perplexity-only baseline. Zero-shot methods, which classify text without training on labeled data by calculating the probability that the text was generated by a specific model, also tend to perform poorly when a different model was actually used to generate the text.
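To make the perplexity-only baseline concrete, here is a minimal sketch of how such a baseline might work, assuming per-token log-probabilities are already available from some language model; the function names and threshold value are illustrative assumptions, not code from our paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity is the exponential of the average negative log-probability per token."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def classify_by_perplexity(token_logprobs, threshold=20.0):
    """Flag a document as AI-generated if its perplexity is below a tuned threshold,
    since model-generated text tends to be more predictable than human writing.
    The threshold here is a placeholder; in practice it would be fit on labeled data."""
    return "ai-generated" if perplexity(token_logprobs) < threshold else "human"

# Example: per-token log-probabilities produced by any language model scoring step.
print(classify_by_perplexity([-1.2, -0.4, -2.3, -0.9, -1.7]))
```

A single scalar like this is easy to threshold but, as noted above, discards most of the information in the per-token probabilities, which is part of why it generalizes poorly.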
How does Ghostbuster work?
Ghostbuster uses a three-stage training process: computing probabilities, selecting features, and training a classifier.
Computing probabilities: We converted each document into a series of vectors by computing the probability of generating each token under a series of weaker language models (a unigram model, a trigram model, and two non-instruction-tuned GPT-3 models, ada and davinci).
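As a rough illustration of this stage, the sketch below computes per-token log-probabilities under simple add-alpha smoothed unigram and trigram models; the smoothing scheme is a simplification, and the GPT-3 (ada and davinci) probabilities would come from a separate API scoring step not shown here.

```python
import math
from collections import Counter

def unigram_logprobs(tokens, counts, vocab_size, alpha=1.0):
    """Per-token log-probabilities under an add-alpha smoothed unigram model."""
    total = sum(counts.values())
    return [math.log((counts[t] + alpha) / (total + alpha * vocab_size)) for t in tokens]

def trigram_logprobs(tokens, tri_counts, bi_counts, vocab_size, alpha=1.0):
    """Per-token log-probabilities under an add-alpha smoothed trigram model,
    conditioning each token on the two preceding tokens (with start padding)."""
    padded = ["<s>", "<s>"] + list(tokens)
    logps = []
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        numerator = tri_counts[context + (padded[i],)] + alpha
        denominator = bi_counts[context] + alpha * vocab_size
        logps.append(math.log(numerator / denominator))
    return logps

# Counts would be estimated from a training corpus; Counter returns 0 for unseen n-grams.
unigram_counts, tri_counts, bi_counts = Counter(), Counter(), Counter()
doc = "the model wrote this".split()
vectors = [unigram_logprobs(doc, unigram_counts, vocab_size=50000),
           trigram_logprobs(doc, tri_counts, bi_counts, vocab_size=50000)]
```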
Selecting features: We used a structured search procedure to select features, which works by (1) defining a set of vector and scalar operations that combine the probabilities, and (2) searching for useful combinations of these operations with forward feature selection, repeatedly adding the best remaining feature.
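The sketch below shows the flavor of this greedy search, assuming each candidate feature (some combination of vector and scalar operations over the probability vectors) has already been reduced to one scalar per document; the candidate set, scoring function, and stopping rule here are simplified stand-ins, not the exact structured search from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_feature_selection(features, labels, max_features=10):
    """Greedy forward selection: repeatedly add the candidate feature whose
    inclusion most improves cross-validated accuracy of a linear classifier.

    features: dict mapping feature name -> 1-D array of per-document scalar values,
              e.g. "mean(unigram_logprobs)" or "max(davinci_logprobs - ada_logprobs)".
    labels:   1-D array of 0 (human) / 1 (AI-generated) labels.
    """
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        best_candidate = None
        for name in features:
            if name in selected:
                continue
            X = np.column_stack([features[f] for f in selected + [name]])
            score = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
            if score > best_score:
                best_score, best_candidate = score, name
        if best_candidate is None:  # no remaining feature improves the score
            break
        selected.append(best_candidate)
    return selected
```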
Classifier training: We trained a linear classifier on the best probability-based features and a few additional manually selected features.
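Continuing the sketch above, the final stage might look like the following: a logistic regression model fit on the selected feature columns. The scikit-learn setup is an illustrative assumption, not the exact configuration from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_classifier(features, selected_names, labels):
    """Train a linear classifier on the selected probability-based features
    (plus any manually chosen extras already included in `features`)."""
    X = np.column_stack([features[name] for name in selected_names])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

def score_documents(clf, features, selected_names):
    """Score new documents; returns the predicted probability of being AI-generated."""
    X = np.column_stack([features[name] for name in selected_names])
    return clf.predict_proba(X)[:, 1]
```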
Results
When trained and tested on the same domain, Ghostbuster achieved 99.0 F1 across all three datasets, outperforming GPTZero by a margin of 5.9 F1 and DetectGPT by 41.6 F1. Out of domain, Ghostbuster averaged 97.0 F1 across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. Our RoBERTa baseline achieved 98.1 F1 when evaluated in-domain on all datasets, but its generalization performance was inconsistent. Ghostbuster outperformed the RoBERTa baseline on all domains except out-of-domain creative writing, and had significantly better out-of-domain performance than RoBERTa on average (a 13.8 F1 margin).
Results on Ghostbuster's in-domain and out-of-domain performance.
To ensure that Ghostbuster is robust to the range of ways a user might prompt a model, such as requesting different writing styles or reading levels, we evaluated Ghostbuster's robustness to several prompt variants. Ghostbuster outperformed all other tested methods on these prompt variants with 99.5 F1. To test generalization across models, we evaluated performance on text generated by Claude, where Ghostbuster again outperformed all other tested methods with 92.2 F1.
AI-generated text detectors have been fooled by lightly editing the generated text. We tested Ghostbuster's robustness to such edits, including swapping sentences or paragraphs, reordering characters, and replacing words with synonyms. Most changes at the sentence or paragraph level did not significantly affect performance, though performance degraded gradually when the text was edited through repeated paraphrasing, commercial detection evaders such as Undetectable AI, or many word- or character-level changes. Performance was also best on longer documents.
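To give a concrete sense of what such edits look like, here is a minimal sketch of two simple perturbations (swapping sentences and reordering a word's interior characters); these functions are illustrative and are not the perturbation code used in our evaluation.

```python
import random

def swap_two_sentences(text, seed=0):
    """Swap two randomly chosen sentences, splitting naively on '. '."""
    rng = random.Random(seed)
    sentences = text.split(". ")
    if len(sentences) < 2:
        return text
    i, j = rng.sample(range(len(sentences)), 2)
    sentences[i], sentences[j] = sentences[j], sentences[i]
    return ". ".join(sentences)

def scramble_word(word, seed=0):
    """Reorder a word's interior characters, keeping the first and last in place."""
    rng = random.Random(seed)
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

# A detector can then be re-evaluated on perturbed copies of the generated documents.
```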
Since AI-generated text detectors may misclassify non-native English speakers' writing as AI-generated, we evaluated Ghostbuster's performance on non-native English speakers' writing. All tested models had greater than 95% accuracy on two of the three tested datasets, but fared worse on the third set of shorter essays. However, document length may be the main factor here, since Ghostbuster performs about as well on these documents (74.7 F1) as it does on other out-of-domain documents of similar length (75.6 to 93.1 F1).
Users who want to apply Ghostbuster to real-world cases of potentially off-limits text generation (e.g., student essays written with ChatGPT) should note that errors are more likely on short texts, on domains far from those Ghostbuster was trained on (e.g., different varieties of English), on text by non-native speakers of English, on human-edited model generations, and on text generated by prompting an AI model to edit human-authored input. To avoid perpetuating algorithmic harms, we strongly discourage automatically penalizing alleged use of text generation without human oversight. Instead, we recommend cautious, human-in-the-loop use of Ghostbuster if classifying someone's writing as AI-generated could harm them. Ghostbuster can also help with a variety of lower-risk applications, including filtering AI-generated text out of language model training data and checking whether online sources of information are AI-generated.
Conclusion
Ghostbuster is a state-of-the-art AI-generated text detection model, with 99.0 F1 performance across tested domains, representing a substantial improvement over existing models. It generalizes well to different domains, prompts, and models, and it is well suited to identifying text from black-box or unknown models because it does not require access to probabilities from the specific model used to generate the document.
Future directions for Ghostbuster include providing explanations for model decisions and improving robustness to attacks that specifically attempt to fool detectors. AI-generated text detection methods can also be used alongside alternatives such as watermarking. We also hope Ghostbuster can help with a variety of applications, such as filtering language model training data or flagging AI-generated content on the web.
Try Ghostbuster here: ghostbuster.app
Learn more about Ghostbuster here: [ paper ] [ code ]
Try guessing for yourself whether text is AI-generated here: ghostbuster.app/experiment