From a suspicious penchant for “delving into” things to decidedly wishy-washy opinions, there are certain telltale signs that can give readers a sense that a piece of writing could be AI-generated—but declaring as much with consistent certainty has proven an especially tricky problem for developers attempting to create AI text detectors.
Many of these systems can be thwarted by the simple addition of an unusual symbol into the text. They might perform well on one type of content, say, student essays, but fall short on another, like news articles. And detectors that can reliably identify AI-generated writing can also mistake human-authored work for that of machines.
All of those findings come from a study by researchers at the University of Pennsylvania that casts doubt on some of the advertised claims of AI text detectors currently on the market. The authors propose a new method for quantifying how well these tools work in the form of a standardized benchmark: a dataset of 10 million documents spanning news articles, blog posts, recipes, and more, as well as a public leaderboard ranking detectors.
“What we’re trying to contribute is a systematic way of benchmarking AI detectors, so that we know if someone believes that they’ve come up with an innovation, that we could validate that it is in fact doing much better, that it’s beating the state of the art,” Chris Callison-Burch, a UPenn computer and information science professor and study author, told Tech Brew.
Why it matters: Since the release of OpenAI’s GPT-2 in 2019—and especially after the viral phenomenon of ChatGPT almost two years ago—experts have become increasingly worried about the dangers of a newfound flood of text produced by LLMs. Many of those fears have since been borne out, from teachers throwing up their hands at AI-generated essays to academic research spam and scammers operating on a new mass scale.
Without accurate detection, though, it can be hard to even know the scope of the problem. OpenAI scrapped an early attempt at an AI text classifier last year after acknowledging it had “low accuracy.” And various attempts to watermark AI-generated text have also proven iffy.
Callison-Burch said he was struck by the contrast between that thorny problem and the number of startups claiming up to 99% accuracy in detection.
“There’s sort of two conflicting narratives about [AI text detection],” he told Tech Brew. “So on the one hand, there’s lots of startup companies or even academic papers that describe AI detectors as having, like, extremely high accuracy…But on the other hand, you have a lot of prevailing sentiment that it’s hard to detect.”
Easy to break: The research team cataloged a slew of ways they found to outsmart detectors, including replacing certain characters with similar-looking homoglyphs and certain words with alternative British spellings. They also found that detectors tended to work best on the type of writing or model output they were trained on: a detector trained on ChatGPT might have trouble parsing text from Anthropic’s Claude, and a tool trained on news articles might struggle with recipes.
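To make the homoglyph trick concrete, here is a minimal Python sketch of that kind of character swap. The mapping and function names are hypothetical illustrations, not code from the study or from any detector; the point is only that a handful of visually identical Cyrillic characters can change the underlying text a detector sees while a human reader notices nothing.

```python
# Illustrative sketch only: swap a few Latin letters for visually identical
# Cyrillic homoglyphs, one of the evasion tactics the researchers describe.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}

def swap_homoglyphs(text: str, every_nth: int = 10) -> str:
    """Replace every nth eligible character with its lookalike."""
    out, seen = [], 0
    for ch in text:
        if ch in HOMOGLYPHS:
            seen += 1
            if seen % every_nth == 0:
                out.append(HOMOGLYPHS[ch])
                continue
        out.append(ch)
    return "".join(out)

# The output looks identical on screen but differs byte-for-byte.
print(swap_homoglyphs("The quick brown fox jumps over the lazy dog."))
```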
Callison-Burch said at least one startup has since incorporated protection against these pitfalls into its detector, and he’s encouraged by how well some detectors are performing on the leaderboard.
“My feeling is it’s a little bit of a cat-and-mouse game or a little bit of an arms race,” he said. “As the LLMs get better and better, it’s harder and harder to detect them. But I think the need for doing so is more obvious than ever.”