- Gemini produces the most human-like writing among major AI tools, according to researchers.
- AI-written content has become increasingly difficult for many detectors to flag.
- AI detection tools vary widely in accuracy, leading to inconsistent results for the same piece of content.
Google Gemini outperforms rival AI chatbots when it comes to convincing detection tools that content generated by the model was written by a human, researchers have found.
Articles and stories composed using Gemini slip past detection tools more often than those produced by rivals like ChatGPT or Grok, a dubious honor as the Internet fills with poorly generated AI slop.
The findings come from an analysis by Open Resource Applications, which tested a dozen widely used AI systems by giving each the same task. Each model was asked to produce a long, human-sounding article. These pieces were then run through three detection platforms, Grammarly, QuillBot and GPTZero, to see how easily they could be identified as machine-generated. Gemini came out ahead, with the lowest overall detection rate among the group.
That result is less about one model winning and more about what happens next. For readers, writers, and anyone who spends time online, the distinction between human and AI writing is becoming less reliable, even when tools are designed specifically to make that distinction clear.
AI mimics humans
The survey’s figures tell a straightforward story. Gemini’s output was flagged far less often by Grammarly and not at all by QuillBot, while GPTZero still identified the most AI text across the board. Still, the gap between these tools is significant: the same piece of writing can be judged completely human or clearly artificial depending solely on which app is used, something the author has no way to influence.
A student submitting coursework may pass one detector and fail another. A legal writer can have their work questioned depending on which software their boss chooses to use. For the average person, the result is growing uncertainty about how writing is judged and understood.
Gemini proved to be the most convincing at mimicking human writing, with its output rarely flagged by Grammarly and not at all by QuillBot. Grammarly showed the weakest detection ability overall, identifying only 43.5% of AI-generated content, while GPTZero stood out as the most effective tool, correctly recognizing 98% of the text.
Part of Gemini’s advantage seems to come from how it puts sentences together differently from its rivals. Detection tools often rely on patterns, looking for predictable structures or familiar phrasing. Models that vary their structure and develop ideas in less consistent ways are harder to catch because they do not follow the same recognizable rhythms.
“Tools like GPTZero flag predictability and overall structure too, so a model that actually reasons through ideas rather than recycling familiar phrases will be much harder to capture,” said an ORA spokesperson.
“The gap between models is already wide enough that the same prompt produces completely different results depending on which tool you use. Most people choose an AI writing tool by grabbing the one that’s most popular, and that’s exactly why ChatGPT keeps getting flagged over and over again.”
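The detectors in the test do not publish their methods, but one widely used signal in this space is statistical predictability: text that a language model finds easy to predict is more likely to be flagged as machine-generated. As a rough illustration only, the sketch below scores a passage's perplexity with a small open model (GPT-2) and applies an arbitrary cutoff; the model choice and threshold are assumptions for demonstration, not how Grammarly, QuillBot, or GPTZero actually work.

```python
# Minimal sketch of a perplexity-style predictability check.
# Assumption: lower perplexity (more predictable text) is treated as a
# hint that the text is machine-generated. Real detectors combine many
# signals; this is illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score the text with a small causal language model;
    # lower perplexity means the model found the text more predictable.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

def looks_machine_generated(text: str, threshold: float = 30.0) -> bool:
    # The threshold is an arbitrary illustrative cutoff, not a value
    # used by any commercial detector.
    return perplexity(text) < threshold
```

A model that varies its sentence rhythm and phrasing, as Gemini appears to, pushes this kind of score upward and so looks less "machine-like" to a predictability-based check, which is consistent with the gap the researchers describe.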
ChatGPT cannot fool AI detectors
That would help explain why ChatGPT, despite its enormous reach, performed relatively poorly in the same test. With hundreds of millions of users, it has become the most familiar voice in AI writing. That familiarity has made it easier to recognize.
“ChatGPT ranks so low because it was the first big AI on the market and everyone knows what it sounds like,” explains a spokesperson from Open Resource Applications. “Many models that came after first sounded like Chat before they became more unique. That’s why AI detectors flag it so easily.”
In a way, ChatGPT’s influence has worked against it. By shaping early expectations of what AI writing sounds like, it gave detection tools a template to follow. Newer models like Gemini have moved beyond that template, introducing more variety and less predictability.
AI slop increases
These kinds of tests matter as millions of people try AI tools and publish the resulting AI slop. Some studies suggest that around half of online content is now generated by AI in some form.
Platforms have begun to respond by filtering out content that seems overly artificial, but that approach depends on detection tools that are far from consistent. The problem is not so much false alarms as missed detections, especially as models improve.
The larger pattern is hard to ignore. AI writing isn’t just improving; it is diversifying. Different models now produce different styles, making it harder to define a single ‘AI voice’. This diversity complicates detection while making the technology more useful.
Gemini’s performance in this study might suggest that it’s better at writing, but what it really succeeds at is avoiding the patterns that give AI away. That may be a temporary advantage as detection tools adapt and other models follow suit, but it highlights how quickly the landscape is changing.
For readers, the takeaway is less about choosing sides and more about adjusting expectations. The Internet is no longer a space where human and machine writing can be easily separated. It’s a blend, and that blend is becoming more seamless.
In that environment, the question is no longer whether something sounds human—increasingly, everything does.