- The average AI-generated pull request has 10.83 problems compared to 6.45 for human code, the report claims
- AI code actually fares better on typos, but its serious issues leave plenty of work for human reviewers
- Microsoft's patch count is up, but that likely reflects higher overall code output
AI-generated code actually contains more vulnerabilities than human-written code, raising questions about the reliability of some tools, new data from CodeRabbit has claimed.
Pull requests made with AI tools had an average of 10.83 issues compared to 6.45 issues in human-made pull requests, ultimately leading to longer reviews and the potential for more bugs to make it through to the finished product.
In addition to having 1.7x more issues overall, AI-generated pull requests also had 1.4x more critical issues and 1.7x more major issues, a gap too large to dismiss as noise.
AI-generated code is not as secure as you might think
Logic and correctness errors (1.75x), code quality and maintainability issues (1.64x), security issues (1.57x), and performance issues (1.42x) all appeared at higher rates in AI-generated code, and the report criticized AI for introducing more serious errors that human reviewers then have to catch and fix.
Some of the problems AI would most likely introduce include incorrect password handling, insecure object references, XSS vulnerabilities, and insecure deserialization.
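To make the "incorrect password handling" category concrete, here is a minimal Python sketch of the flaw class and a safer alternative. This is illustrative only, not code from the report; the function names and parameters are my own.

```python
import hashlib
import hmac
import os

# Flawed pattern sometimes seen in generated code: storing passwords in
# plain text and comparing them directly, e.g.
#   if password == stored_password: ...
# If the credential store leaks, every password is immediately exposed.

# Safer pattern: a salted, deliberately slow hash plus a constant-time check.

def hash_password(password: str, salt: bytes = b"") -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 from the stdlib."""
    salt = salt or os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)
```

A reviewer scanning an AI-generated pull request would flag the plaintext comparison in the first pattern and ask for something like the second, which is exactly the kind of catch the report says humans are still making.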
“AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate,” commented CodeRabbit AI director David Loker.
However, this is not necessarily a bad thing, as AI improves efficiency across the initial stages of code generation. AI-generated pull requests also contained 1.76x fewer typos and 1.32x fewer testability issues.
So while the study highlights some of AI's flaws, it also serves the important purpose of demonstrating how humans and AI agents could interact with each other in the future. Instead of displacing human workers, we're seeing human work shift toward directing and reviewing AI, while computers handle some of the tedious tasks that slow humans down in the first place.
While Microsoft claims to have patched 1,139 CVEs in 2025, its second-highest annual total ever, that figure isn't necessarily bad news. With AI, developers are producing more code to begin with, so the overall proportion of risky code may not be as high as the raw numbers initially suggest.
Then there’s the fact that AI models, like OpenAI’s GPT family, are constantly being improved to produce more accurate, less error-prone results.