‘After one month, most partners have each found hundreds of critical or severe vulnerabilities’: Anthropic claims Mythos has found over ten thousand major security vulnerabilities across ‘the most systemically important software in the world’

Anthropic reported that the Mythos Preview revealed over 10,000 high and critical vulnerabilities in less than two months, with Cloudflare alone finding 2,000 flaws
Independent validation confirmed that 90% of the assessed results were real, although critics argue that the breakthrough may stem from massive calculations and workflows rather than clear-cut reasoning
The bottleneck has shifted from discovery to verification, disclosure and patching, as AI now reveals vulnerabilities faster than organizations can remediate them

In less than two months, Anthropic and friends have apparently discovered more than ten thousand high-level critical security vulnerabilities using the famous Mythos Preview artificial intelligence tool.

In a brief update on the state of the project, published late last week, Anthropic said that since the release of the tool, about 50 organizations that used it have each found “hundreds” of vulnerabilities.

“Several have told us that their speed of troubleshooting has increased by more than a factor of ten,” the company said. “For example, Cloudflare has found 2,000 bugs (of which 400 are of high or critical severity) across their critical path systems, with a false positive rate that Cloudflare’s team considers better than human testers.”

Skepticism persists

Of those, 1,752 have been assessed by independent security researchers, and 90% were confirmed as valid positives, while 62.4% were confirmed as either high or critical severity.

But while the overall reaction to the Mythos Preview has been extremely positive, there are voices that say the hype may also be overblown. Techzine analysisfor example, argue that AI-assisted vulnerability discovery already existed through systems like Google’s Big Sleep, and that the real challenge remains human operational security.

A recent academic paper, “Benchmarking Mythos-Linked Bug Rediscovery,” found that under controlled conditions, public boundary models such as GPT-5.5 were able to rediscover some of the same vulnerabilities attributed to Mythos, and on Redditdifferent societies have been even more skeptical. The key takeaway seems to be that Mythos might simply be using massive amounts of computing and lengthy agentic workflows rather than possessing qualitatively different reasoning abilities.

In any case, Anthropic now says that progress on software vulnerabilities is no longer limited by the speed of discovery, but rather by the speed of verification, disclosure and patching.