Anthropic research shows AI agents are approaching real-world DeFi attack capability

AI agents are getting good enough at finding attack vectors in smart contracts that they can already be weaponized by bad actors, according to new research published by the Anthropic Fellows Program.

A study by the ML Alignment & Theory Scholars (MATS) program and the Anthropic Fellows Program tested frontier models against SCONE-bench, a dataset of 405 exploited contracts. GPT-5, Claude Opus 4.5, and Sonnet 4.5 combined produced $4.6 million in simulated exploits on contracts hacked after their knowledge cutoffs, providing a lower bound on what this generation of AI could have stolen in the wild.

(Source: Anthropic and MATS)

The team found that frontier models didn’t just identify bugs. They were able to synthesize full exploit scripts, sequence transactions, and drain simulated liquidity in ways that closely mirror real attacks on the Ethereum and BNB Chain blockchains.

The paper also tested whether current models could find vulnerabilities that had not yet been exploited.

GPT-5 and Sonnet 4.5 scanned 2,849 recently deployed BNB Chain contracts that showed no signs of prior compromise. Between them, the models uncovered two zero-day vulnerabilities worth $3,694 in simulated profit. One stemmed from a missing access-control check on a public function, which allowed the agent to inflate its token balance.
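To make the first pattern concrete, here is a minimal Python sketch of that vulnerability class: a public, state-changing function with no access control. The class, function names, and amounts are illustrative assumptions, not details taken from the contracts in the study.

```python
# Illustrative sketch only: a simplified Python model of the vulnerability class,
# a publicly callable state-changing function with no access control.
# Names and amounts are hypothetical, not drawn from the study's contracts.

class VulnerableToken:
    def __init__(self):
        self.balances = {}        # address -> token balance
        self.owner = "0xOwner"

    def mint(self, caller: str, to: str, amount: int) -> None:
        # BUG: `caller` is never checked against self.owner, so any address
        # can mint. A fixed version would begin with: assert caller == self.owner
        self.balances[to] = self.balances.get(to, 0) + amount


token = VulnerableToken()
token.mint(caller="0xAttacker", to="0xAttacker", amount=1_000_000)
print(token.balances["0xAttacker"])  # attacker-inflated balance: 1000000
```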

The other allowed a caller to redirect fee withdrawals by supplying an arbitrary recipient address. In both cases, the agents generated executable exploit scripts that turned the flaw into profit.
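The second pattern can be sketched the same way, again as an illustrative Python model rather than the actual contract code: a fee-withdrawal function that trusts a caller-supplied recipient and never checks who is calling.

```python
# Illustrative sketch only: a simplified Python model of a fee-withdrawal
# function that accepts an arbitrary, caller-supplied recipient.
# Names and amounts are hypothetical, not drawn from the study's contracts.

class FeeVault:
    def __init__(self, accrued_fees: int):
        self.accrued_fees = accrued_fees      # fees owed to the protocol owner
        self.payouts = {}                     # address -> amount paid out

    def withdraw_fees(self, caller: str, recipient: str) -> None:
        # BUG: `recipient` comes straight from the caller and `caller` is never
        # authorized, so anyone can route the protocol's fees to themselves.
        amount = self.accrued_fees
        self.accrued_fees = 0
        self.payouts[recipient] = self.payouts.get(recipient, 0) + amount


vault = FeeVault(accrued_fees=10_000)
vault.withdraw_fees(caller="0xAttacker", recipient="0xAttacker")
print(vault.payouts["0xAttacker"])  # 10000, diverted to the attacker
```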

Although the dollar amounts were small, the discovery is significant because it shows that profitable autonomous exploitation is technically possible.

Running the agent across the entire set of contracts cost only $3,476, an average of roughly $1.22 per contract scanned. As models become cheaper and more capable, the economics tilt further toward automation.
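The per-contract figure follows directly from the reported totals. The short arithmetic check below uses only the numbers quoted in the article; the profit-to-cost comparison at the end is a derived illustration, not a figure from the paper.

```python
# Back-of-the-envelope check of the scanning economics reported in the paper.
# Inputs are the article's figures; the derived values are simple arithmetic.

total_cost_usd = 3_476          # cost to run the agent across all contracts
contracts_scanned = 2_849       # recently deployed BNB Chain contracts
simulated_profit_usd = 3_694    # value of the two zero-day exploits found

cost_per_contract = total_cost_usd / contracts_scanned
print(f"cost per contract: ${cost_per_contract:.2f}")                      # ~$1.22
print(f"profit / cost ratio: {simulated_profit_usd / total_cost_usd:.2f}") # ~1.06
```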

The researchers argue that this trend will shorten the window between contract deployment and attack, especially in DeFi, where capital is publicly visible and exploitable bugs can be monetized instantly.

While the findings focus on DeFi, the authors caution that the underlying capabilities are not domain-specific.

The same reasoning steps that let an agent inflate a token balance or divert fees apply equally to conventional software, closed-source codebases, and the infrastructure that supports crypto markets.

As model costs decrease and tool usage improves, automated scanning is likely to expand beyond public smart contracts to any service along the path to valuable assets.

The authors frame the work as a warning rather than a forecast. AI models can now perform tasks that historically required highly trained human attackers, and the research suggests that autonomous exploitation in DeFi is no longer hypothetical.

The question now for crypto builders is how quickly defenses can catch up.
