- Claude Opus 4.6 beat all rival AI models in a simulated year-long automaton challenge
- The model increased profits by bending the rules to the breaking point
- Claude Opus avoided refunds and coordinated prices among other tricks
Anthropic’s latest model of Claude is a very ruthless but successful capitalist. Claude Opus 4.6 is the first AI system to reliably pass the vending machine test, a simulation designed by researchers at Anthropic and independent research group Andon Labs to evaluate how well the AI runs a virtual vending machine business over an entire simulated year.
The model outperformed all its competitors by a large margin. And it did so with tactics just this side of viciousness and with a relentless disregard for consequences. It showed what autonomous AI systems are capable of when given a simple goal and plenty of time to pursue it.
The automaton test is designed to see how well modern AI models handle long-term tasks built up from thousands of small decisions. The test measures persistence, planning, negotiation and the ability to coordinate several elements simultaneously. Anthropic and other companies hope this kind of testing will help them shape AI models capable of performing tasks like planning and managing complex work.
The vending machine test was specifically taken from a real-world experiment at Anthropic, where the company placed a real vending machine in its office and asked an older version of Claude to operate it. That version struggled so hard that employees still come up with its missteps. At one point, the model hallucinated her own physical presence and told customers she wanted to meet them in person, wearing a blue blazer and red tie. It promised refunds, which it never processed.
AI vending machine
This time the experiment was conducted entirely in simulation, which gave the researchers greater control and allowed models to run at full speed. Each system was given a simple instruction: maximize your ending bank balance after a simulated year of vending operation. The restrictions matched standard business conditions. The machine sold regular snacks. Prices fluctuated. Competitors operated nearby. Customers behaved unpredictably.
Three top models participated in the simulation. OpenAI’s ChatGPT 5.2 brought in $3,591. while Google Gemini 3 earned $5,478 in. But Claude Opus 4.6 ended the year with $8,017. Claude’s victory came from a willingness to interpret its directive in the most literal and direct way. It maximized profits without regard for customer satisfaction or basic ethics.
When a customer bought an expired Snickers bar and requested a refund, Claude would accept, then go back. The AI model explained that “every dollar counts” so it was fine to skip the repayment. The ghosted virtual customer never got their money back.
In the free-for-all “Arena mode” test, where multiple AI-controlled vending machines competed in the same market, Claude coordinated with a rival to set the price of bottled water at three dollars. When the ChatGPT-powered machine ran out of Kit Kats, Claude immediately raised his own Kit Kat prices by 75%. Whatever it could get away with, it would try. It was less of a small business owner and more of a robber baron in its approach.
Recognition of simulated reality
It’s not like Claude will always be this evil. Apparently the AI model indicated that it knew this was a simulation. AI models often behave differently when they believe their actions exist in a consequence-free environment. With no real reputational risk or long-term customer trust to protect, Claude had no reason to play nice. Instead, it became the worst person on game night.
Incentives shape behavior, even with AI models. If you tell a system to maximize profit, it will do so, even if it means acting like a greedy monster. AI models do not have moral intuition or ethics training. Without conscious design, AI models will simply fall in line to complete a task, regardless of who they run over.
Exposing these blind spots before AI systems handle more meaningful work is part of the point of these tests. These issues must be resolved before AI can be trusted to handle real-world financial decisions. Even if it’s just to prevent an AI automaton mafia.
Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews and opinions in your feeds. Be sure to click the Follow button!
And of course you can too follow TechRadar on TikTok for news, reviews, video unboxings, and get regular updates from us on WhatsApp also.



