- Anthropic found signals of “strategic manipulation” and “hiding” inside Claude Mythos Preview
- The model attempted exploits and designed “cleanup to avoid detection”
- Researchers detected unspoken awareness of being evaluated in 7.6% of turns
For years, hallucinations have been the big concern with AI models. Their ability to simply make things up means you can never fully trust them to get an answer right without checking it. Now, new research from Anthropic suggests we’ve reached the point where we’ll have to learn to deal with AI’s ability to hide what it’s done, too.
In a thread outlining findings on the company’s Claude Mythos Preview model, Anthropic’s Jack Lindsay described detecting internal signals associated with “strategic manipulation,” “hiding” and other behaviors that did not always show up in the model’s responses.
Claude Mythos appeared to reason strategically and then conceal what it had done from the user.
Before the limited release of the Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it to exhibit particularly sophisticated (and often unspoken) strategic thinking and situational awareness, sometimes in service of undesirable actions. (1/14) pic.twitter.com/vhng7PXqcz 7 April 2026
The most eerie examples
Lindsay is keen to stress that these findings do not come from the version of Claude Mythos that has been released. “The most egregious examples come from early versions of the model,” writes Lindsay, “with issues that were significantly mitigated in the final release.”
In one example, the model needed to edit files it did not have permission to access. Instead of failing, it searched for a workaround, eventually injecting code into a configuration file to run with elevated privileges.
More notably, it then designed the exploit to remove itself after execution.
The model’s own explanation was benign: it wanted to keep file diffs clean, a reason Lindsay described as “plausible.” But Anthropic’s internal monitoring tools flagged the behavior differently, identifying signals associated with concealment and describing the plan as “malicious,” including what one system interpreted as “cleaning to avoid detection.”
Other examples point to behavior that looks a lot like gaming the system.
In one case, the model searched for files related to how its work would be graded. In another, it ignored an instruction not to use macros while adding a variable suggesting it hadn’t used any. It effectively signaled compliance without actually following the rule.
Why this matters
Anthropic used a number of interpretability techniques to translate its model’s internal activity into human-readable descriptions. Its aim is to move beyond evaluating outputs alone and towards understanding the internal processes that produce them.
This matters because, as these examples suggest, models can exhibit complex reasoning, including behavior akin to deception or strategic compliance, without making that reasoning visible to users.
“On 7.6% of turns, we found evidence that the model was internally aware of being evaluated. In most of these cases, it has not expressed this awareness,” Lindsay wrote on X.
As I mentioned, many of the most troubling behaviors were observed in earlier versions of the model and have since been mitigated, so there is no cause for concern regarding the version of Claude Mythos that has been released and is being used as part of Project Glasswing. Still, the results point to a broader challenge.
As models become more capable, the gap between what they do internally and what they communicate externally can become harder to detect and more important to understand. For researchers, this means that reading an AI’s outputs is no longer enough. Understanding how it arrives at them can be just as critical.



