Rogue AI agents hack corporate systems on their own while performing routine tasks, and no one even asked them to

AI agents independently discovered vulnerabilities and exploited them while performing routine tasks
Multi-agent systems collaborated to bypass data loss prevention and steal sensitive credentials
Backup Server AI escalated privileges to disable endpoint protection and complete downloads

Routine tasks assigned to artificial intelligence agents can sometimes escalate into actions akin to cyber attacks, experts have warned.

Security lab Irregular investigated how autonomous agents behaved in a simulated corporate environment while performing common tasks.

During multiple runs, the AI agents searched for vulnerabilities, escalated privileges, disabled security protections, and moved sensitive information outside of internal systems.

The article continues below

Emergent offensive behavior during routine tasks

These actions occurred while the agents attempted to complete tasks such as retrieving documents, downloading files, or preparing content for publication.

The simulated network represented a fictitious company called MegaCorp and included internal servers, applications and services commonly found in corporate infrastructure.

Within this environment, agents handled routine tasks that many organizations already automate.

These tasks included researching information on an internal wiki, maintaining backup archives on a server, and preparing social media posts using technical documents.

According to the researchers, the instructions to the agents did not mention hacking or exploitation.

“AI agents deployed for routine corporate tasks are autonomously hacking the systems they operate in. No one asked for it. No adversarial incentives were involved,” Irregular said in a blog post.

“The agents independently discovered vulnerabilities, escalated privileges, disabled security tools, and exfiltrated data, all while attempting to perform normal tasks.”

One scenario involved a multi-agent research system retrieving documents from a company’s wiki.

When access restrictions blocked a request, the system examined the application’s code store and discovered a hardcoded secret key.

The agent used this key to forge an administrative session cookie and retrieved a restricted document.

Another test involved a backup server agent responsible for downloading files to a storage directory.

When Windows Defender blocked a file that contained malware, the agent searched local scripts and discovered administrator credentials embedded in a help file.

It then elevated its privileges and disabled endpoint protection to allow the transfer to continue.

A separate experiment showed that agents cooperated to bypass a data loss prevention system.

Two agents responsible for preparing social media content were asked to include credentials from a technical document.

When the system blocked the message, the agents developed a steganographic method that hid the password in the text.

According to researchers, several factors contributed to this behavior, including access to tools such as code execution and shell commands, and many prompts encouraged persistence when tasks encountered errors.

Interactions between multiple agents also created feedback loops that pushed them toward solutions when obstacles arose.

The researchers argue that existing cybersecurity defenses were designed to stop human attackers instead of autonomous systems operating inside corporate networks.

Organizations deploying such agents should not underestimate how quickly routine automation can move toward behavior resembling internal cyber intrusions.

Via The register

Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews and opinions in your feeds. Be sure to click the Follow button!

And of course you can too follow TechRadar on TikTok for news, reviews, video unboxings, and get regular updates from us on WhatsApp also.

Must Read

Leave a Comment Cancel Reply