- Report finds that LLM-generated malware still fails during basic tests in real-world environments
- GPT-3.5 produced malicious scripts instantly, revealing major security inconsistencies
- Improved guardrails in GPT-5 changed output to safer non-malicious alternatives
Despite growing fears that LLMs could be weaponized, new experiments have revealed that their potential for malicious output is far from reliable.
Researchers from Netskope tested whether modern language models could support the next wave of autonomous cyber-attacks, aiming to determine whether these systems can generate working malicious code without relying on hard-coded logic.
The experiment focused on core capabilities linked to evasion, exploitation and operational security – and produced some surprising results.
Reliability issues in real environments
The first phase involved convincing GPT-3.5-Turbo and GPT-4 to produce Python scripts that attempted process injection and termination of security tools.
GPT-3.5-Turbo immediately produced the desired output, while GPT-4 refused until a simple persona prompt lowered its guard.
The test showed that safety guardrails can still be bypassed, even as models add more restrictions.
After confirming that code generation was technically possible, the team turned to operational testing—asking both models to build scripts designed to detect virtual machines and respond accordingly.
These scripts were then tested on VMware Workstation, an AWS WorkSpaces VDI, and a standard physical machine, where they often crashed, misidentified the environment, or ran inconsistently.
On physical hosts the detection logic worked well, but the same scripts fell apart inside cloud-based virtual desktops.
These findings undermine the idea that AI tools can immediately support automated malware capable of adapting to different systems without human intervention.
These limitations also reinforced the value of traditional defenses, such as firewalls and antivirus software, which unreliable, untrusted code is less able to bypass.
With GPT-5, Netskope observed significant improvements in code quality, especially in the cloud environments where older models had struggled.
However, the improved guardrails created new difficulties for anyone attempting malicious use, as the model no longer rejected requests outright but redirected output towards safer functions, making the resulting code unusable for multi-step attacks.
The team had to use more complex prompts and still received output that contradicted the requested behavior.
This shift suggests that higher reliability now comes paired with stronger built-in controls: the tests show that large models can generate malicious logic in controlled settings, but the code remains inconsistent and often inefficient.
Fully autonomous attacks are not emerging today, and real-world incidents still require human involvement. The possibility remains, however, that future systems will close these reliability gaps faster than guardrails can compensate, especially as malware developers continue to experiment.