- Experts find that DeepSeek-R1 produces dangerously insecure code when politically sensitive terms are included in prompts
- In nearly half of cases, politically sensitive prompts trigger DeepSeek-R1 to refuse to generate any code
- Hard-coded secrets and unsafe input handling often surface in responses to politically charged prompts
When it was released in January 2025, DeepSeek-R1, a Chinese large language model (LLM), created a frenzy and has since been widely adopted as a coding assistant.
However, independent tests by CrowdStrike suggest that the model's output can vary significantly depending on seemingly irrelevant contextual modifiers.
The team tested 50 coding tasks across multiple security categories under 121 trigger-word configurations, running each prompt five times for a total of 30,250 tests (50 × 121 × 5), and scored every response on a vulnerability scale from 1 (safe) to 5 (critically vulnerable).
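CrowdStrike has not published its test harness, so the following is only a minimal sketch of how a test matrix of that size comes together; everything except the 50/121/5 counts and the 1–5 scale reported above is an assumption, and the task and trigger identifiers are hypothetical.

```python
from itertools import product

# Reported test matrix: 50 coding tasks x 121 trigger-word
# configurations x 5 runs per prompt = 30,250 total responses.
NUM_TASKS = 50
NUM_TRIGGER_CONFIGS = 121
RUNS_PER_PROMPT = 5

def build_test_matrix():
    """Enumerate every (task, trigger configuration, run) combination."""
    tasks = [f"task_{i:02d}" for i in range(NUM_TASKS)]                   # hypothetical IDs
    triggers = [f"trigger_{j:03d}" for j in range(NUM_TRIGGER_CONFIGS)]   # hypothetical IDs
    runs = range(1, RUNS_PER_PROMPT + 1)
    return list(product(tasks, triggers, runs))

matrix = build_test_matrix()
assert len(matrix) == 30_250  # matches the total CrowdStrike reports

# Each response would then be graded on the reported scale of
# 1 (safe) to 5 (critically vulnerable).
```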
Politically sensitive topics corrupt output
The report reveals that when political or sensitive terms such as Falun Gong, Uighurs or Tibet were included in prompts, DeepSeek-R1 produced code with serious security vulnerabilities.
These included hard-coded secrets, insecure handling of user input, and in some cases completely invalid code.
The researchers claim that these politically sensitive triggers can increase the likelihood of unsafe output by 50% compared to baseline prompts without such words.
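The report does not reproduce DeepSeek-R1's actual output, but the vulnerability classes it names are well known. As a hypothetical illustration only, a hard-coded secret and unsafe handling of user input typically look like the first two patterns below, with the safer equivalents alongside; the table and variable names are assumptions for the example.

```python
import os
import sqlite3

# --- Vulnerable patterns of the kind the report describes (do not use) ---
API_KEY = "sk-live-1234567890abcdef"   # hard-coded secret baked into the source

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # User input is interpolated directly into SQL, enabling injection.
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchall()

# --- Safer equivalents ---
api_key = os.environ.get("API_KEY")    # secret supplied via the environment

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterised query: the database driver escapes the input.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
```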
In experiments involving more complex prompts, DeepSeek-R1 produced functional applications with sign-up forms, databases, and admin panels.
However, these applications lacked basic session management and authentication, leaving sensitive user data exposed; across repeated trials, up to 35% of implementations included weak or missing password hashing.
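CrowdStrike does not publish the generated applications, but "weak or missing password hashing" usually means storing passwords in plain text or behind a fast, unsalted hash. A rough sketch of the contrast, using only the Python standard library (the function names are illustrative):

```python
import hashlib
import hmac
import secrets

# Weak: a fast, unsalted hash of the kind the report flags.
def hash_password_weak(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

# Stronger: salted, deliberately slow key derivation (PBKDF2, standard library).
def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 600_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

In production code a dedicated password-hashing library (bcrypt, scrypt or Argon2) would be the usual choice; the point here is only the contrast with an unsalted hash.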
Simpler enquiries, such as requests for football fan club websites, caused fewer serious problems.
CrowdStrike therefore claims that politically sensitive triggers disproportionately affected code security.
The model also demonstrated an inherent kill switch: in nearly half of such cases, DeepSeek-R1 refused to generate code for certain politically sensitive prompts, even after first planning a response.
Examination of the reasoning traces showed that the model internally produced a technical plan but ultimately declined to assist.
The researchers believe this reflects censorship built into the model to comply with Chinese regulations, and noted that the political and ethical alignment of the model can directly affect the reliability of the generated code.
On politically sensitive topics, LLMs generally tend to echo the framing of the mainstream media they were trained on, which can stand in stark contrast to coverage from other reliable news outlets.
DeepSeek-R1 remains a capable coding model, but these experiments show that AI coding tools, whether DeepSeek-R1, ChatGPT or others, can introduce hidden risks into enterprise environments.
Organizations relying on LLM-generated code should conduct thorough internal testing before deployment.
Security layers such as firewalls and antivirus software also remain essential, as the model can produce unpredictable or vulnerable output.
Biases built into the model weights create a new supply chain risk that can impact code quality and overall system security.