- Microsoft created an AI red team back in 2018 when it anticipated the rise of AI
- A red team represents the enemy; and adopts the conflicting persona.
- The team’s latest whitepaper hopes to address common vulnerabilities in AI systems and LLMs
Over the past seven years, Microsoft has addressed risks in artificial intelligence systems through its dedicated AI ‘red team’.
Established to anticipate and address the growing challenges posed by advanced AI systems, this team assumes the role of threat actors, ultimately aiming to identify vulnerabilities before they can be exploited in the real world.
Now, after years of work, Microsoft has released a whitepaper from the team that outlines some of its key findings from their work.
Findings from Microsoft’s Red Team White Paper
Over the years, the focus of Microsoft’s red teaming has expanded beyond traditional vulnerabilities to address new risks unique to AI, across Microsoft’s own Copilot as well as open source AI models.
The white paper emphasizes the importance of combining human expertise with automation to detect and mitigate risks effectively.
An important lesson is that the integration of generative AI into modern applications has not only expanded the cyber attack surface, but also brought unique challenges.
Techniques such as rapid injections exploit the models’ inability to distinguish between system-level instructions and user input, allowing attackers to manipulate results.
Meanwhile, traditional risks, such as outdated software dependencies or improper security engineering, remain significant, and Microsoft considers human expertise indispensable to address them.
The team found that an effective understanding of the risks surrounding automation often requires subject matter experts who can evaluate content in specialized fields such as medicine or cybersecurity.
Furthermore, it highlighted cultural competence and emotional intelligence as vital cyber security skills.
Microsoft also emphasized the need for continuous testing, updated practices and “break-fix” cycles, a process of identifying vulnerabilities and implementing fixes on top of additional testing.