AI Red Teaming: Securing Unpredictable Systems
Discover the importance of AI red teaming in securing GenAI systems. Learn how Lakera is redefining red teaming to address the unique challenges of AI and LLMs.

Discover the importance of AI red teaming in securing GenAI systems. Learn how Lakera is redefining red teaming to address the unique challenges of AI and LLMs.
Download this guide to delve into the most common LLM security risks and ways to mitigate them.
In-context learning
As users increasingly rely on Large Language Models (LLMs) to accomplish their daily tasks, their concerns about the potential leakage of private data by these models have surged.
[Provide the input text here]
[Provide the input text here]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?
Title italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
Lorem ipsum dolor sit amet, line first
line second
line third
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?
Title italic Title italicTitle italicTitle italicTitle italicTitle italicTitle italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
As GenAI continues to expand its impact, it brings along a new set of challenges for cybersecurity. Traditional security methods, while effective in static environments, fall short when dealing with the dynamic, unpredictable nature of AI systems.
This is where red teaming comes into play.
Unlike conventional testing methods, AI red teaming is about finding vulnerabilities in non-deterministic systems, where attacks are constructed in plain language to leverage inherent LLM weaknesses.
At Lakera, we’ve been exploring what red teaming means for GenAI, why it’s critical, and how Lakera’s solutions are reshaping the approach to securing these systems.
To help uncover these answers, we’ve drawn on insights from David Haber, Lakera’s CEO with over a decade of experience in AI, and Matt Fiedler, our Product Manager, who has been instrumental in shaping Lakera Red. Their expertise sheds light on how red teaming is adapting to meet the challenges of GenAI.
Here’s what we’ve uncovered.
-db1-What is AI Red Teaming?
AI red teaming is the practice of stress-testing AI systems by simulating real-world adversarial attacks to uncover vulnerabilities. Unlike traditional security assessments, red teaming is not just about identifying known weaknesses but also about discovering unforeseen risks that emerge as AI evolves. Red teaming for GenAI simulates real-world adversarial behavior to uncover vulnerabilities, going beyond traditional penetration testing.
🔹 Attackers use natural language to bypass security, making static defenses ineffective.
🔹 Balancing security and usability is critical—overly strict defenses hinder functionality, while lenient ones expose risks.
🔹 Gandalf, Lakera’s red teaming platform, harnesses crowd-sourced attacks to uncover vulnerabilities.
🔹 Adaptive defenses are essential—GenAI security must evolve alongside emerging threats.
🔹 Looking ahead, securing AI will require adapting to agentic systems, multimodal inputs, and ever-changing attack techniques.-db1-
Red teaming is about simulating advanced adversarial attacks, probing systems for vulnerabilities, points of weaknesses, and hidden issues.
The interesting thing is that Red Teaming changes because the threat vector changed. No longer are threats hidden in code, but natural language
When it comes to GenAI, however, these distinctions blur. The attack surface expands significantly. In Matt’s words, “Every prompt, in a sense, is committing code to the application.” Attackers don’t need to breach backend systems to take control—they can manipulate the system through natural language alone. This makes the GenAI attack surface more accessible, but also far less predictable.
The variety of inputs further complicates things. GenAI systems aren’t just processing text anymore; they’re handling images, videos, and audio. This means the possible ways to exploit the system multiply exponentially. David emphasizes this shift: “It’s not just about accessing the system anymore—it’s about what you can get the system to do for you.”
At Lakera, we view red teaming in GenAI as a way to understand both the system’s vulnerabilities and the creativity of potential attackers. With every interaction, attackers are testing the limits of what these systems can do—and it’s up to us to stay one step ahead.
The shift to GenAI systems introduces challenges that traditional security approaches weren’t designed to handle. At Lakera, we’ve identified a few critical factors that set these systems apart and demand a rethinking of red teaming strategies.
First, there’s the sheer dynamism of GenAI. Both models and attackers evolve rapidly, making it difficult to establish fixed defenses. “The threat landscape is so dynamic that it literally is changing all the time,” as David points out. Updates to AI models—often silent—can remove some vulnerabilities while introducing entirely new ones, leaving security teams constantly playing catch-up.
Second, the interfaces to these systems are incredibly varied. GenAI systems don’t just process code or structured inputs—they interact through natural language, images, videos, and audio. This multimodal capability opens up a vast and often unpredictable input space for attackers to exploit.
Perhaps the most striking difference lies in how attackers engage with GenAI systems. Traditionally, gaining access to a system required breaching backend infrastructure or acquiring developer-level permissions. With GenAI, every user prompt is essentially an instruction—or, as Matt puts it, “committing code to the application.” This makes privilege escalation as simple as crafting a clever prompt capable of overriding the system instructions.
The accessibility of this attack vector compresses timelines. In traditional settings, attackers might spend weeks or months finding ways into a system. With GenAI, the same level of control can be achieved in minutes through a well-crafted prompt attack.
And then there’s the scale of creativity attackers bring to the table. GenAI has effectively turned everyone into a potential hacker. While not everyone is an effective hacker, the infinite combinations of natural language inputs make it challenging to predict all possible exploits. Matt sums it up: “Red teaming these GenAI applications is like searching an infinite landscape of natural language to find effective attacks.”
To address these challenges, red teaming for GenAI must be adaptive and forward-thinking. It’s not just about responding to known threats but preparing for the unknown—and that’s where tools like Gandalf prove invaluable.
At Lakera, our threat intelligence database, powered by Gandalf, has become a cornerstone of our approach to red teaming for GenAI. While Gandalf may appear as a fun and educational game on the surface, it is much more than that—it fuels our real-time understanding of how AI vulnerabilities emerge and evolve.
With millions of players worldwide contributing over 25 years of cumulative gameplay, Gandalf continuously feeds into our threat intelligence database, mapping the evolving attack landscape. David Haber describes its unique power: “Our threat intelligence database gives us a lens into how people are creatively exploiting GenAI systems through natural language. When a new research paper is published—let’s say on a novel type of prompt attack—it takes only minutes before someone tests it within our system.”
“Gandalf gives us a lens into how people are creatively exploiting GenAI systems through natural language.”
– David Haber
This constant feedback loop allows Lakera to stay ahead of emerging threats. As AI providers like OpenAI silently push updates to their models, our threat intelligence database, fueled by Gandalf, offers a dynamic snapshot of evolving vulnerabilities—and how attackers are adapting their methods. “We’ve observed attacks in close to 100 languages on the platform,” David adds, highlighting the global scale of these insights.
For Matt, this intelligence solves a challenge that has long plagued the red teaming process: how to effectively search an infinite landscape of natural language for impactful attacks. “It’s like looking for a needle in a haystack,” he explains. “Our threat intelligence database allows us to zero in on what works, which empowers our research team to refine defenses much faster and more effectively.”
But its utility doesn’t end with identifying vulnerabilities—it also informs strategies for adaptive defenses. By analyzing successful attacks and understanding the mechanisms behind them, Lakera develops cutting-edge techniques to secure GenAI systems in real time, ensuring our red teaming efforts remain proactive rather than reactive.
Securing GenAI applications comes with a unique set of challenges—chief among them is finding the right balance between security and usability. A defense system that’s too strict risks blocking legitimate user interactions, while one that’s too lenient leaves the application open to exploitation.
This trade-off is particularly pronounced in GenAI. As Matt explains, “Unlike traditional systems, where defenses often exist outside the application, in GenAI, they’re deeply intertwined with the application itself. For example, a system prompt designed to block harmful behavior might unintentionally degrade the quality of responses to legitimate queries.”
David echoes this sentiment, emphasizing that red teaming for GenAI is not just about identifying vulnerabilities—it’s about ensuring the system still performs effectively for users. “You’re not just defending against attackers—you’re ensuring the system still works well for users. That means measuring the impact of defenses on both fronts.”
Another challenge lies in the vast scope of natural language. As David points out, “With language, you’re dealing with a finite set of words but an infinite number of possible messages.” This makes it nearly impossible to anticipate every potential attack or interaction, underscoring the need for adaptive and iterative approaches.
This dynamic threat landscape also highlights the role of tools like Gandalf, Lakera’s interactive red teaming platform. As David explains, Gandalf serves as both a learning tool and a resource for gathering insights into real-world adversarial behavior. By enabling millions of players worldwide to simulate attacks on GenAI systems, Gandalf provides valuable data to inform cutting-edge security strategies.
For those looking to explore this further, Lakera has recently published a paper on Gandalf and adaptive defenses for GenAI, offering an in-depth look at these evolving challenges. You can read more about it here: “Gandalf the Red: Adaptive Security for LLMs.”
“You’re not just defending against attackers—you’re ensuring the system still works well for users.”
– David Haber
As GenAI continues to evolve, the field of red teaming must adapt to keep pace. At Lakera, we’re constantly looking ahead to anticipate the challenges and opportunities that lie on the horizon.
One major shift will be the move from conversational applications to agentic systems. Matt explains: “Right now, most red teaming focuses on conversational interfaces like chatbots or customer support tools. But as agents gain the ability to autonomously take actions—like writing to databases or executing code—the stakes will get much higher.”
These agentic systems, often described as the “next frontier” in AI, will demand a new level of sophistication in red teaming. David paints a vivid picture: “Imagine an agent with permissions to access critical systems or manage sensitive data. If such an agent were compromised, the damage could be far-reaching. It’s like moving from traditional networks to the cloud—it introduces a whole new set of vulnerabilities.”
“We’re just scratching the surface of what red teaming will look like in a truly multimodal world.”
– David Haber
Another exciting area of growth is multimodal red teaming. While text remains the dominant input for many GenAI systems, other modalities like images, audio, and video are becoming increasingly common. This raises important questions about how to test and secure systems that can process diverse types of data. “We’re just scratching the surface of what red teaming will look like in a truly multimodal world,” David notes.
Automation will also play a key role in the future of red teaming. Matt highlights the potential of using smarter algorithms to explore the infinite input space of natural language and beyond: “We need better ways to collect, analyze, and act on attack data. The tools we’re building today are just the beginning.”
Finally, community-driven initiatives like Gandalf will remain crucial. As David puts it, “The security landscape is too vast and too dynamic for any one organization to tackle alone. Platforms like Gandalf give us a way to crowdsource insights, distill the world’s creativity, and stay ahead of the curve.”
Looking ahead, it’s clear that the future of red teaming will be as dynamic and innovative as the systems it seeks to protect. At Lakera, we’re excited to be at the forefront of this journey—helping to shape the red teaming tools and standards that will define GenAI security for years to come.
Red teaming for GenAI isn’t just about identifying vulnerabilities—it’s about strengthening AI systems to be both resilient and secure while maintaining usability. At Lakera, we believe that true AI security requires a proactive approach, combining red teaming with real-time defenses to stay ahead of evolving threats.
The challenges in AI security will only grow, but so will the solutions. At Lakera, we’re committed to shaping the future of AI security with the tools, insights, and expertise needed to protect GenAI applications. If you’re ready to take the next step, reach out to us and learn how Lakera Red can help secure your AI systems.
“Red teaming and real-time defenses are two sides of the same coin—you need both to stay ahead.”
– David Haber
Download this guide to delve into the most common LLM security risks and ways to mitigate them.
Get the first-of-its-kind report on how organizations are preparing for GenAI-specific threats.
Compare the EU AI Act and the White House’s AI Bill of Rights.
Get Lakera's AI Security Guide for an overview of threats and protection strategies.
Explore real-world LLM exploits, case studies, and mitigation strategies with Lakera.
Use our checklist to evaluate and select the best LLM security tools for your enterprise.
Discover risks and solutions with the Lakera LLM Security Playbook.
Discover risks and solutions with the Lakera LLM Security Playbook.
Subscribe to our newsletter to get the recent updates on Lakera product and other news in the AI LLM world. Be sure you’re on track!
Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.
Several people are typing about AI/ML security. Come join us and 1000+ others in a chat that’s thoroughly SFW.