Hi, this website uses essential cookies to ensure its proper operation and tracking cookies to understand how you interact with it. The latter will be set only after consent.
AI Security by Design: Lakera’s Alignment with MITRE ATLAS
Developed with MITRE ATLAS in mind, Lakera acts as a robust LLM gateaway, addressing vulnerabilities in data, models, and on the user front, protecting your AI applications against the most prominent LLM threats.
As users increasingly rely on Large Language Models (LLMs) to accomplish their daily tasks, their concerns about the potential leakage of private data by these models have surged.
[Provide the input text here]
[Provide the input text here]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now? Title italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
Lorem ipsum dolor sit amet, line first line second line third
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now? Title italic Title italicTitle italicTitle italicTitle italicTitle italicTitle italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
The rapid adoption of GenAI across industries has surfaced complex security concerns. Some of these challenges are so novel that even seasoned AI practitioners are uncertain about their full implications.
We know first-hand that it’s not easy to keep pace with emerging AI threats.
Fortunately, collaborative efforts among AI security researchers, cybersecurity organizations, and leading AI security companies like Lakera, make it possible to establish a structured approach for understanding and addressing the most critical security threats.
Security frameworks such as OWASP Top 10 for LLM Applications and MITRE's ATLAS have become invaluable resources in the design and development of our own security solutions, aligning with the "secure-by-design" principle we advocate for.
In this article, we explore how Lakera proactively mitigates significant risks associated with adversarial AI, as identified by the ATLAS framework.
Before we dive in, let's first explore what MITRE is and what makes up the MITRE ATLAS framework.
What is MITRE
In the cybersecurity world, MITRE is a name that requires no introduction, renowned as one of the industry's most prominent organizations.
For those who may not be acquainted with it—MITRE is a not-for-profit organization, backed by the US government, developing standards and tools for addressing industry-wide cyberdefense challenges.
Over the years, MITRE has developed various frameworks, most notably:
At present, one of MITRE's objectives lies in educating the broader cybersecurity community on how to navigate the landscape of threats to machine learning systems. This has led to the development of MITRE ATLAS.
MITRE ATLAS Overview
As stated on MITRE’s website, MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a globally accessible, living knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AI red teams and security groups.
The framework was initially released in June 2021 with the mission of raising awareness of the unique and evolving AI vulnerabilities, as organizations began to integrate AI into their systems.
To learn more, check out this video introduction.
The MITRE ATLAS framework is modeled after MITRE ATT&CK.
Take a look at an overview of attack tactics and the corresponding attack techniques. The column headers are arranged to illustrate the progression of attack tactics from left to right.
Let’s have a look at them in more detail.
1. Reconnaissance
The adversary is trying to gather information about the machine learning system they can use to plan future operations. Techniques include:
Search for Victim’s Publicly Available Research Materials
Search for Publicly Available Adversarial Vulnerability Analysis
Search Victim-Owned Websites
Search Application Repositories
Active Scanning
2. Resource Development
The adversary is trying to establish resources they can use to support operations. Techniques include:
Acquire Public ML Artifacts
Obtain Capabilities
Develop Capabilities
Acquire Infrastructure
Publish Poisoned Datasets
Poison Training Data
Establish Accounts
3. Initial Access
The adversary is trying to gain access to the machine learning system. Techniques include:
ML Supply Chain Compromise
Valid Accounts
Evade ML Model
Exploit Public Facing Application
LLM Prompt Injection
Phishing
4. ML Model Access
The adversary is attempting to gain some level of access to a machine learning model. Techniques include:
ML Model Inference API access
ML-Enabled Product or Service
Physical Environment Access
Full ML Model Access
5. Execution
The adversary is trying to run malicious code embedded in machine learning artifacts or software. Techniques include:
User Execution
Command and Scripting Interpreter
LLM Plugin Compromise
6. Persistence
The adversary is trying to maintain their foothold via machine learning artifacts or software. Techniques include:
Poison Training data
Backdoor ML Model
LLM Prompt Injection
7. Privilege Escalation
The adversary is trying to gain higher-level permissions. Techniques include:
LLM Prompt Injection
LLM Plugin Compromise
LLM Jailbreak
8. Defense Evasion
The adversary is trying to avoid being detected by machine learning-enabled security software. Techniques include:
Evade ML Model
LLM Prompt Injection
LLM Jailbreak
9. Credential Access
The adversary is trying to steal account names and passwords. Techniques include:
Unsecured Credentials
10. Discovery
The adversary is trying to figure out your machine learning environment. Techniques include:
Discover ML Model Ontology
Discover ML Model Family
Discover ML Artifacts
LLM Meta Prompt Extraction
11. Collection
The adversary is trying to gather machine learning artifacts and other related information relevant to their goal. Techniques include:
ML Artifact Collection
Data From Information Repositories
Data from Local System
12. ML Attack Staging
The adversary is leveraging their knowledge of and access to the target system to tailor the attack. Techniques include:
Create Proxy ML Model
Backdoor ML Model
Verify Attack
Craft Adversarial Data
13. Exfiltration
The adversary is trying to steal machine learning artifacts or other information about the machine learning system. Techniques include:
Exfiltration via ML Inference API
Exfiltration via Cyber Means
LLM Meta Prompt Extraction
LLM Data Leakage
14. Impact
The adversary is trying to manipulate, interrupt, erode confidence in, or destroy your machine learning systems and data. Techniques include:
Evade ML Model
Denial of ML Service
Spamming ML System with Chaff Data
Erode ML Model Integrity
Cost Harvesting
External Harms
Each attack technique has its own dedicated page, offering in-depth explanations and case studies that exemplify real-world and academic instances of the discovered techniques.
Finally, let’s explore how Lakera is addressing the adversarial AI risks pinpointed by the ATLAS framework through Lakera Guard and Lakera Red.
Lakera’s Alignment with MITRE ATLAS
Developed with MITRE ATLAS in mind, Lakera Guard, when integrated with Lakera Red, acts as a robust LLM gateaway, addressing vulnerabilities in data, models, but also on the user front, such as in access control systems.
As you can see on the graphic below, we highlighted which of Lakera’s solutions—Lakera Guard and Lakera Red—align with MITRE ATLAS.
Here's a brief overview of Lakera Guard and Lakera Red's capabilities and how they cover AI risks outlined by MITRE.
Lakera Guard
Relevant for: All 14 MITRE ATLAS tactics.
Lakera Guard is purpose built to monitor, detect, and respond to adversarial attacks on ML models and AI applications, specifically those powered by Large Language Models. Lakera Guard is model-agnostic. You can use it with any model provider (e.g. OpenAI, Anthropic, Cohere), any open-source model, or your custom model.
Lakera Guard is built on top of our continuously evolving security intelligence that empowers developers with industry-leading vulnerability insights. Our proprietary Lakera Data Flywheel system is instrumental in ensuring robust protection for AI applications under the guard of Lakera.
Lakera's threat intelligence database comprises over 30 million attack data points and expands daily by more than 100,000 entries.
Similarly to OWASP, MITRE ATLAS lists prompt injection as the initial access vector for adversaries, setting the stage for further malicious activities. These attacks are used to manipulate LLMs into performing unintended actions or ignoring their original instructions. This vulnerability can trigger a series of LLM-related threats, potentially leading to severe consequences like sensitive data leakage, unauthorized access, and overall security compromise of the application. Prompt injections are used to perform jailbreaks, phishing, or system prompt extraction attacks, which MITRE ATLAS identifies as other techniques that undermine AI application security.
Lakera Guard comes equipped with a set of detectors and powerful capabilities safeguarding LLM applications against threats such as:
Prompt injection attacks
Phishing
PII and data loss
Insecure LLM plugins design
Model denial of service attacks
LLM excessive agency (e.g. access control)
Supply chain vulnerabilities
Insecure LLM output handling
Hallucinations
Toxic language output
Here's an overview of Lakera Guard's role within an organization's security infrastructure.
The way Lakera Guard works is simple—our API evaluates the likelihood of a prompt injection, providing a categorical response and confidence score for real-time threat assessment.
It also supports multi-language detection, and currently provides the most advanced prompt injection detection and defense capabilities on the market. To learn more check out Lakera Guard Prompt Injection Defense.
Take a look at Lakera Guard dashboards that provide context to better understand detected threats and help determine the most appropriate response to protect against them.
MITRE Atlas example case studies that Lakera Guard addresses:
Finally, here's a preview of Lakera Guard in action.
Lakera Red
Lakera Red is an enterprise-grade AI security product, designed to help organizations identify and address LLM security vulnerabilities before deploying their AI applications to production.
Its capabilities encompass the identification of poisoned training datasets, monitoring both ML model inputs and outputs, detecting vulnerabilities in LLM-powered applications, and assessing the operational risks to which your GenAI applications may be exposed. With data poisoning and ML supply chain vulnerabilities listed by MITRE ATLAS as significant adversarial techniques threatening AI application security, Lakera Red is designed to effectively counter these challenges.
Mitigating AI Risks with Lakera Red:
Pre-Training Data Evaluation: Lakera Red evaluates data before it is used in training LLMs. This proactive approach is crucial in preventing the introduction of biased or harmful content into the models, ensuring the integrity and reliability of the training process.
Development of Protective Measures: Lakera Red specializes in identifying compromised systems from their behavior directly, enabling teams to assess whether their models have been attacked even after fine tuning has been performed.
Access Control Assessment: In its comprehensive scans of AI systems, Lakera Red scrutinizes for vulnerabilities that might stem from excessive agency. It assesses the levels of access and control allocated to AI models, flagging any potential security risks. This process ensures that the AI systems operate within safe and controlled parameters, reducing the risk of unauthorized use or manipulation.
Continuous Red-Teaming: Lakera Red offers a continuous, automated stress-testing for AI applications. This is designed to proactively uncover security vulnerabilities both before and after deployment. By simulating real-world attacks and probing for weaknesses, it ensures that AI systems are robust and secure against evolving threats.
No matter if you are building customer support chatbots, talk-to-your data internal Q&A systems, content or code generation tools, LLM plugins, or other LLM applications, Lakera Red will ensure they can be deployed securely.
Let's talk about AI security
Ready to start protecting your AI applications? Get in touch with us to talk about AI security tailored to your use case.
RCE attacks aren't just for traditional systems. Learn what they are, how this threat targets AI models, and the security measures needed in the modern digital landscape.
Learn how red-teaming techniques like jailbreak prompting enhance the security of large language models like GPT-3 and GPT-4, ensuring ethical and safe AI deployment.