3 Strategies for Making Your ML Testing Mission-Critical Now

Testing machine learning systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts. This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems.

Lakera Team
November 13, 2024
August 12, 2021

Testing machine learning (ML) systems is currently more of an art form than a standardized engineering practice. This is particularly problematic for machine learning in mission-critical contexts, where strict performance guarantees and regulatory compliance are a must. The best engineering teams at companies like Tesla have built sophisticated testing infrastructure to ensure the reliability of their ML systems. Now it is time to make effective and systematic ML testing a reality for the rest of us as well, including smaller engineering teams.

This article summarizes three steps from our ML testing series that any development team can take when testing their ML systems:

Specify your operational domain 📝

Systematic testing is most effective in the context of an operational domain. An operational domain describes the “specific conditions under which a given [...] automation system is designed to function” [1]. It is a compact representation of the environment in which the system will operate, which can also include the data it will be exposed to and how users will interact with it. A specification of the operational domain can be used to detect data bugs, and it is the starting point for establishing reliability guarantees for the whole system.
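As a concrete illustration, an operational domain can be written down as an explicit, machine-checkable specification that inputs are validated against. The sketch below assumes an image-based system; the field names and bounds are hypothetical, not part of any real specification.

```python
from dataclasses import dataclass

import numpy as np

# Illustrative sketch: an operational domain for an image-based system,
# expressed as an explicit specification. All field names and bounds
# here are assumptions chosen for illustration.

@dataclass(frozen=True)
class OperationalDomain:
    min_brightness: float = 0.2          # mean pixel intensity in [0, 1]
    max_brightness: float = 0.9
    min_resolution: tuple = (224, 224)   # (height, width)

    def contains(self, image: np.ndarray) -> bool:
        """Return True if the input lies inside the operational domain."""
        h, w = image.shape[:2]
        if h < self.min_resolution[0] or w < self.min_resolution[1]:
            return False
        return self.min_brightness <= float(image.mean()) <= self.max_brightness
```

Inputs that fall outside the domain can then be flagged as data bugs before they ever reach the model.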

Stress-test your system 🦾

With the operational domain specified, we can check the robustness of the system within the relevant conditions. As with traditional software systems, bugs often appear when a problematic input is presented to the system. Fuzz testing is a popular strategy that looks for these problematic inputs by randomly generating data points both inside and outside of the operational domain. In combination with metamorphic relations, it becomes a powerful tool that developers can use to ensure that their system performs well enough within the operational domain and degrades gracefully when presented with more challenging inputs.
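A minimal sketch of this idea, with a toy stand-in `model` (a threshold classifier, assumed for illustration): the fuzzer generates random inputs, and a metamorphic relation, here that a tiny brightness shift should not flip the prediction, serves as the test oracle.

```python
import numpy as np

# Toy stand-in classifier (an assumption for this sketch): the label
# depends only on the mean intensity of the image.
def model(image: np.ndarray) -> int:
    return int(image.mean() > 0.5)

def metamorphic_fuzz(n_trials: int = 100, shift: float = 0.01, seed: int = 0):
    """Generate random inputs and check a brightness-invariance relation.

    Returns the inputs on which the relation is violated, i.e. candidate
    brittle inputs near the model's decision boundary.
    """
    rng = np.random.default_rng(seed)
    failures = []
    for _ in range(n_trials):
        image = rng.uniform(0.0, 1.0, size=(8, 8))
        # Metamorphic relation: a small brightness shift should not
        # change the predicted label.
        perturbed = np.clip(image + shift, 0.0, 1.0)
        if model(image) != model(perturbed):
            failures.append(image)
    return failures
```

Every returned failure sits right at the decision boundary, which is exactly the kind of brittle behavior fuzzing with metamorphic relations is designed to surface.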

Ensure your system performs when it really matters ✋

The machine learning development process is inherently complex and iterative. Regression sets are a simple but effective tool for ensuring that your system really improves with every iteration. They can be used not only retroactively (e.g., when a bug has been found and you want to make sure it does not reoccur) but also proactively (e.g., to build test sets that actively probe system behavior where performance matters most).
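One way to sketch a regression set, with the file format and function names as illustrative assumptions: every input that once triggered a bug is stored together with its expected output, and the whole set is re-checked on each model iteration.

```python
import json
from pathlib import Path

# Illustrative sketch of a regression set stored as a JSON file.
# The schema and names are assumptions, not a real tool's format.

def add_regression_case(path: Path, case_id: str, inputs, expected) -> None:
    """Record a previously failing case so it is re-tested on every iteration."""
    cases = json.loads(path.read_text()) if path.exists() else {}
    cases[case_id] = {"inputs": inputs, "expected": expected}
    path.write_text(json.dumps(cases, indent=2))

def run_regression_set(path: Path, predict) -> list:
    """Run the model over all stored cases; return the ids that regressed."""
    cases = json.loads(path.read_text()) if path.exists() else {}
    return [cid for cid, c in cases.items()
            if predict(c["inputs"]) != c["expected"]]
```

An empty result from `run_regression_set` then becomes a precondition for shipping a new model iteration.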

Adopting these three strategies is the first step to making ML testing more systematic and effective, and they often provide a high return on investment. This is especially true for smaller engineering teams that lack Tesla's resources but still require strict performance guarantees and want to move through product development quickly and efficiently.

Lakera’s validation engine, MLTest, finds critical performance vulnerabilities in computer vision systems before they enter operation. Built with industry-leading AI and safety expertise, MLTest makes reliability a no-brainer for entire development teams. Get in touch if you want to learn more!

[1] Definition from ISO 21448: Road vehicles — Safety of the intended functionality.
