
Test machine learning the right way: Metamorphic relations.

As part of our series on machine learning testing, we are looking at metamorphic relations. We’ll discuss what they are, how they are used in traditional software testing, what role they play in ML more broadly, and, lastly, how to use them to write great tests for your machine learning application.

Lakera Team
July 20, 2021


In this part of our machine learning testing series, we’ll look at metamorphic relations, a technique for multiplying your available data and labels, and discuss how they can be used for machine learning model evaluation. Metamorphic relations help extend the test coverage of your machine learning (ML) system beyond what can be achieved through normal data collection. This series has previously covered other aspects of evaluating ML models, such as testing for data bugs and regression testing.

The test oracle problem.

The test oracle problem is not specific to ML; it is well known from traditional software testing [1]. It refers to the difficulty of determining the correct output for a given test input.

Let’s look at an example from medical imaging. Imagine that you are building an ML system used as a diagnostic tool for cancer histopathology. The inputs are images of histopathology samples; the output is a cancer or no-cancer diagnosis.

The test oracle problem presents itself because you have some input image data, but you don’t know the label (cancer/no cancer).

This is solved by having the images annotated. You can send them to histopathologists, who play the role of the test oracle by adding a label to each sample image. The problem is that these images are scarce to begin with, and the ones you do have are expensive to annotate.

The combinatorial number of scenarios needed for thorough machine learning evaluation requires more data and labels than can realistically be collected. For example, the space of relevant scenarios explodes once you consider variations in the color of the image, the type of microscope used to take it, the zoom level, and so on. As a result, only a fraction of the relevant conditions can be tested for, leading to insufficient test coverage.

In come metamorphic relations. Take the image that you already have and rotate it. You could then send this rotated image to be re-annotated to solve the test oracle problem. But because you know that the label for the rotated image is still cancer, you don’t need to.
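As a minimal sketch of this idea in Python: the model, its predict method, and the annotated sample below are hypothetical stand-ins for your own inference code and data.

import numpy as np

def rotate_180(image: np.ndarray) -> np.ndarray:
    # Two 90-degree rotations over the spatial axes.
    return np.rot90(image, k=2, axes=(0, 1))

def check_rotation_preserves_label(model, image: np.ndarray, label: str) -> None:
    # The rotated slide must receive the same diagnosis as the original;
    # no re-annotation is needed, because rotation cannot change the label.
    assert model.predict(image) == label
    assert model.predict(rotate_180(image)) == label

Every annotated image thus yields a second test case for free.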

That’s how metamorphic relations can contribute to solving the oracle problem.

What are metamorphic relations?

Metamorphic relations are a great way of extending the test coverage of your ML application. A metamorphic relation [2]:

“Refers to the relationship between the software input change and output change.”

Consider the example of a square function, f(x) = x^2. An easily tested metamorphic relation (ignoring numerical issues) is:

f(-input) = f(input)
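As a minimal sketch in Python (the function and test names are ours), this relation turns directly into a test that needs no expected outputs:

def square(x: float) -> float:
    return x * x

def test_square_negation_invariance():
    # The relation f(-x) == f(x) is the oracle: no labeled output is needed.
    for x in [0.0, 1.5, -3.0, 42.0]:
        assert square(-x) == square(x)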

This is a powerful concept that can be applied to ML as well! Two classes of metamorphic relations that are well known in computer vision are:

a) Image augmentations (e.g., rotation) that affect the label in a known way and act as a data/label multiplier;

b) Using temporal relations in video sequences (e.g., two successive image frames in a 30Hz video sequence are likely similar) that act as supervisory signals.

Both have been applied in the context of (self-)supervised learning to create more robust ML models [3].
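To make the second class concrete at test time, here is a hedged sketch of a temporal-consistency check; the model, the ordered list of frames, and the 95% agreement threshold are all illustrative assumptions, not prescriptions.

def temporal_consistency_rate(model, frames) -> float:
    # Fraction of consecutive frame pairs on which predictions agree.
    preds = [model.predict(f) for f in frames]
    return sum(a == b for a, b in zip(preds, preds[1:])) / max(len(preds) - 1, 1)

def check_temporal_consistency(model, frames) -> None:
    # Consecutive frames of a 30Hz video are near-duplicates, so
    # predictions should rarely flip between them.
    assert temporal_consistency_rate(model, frames) > 0.95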

How can we leverage this concept for model testing in machine learning?

Example: Using metamorphic relations for medical image testing.

We illustrate the use of metamorphic relations by returning to our histopathology example and looking at how to test the machine learning models involved. We can use metamorphic relations to write model unit tests and increase the test coverage of our ML testing suites.

(Figure: an example of test specifications based on metamorphic relations.)

We’d certainly expect this ML system to work if the input image is rotated by 180 degrees. Shifts in the color intensity of the image should also not change the system output. Neither should slightly out-of-focus samples.

These problem insights, which in this context are metamorphic relations, can be used to create clear test specifications and to build these model unit tests. Not only does this multiply your available test data, but it also ensures, via machine learning unit testing, that your ML model behaves according to the specifications.
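A minimal sketch of such model unit tests, assuming a model with a predict method and grayscale images as float arrays in [0, 1]; the transforms are simple numpy/scipy stand-ins for the real augmentations you would use.

import numpy as np
from scipy.ndimage import gaussian_filter

# Each relation maps an image to a variant that must keep the same prediction.
RELATIONS = {
    "rotate_180": lambda img: np.rot90(img, k=2),
    "intensity_shift": lambda img: np.clip(img + 0.05, 0.0, 1.0),
    "slight_blur": lambda img: gaussian_filter(img, sigma=1.0),
}

def check_invariance(model, images) -> None:
    for name, transform in RELATIONS.items():
        for image in images:
            assert model.predict(transform(image)) == model.predict(image), \
                f"{name} changed the prediction"

Each failing assertion then points to a concrete, reproducible model bug tied to a single relation, which makes triage straightforward.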

So, why bother augmenting your test data if you’re already adding these augmentations to your training set? Truth be told, there is no guarantee that training on augmented data ensures the desired behavior of your trained model. We have observed this with state-of-the-art object detection models that are not robust to the very augmentations used during their training. Testing your model for the desired behavior, by contrast, gives confidence for these inputs and will likely discover and prevent many ML model bugs.

Not convinced? Similar metamorphic relations were applied to the testing of neural networks for autonomous driving by Tian et al. in DeepTest [4]. They found thousands of erroneous (and sometimes grave) behaviors in state-of-the-art deep neural networks for self-driving cars.

To summarize, metamorphic relations are a great way to test your ML system thoroughly. Alongside regression tests, they should not be forgotten in your development cycles when testing ML models. Our follow-up article on fuzz testing illustrates how to leverage metamorphic relations to stress-test ML models.
