Test machine learning the right way: Metamorphic relations.

As part of our series on machine learning testing, we are looking at metamorphic relations. We’ll discuss what they are, how they are used in traditional software testing, what role they play in ML more broadly and, lastly, how to use them to write great tests for your machine learning application.

Lakera Team

October 20, 2023

Last updated: November 13, 2024

In this part of our machine learning testing series, we look at metamorphic relations, a technique for multiplying your available data and labels, and discuss how they can be used to evaluate machine learning models. Metamorphic relations help extend the test coverage of your ML (machine learning) system beyond what normal data collection can achieve. Earlier articles in this series covered other aspects of evaluating ML models, such as testing for data bugs and regression testing.

The test oracle problem.

The test oracle problem is not specific to ML; it is well known in traditional software testing [1]. It refers to the challenge of determining the correct output for a given test input.

Let’s look at an example from medical imaging. Imagine that you are building an ML system that serves as a diagnostic tool for cancer histopathology. The inputs are images of histopathology samples; the output is a diagnosis of cancer or no cancer.

The test oracle problem arises because you have input images but don’t know their labels (cancer or no cancer).

One solution is to have the images annotated: histopathologists can act as the test oracle by adding a label to each sample image. The problem is that such images are scarce to begin with, and the ones you do have are expensive to annotate.

Thorough evaluation of a machine learning model requires covering a combinatorial number of scenarios, which calls for more data and labels than can realistically be collected. For example, the number of relevant scenarios explodes once you consider variations in image color, the type of microscope used to capture the image, the zoom level, and so on. As a result, only a fraction of the relevant conditions can be tested, leading to insufficient test coverage.

Enter metamorphic relations. Take an image you already have, say one labeled as cancer, and rotate it. You could send the rotated image to be re-annotated to solve the test oracle problem, but you don’t need to: rotating the sample does not change the diagnosis, so you already know that the rotated image is still labeled cancer.

That’s how metamorphic relations can contribute to solving the oracle problem.
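The rotation trick can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: image patches are grayscale and stored as nested lists, and rotating a histopathology patch by any multiple of 90 degrees is assumed to leave its diagnosis unchanged. The helper names `rotate90` and `rotations_with_labels` are hypothetical and not part of any library.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale patch as rows of pixel intensities


def rotate90(image: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotations_with_labels(image: Image, label: str) -> List[Tuple[Image, str]]:
    """Return the image rotated by 0, 90, 180 and 270 degrees.

    Assumption: rotation preserves the diagnosis, so every rotated copy
    inherits the original label without re-annotation.
    """
    samples = []
    current = image
    for _ in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples


# Example: one annotated patch becomes four labeled test samples.
patch = [[0.1, 0.2], [0.3, 0.4]]
augmented = rotations_with_labels(patch, "cancer")
```

Only label-preserving transformations can be used this way; a transformation whose effect on the label is unknown would still need an oracle.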

What are metamorphic relations?

Metamorphic relations are a great way of extending the test coverage of your ML application. A metamorphic relation [2]:

“Refers to the relationship between the software input change and output change.”

Consider a square function, f(x) = x². An easily tested metamorphic relation (ignoring numerical issues) is:

f(-x) = f(x)
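This relation can be checked without knowing the correct output for any input. You generate random inputs, apply the transformation, and compare the two outputs. The following is a minimal Python sketch. `check_negation_relation` and `buggy_square` are hypothetical helpers, and `math.isclose` with a small relative tolerance stands in for "ignoring numerical issues".

```python
import math
import random
from typing import Callable


def square(x: float) -> float:
    return x * x


def check_negation_relation(
    f: Callable[[float], float], trials: int = 1000, seed: int = 0
) -> bool:
    """Check the metamorphic relation f(-x) == f(x) on random inputs.

    No expected value is needed for any individual x; only the outputs
    for x and -x are compared, within a small relative tolerance.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-1e6, 1e6)
        if not math.isclose(f(-x), f(x), rel_tol=1e-9):
            return False
    return True


def buggy_square(x: float) -> float:
    # Deliberately wrong: returns -x**2 for negative inputs.
    return x * abs(x)


print(check_negation_relation(square))        # True
print(check_negation_relation(buggy_square))  # False
```

The relation alone does not prove `square` correct: a function returning 0 everywhere also passes. It only rules out implementations that violate the relation.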

This is a powerful concept that can be applied to ML as well! Two classes of metamorphic relations that are well known in computer vision are:

a) Image augmentations (e.g., rotation) that affect the label in a known way and act as a data/label multiplier;

b) Temporal relations in video sequences (e.g., two successive frames in a 30 Hz video sequence are likely similar) that act as supervisory signals.

Both have been applied in the context of (self-)supervised learning to create more robust ML models [3].
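The temporal relation can also be used as a test rather than a training signal. The sketch below rests on several assumptions. `predict` is a hypothetical function that maps a frame to class probabilities. The tolerance of 0.1 for how much predictions on successive frames may differ is arbitrary. `brittle_predict` is a toy stand-in model. None of these choices come from the work cited above.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def temporal_violations(
    predict: Callable[[Frame], Sequence[float]],
    frames: Sequence[Frame],
    tolerance: float = 0.1,
) -> List[int]:
    """Return indices i where predictions for frames i and i + 1 differ by more than `tolerance`.

    Assumes frames are consecutive (e.g., sampled at 30 Hz), so the
    model's class probabilities should change only slightly between them.
    """
    violations: List[int] = []
    if len(frames) < 2:
        return violations
    previous = predict(frames[0])
    for i in range(1, len(frames)):
        current = predict(frames[i])
        gap = max(abs(p - c) for p, c in zip(previous, current))
        if gap > tolerance:
            violations.append(i - 1)
        previous = current
    return violations


def brittle_predict(frame: Frame) -> List[float]:
    # Hypothetical model with a hard threshold on mean brightness,
    # so its output can flip between nearly identical frames.
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [0.0, 1.0] if mean >= 0.51 else [1.0, 0.0]


frames = [[[0.500]], [[0.505]], [[0.515]]]
print(temporal_violations(brittle_predict, frames))  # [1]
```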

How can we leverage this concept for model testing in machine learning?

Example: Using metamorphic relations for medical image testing.

Let’s illustrate how metamorphic relations can help test the machine learning model in our histopathology example. They allow us to write model unit tests and increase the test coverage of our ML test suites.

*An example of test specifications based on metamorphic relations.*

We’d certainly expect this ML system to produce the same diagnosis if the input image is rotated by 180 degrees. Shifts in the color intensity of the image should not change the system output either. Neither should slightly out-of-focus samples.
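The three specifications can be written as model unit tests. The sketch below is one possible illustration and not Lakera’s implementation. It makes the following assumptions:

- Images are small grayscale patches stored as nested lists.
- `predict` is a hypothetical function returning "cancer" or "no cancer".
- A uniform intensity offset of 0.05 stands in for shifts in color intensity.
- A 3x3 box blur stands in for slightly out-of-focus samples.

The model’s prediction on the untouched image serves as the reference, so no new labels are needed.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale patch, intensities in [0, 1]


def rotate180(image: Image) -> Image:
    # Reverse the row order and each row: a 180-degree rotation.
    return [row[::-1] for row in image[::-1]]


def shift_intensity(image: Image, delta: float = 0.05) -> Image:
    # Brighten every pixel slightly, clipping to the valid [0, 1] range.
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]


def box_blur(image: Image) -> Image:
    # 3x3 mean filter (edges average only in-bounds neighbours), a crude
    # stand-in for a slightly out-of-focus sample.
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [image[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


# Each specification names a transform that must not change the diagnosis.
SPECIFICATIONS: Dict[str, Callable[[Image], Image]] = {
    "rotation_180": rotate180,
    "intensity_shift": shift_intensity,
    "slight_defocus": box_blur,
}


def failed_specifications(predict: Callable[[Image], str], image: Image) -> List[str]:
    """Return the names of specifications whose transform changes the prediction."""
    expected = predict(image)  # the model's own output is the reference
    return [name for name, transform in SPECIFICATIONS.items()
            if predict(transform(image)) != expected]


def brightness_model(image: Image) -> str:
    # Hypothetical flawed model that keys on overall brightness.
    pixels = [p for row in image for p in row]
    return "cancer" if sum(pixels) / len(pixels) > 0.53 else "no cancer"


patch = [[0.2, 0.8, 0.4], [0.6, 0.3, 0.7], [0.1, 0.9, 0.5]]
print(failed_specifications(brightness_model, patch))  # ['intensity_shift']
```

In a real test suite, each specification would typically become its own test case and run over many annotated patches, not just one.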

These problem insights, or in this context metamorphic relations, can be turned into clear test specifications and used to build model unit tests. This not only multiplies your available test data but also checks, through machine learning unit testing, that your ML model behaves according to its specifications.

So why bother augmenting your test data if you’re already using the same augmentations in your training set? Truth be told, training with these augmentations does not guarantee that the trained model exhibits the desired behavior. We observed this with state-of-the-art object detection models, which were not robust to the very augmentations used during their training. Testing your model explicitly for the desired behavior, by contrast, gives you confidence for those inputs and is likely to uncover and prevent many ML model bugs.
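You can measure directly whether training-time augmentations actually produced the desired invariance. The minimal sketch below assumes hypothetical `predict` and `transform` callables and computes the fraction of inputs whose prediction survives the transformation. The toy model and inputs are illustrative only. In practice, one might assert that this rate stays above an agreed threshold.

```python
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def invariance_rate(
    predict: Callable[[T], Hashable],
    transform: Callable[[T], T],
    inputs: Sequence[T],
) -> float:
    """Return the fraction of inputs whose prediction is unchanged by `transform`."""
    if not inputs:
        raise ValueError("invariance_rate needs at least one input")
    unchanged = sum(predict(transform(x)) == predict(x) for x in inputs)
    return unchanged / len(inputs)


def threshold_model(x: float) -> bool:
    # Hypothetical model: positive if the input exceeds 0.5.
    return x > 0.5


def brighten(x: float) -> float:
    # Hypothetical augmentation that should not change the prediction.
    return x + 0.1


rate = invariance_rate(threshold_model, brighten, [0.1, 0.45, 0.7, 0.9])
print(rate)  # 0.75
```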

Not convinced? Tian et al. applied similar metamorphic relations to test neural networks for autonomous driving in DeepTest [4]. They found thousands of erroneous (and sometimes grave) behaviors in state-of-the-art deep neural networks for self-driving cars.

To summarize, metamorphic relations are a great way to thoroughly test your ML system. Alongside regression tests, they deserve a place in your development cycle when testing ML models. Our follow-up article on fuzz testing shows how to leverage the concept of metamorphic relations to stress-test ML models.
