
Regression Testing for Machine Learning: How to Do It Right

In this blog series, we’ll investigate how we can better test machine learning applications. In the first post, we’ll look at what we mean by ML testing, what ML bugs are and where they occur, and introduce the first technique for your ML testing repertoire: regression testing.

Lakera Team
October 20, 2023
Last updated: March 27, 2025

Machine learning systems are only as reliable as the tests that validate them. Now that we’ve discussed data bugs, let’s shift focus to testing the behavior that emerges from that data.

In this post, we explore one of the most essential techniques in the ML testing toolkit: regression testing. It’s a practical way to improve reliability and maintain consistent performance as your models evolve.


What Is Regression Testing?

In traditional software development, regression testing refers to:

“…re-running functional and non-functional tests to ensure that previously developed and tested software still performs after a change.” [1]

Let’s say you find a bug, fix it, and want to make sure it doesn’t return in future versions. The solution? Add a test for it. That way, if the bug ever reappears, the test will catch it immediately. That’s the essence of regression testing.
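As a minimal sketch of that idea in ordinary software, consider a hypothetical `parse_price` helper that once crashed on inputs containing a thousands separator. The function name and the bug are assumptions made up for illustration; the point is only that the test pins the exact input that used to fail.

```python
def parse_price(text: str) -> float:
    """Parse a price string such as "1,299.50" into a float.

    Hypothetical example: an earlier version raised ValueError on
    thousands separators. The fix strips them before converting.
    """
    return float(text.replace(",", "").strip())


def test_parse_price_handles_thousands_separator():
    # Regression test: the input that triggered the original bug.
    assert parse_price("1,299.50") == 1299.50


def test_parse_price_plain_number_still_works():
    # Guard against the fix breaking the simple case.
    assert parse_price(" 42 ") == 42.0
```

A test runner such as pytest would pick up both `test_` functions automatically, so the check runs on every future change.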

Why Regression Testing Matters in Machine Learning

In machine learning, bugs can reappear after something as routine as retraining your model. This is especially likely when your datasets are constantly evolving.

ML regression testing helps you catch these issues. It ensures that your model keeps meeting baseline performance requirements, even as data and parameters change.

A Simple Way to Get Started

Each time your ML system fails on a tricky input, add that example to a “difficult cases” dataset. Use it as a regression test set that becomes part of your testing pipeline. Over time, you’ll build a valuable resource to track whether performance on known weak spots is improving—or breaking again.
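One possible way to keep such a “difficult cases” dataset is a plain JSON Lines file that grows as failures are found. The sketch below is an assumption about how this could look, not a prescribed format: the file layout, the field names (`id`, `inputs`, `expected`), and the `predict` callable are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def load_cases(path: Path) -> list[dict]:
    """Read all stored difficult cases; an absent file means no cases yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def add_difficult_case(path: Path, example_id: str, inputs: dict, expected: str) -> bool:
    """Append a failing example to the regression set.

    Returns False (and writes nothing) if the ID is already stored.
    """
    if any(case["id"] == example_id for case in load_cases(path)):
        return False
    record = {"id": example_id, "inputs": inputs, "expected": expected}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return True


def failing_case_ids(path: Path, predict: Callable[[dict], str]) -> list[str]:
    """Return the IDs of stored cases the current model still gets wrong."""
    return [c["id"] for c in load_cases(path) if predict(c["inputs"]) != c["expected"]]
```

Running `failing_case_ids` after every retraining gives a short list of known weak spots that are still, or once again, broken.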

Real-World Example: Olympic Integrity

Imagine you’ve built a computer vision system to detect whether runners stay in their lane during races.

The system performs well in cloudy conditions. But on sunny days, it misinterprets a runner’s shadow as the runner stepping out of bounds, triggering a false disqualification alert. That’s an ML bug.

Here’s how regression testing can help:

  1. Collect similar images where shadows confuse the model.
  2. Add them to a dedicated regression dataset.
  3. Retrain the model with improved data.
  4. Regularly test your model on both the standard test set and the regression set.

By doing this, you can catch regressions early and prevent the same bug from recurring in future updates.
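The last step, testing on both sets, could be wired into a release check along the following lines. This is a sketch under stated assumptions: the thresholds, the `(image_id, label)` example format, and the `predict` callable are hypothetical stand-ins for whatever your pipeline actually uses.

```python
from typing import Callable, Sequence

# (image identifier, True if the runner really left the lane)
Example = tuple[str, bool]


def accuracy(predict: Callable[[str], bool], examples: Sequence[Example]) -> float:
    """Fraction of examples where the prediction matches the label."""
    if not examples:
        raise ValueError("cannot evaluate on an empty dataset")
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


def regression_gate(
    predict: Callable[[str], bool],
    standard_set: Sequence[Example],
    regression_set: Sequence[Example],
    min_standard: float = 0.95,    # assumed threshold, tune per project
    min_regression: float = 0.90,  # assumed threshold, tune per project
) -> dict:
    """Evaluate on both sets; the model passes only if both thresholds hold."""
    std = accuracy(predict, standard_set)
    reg = accuracy(predict, regression_set)
    return {
        "standard_accuracy": std,
        "regression_accuracy": reg,
        "passed": std >= min_standard and reg >= min_regression,
    }
```

Reporting the two numbers separately matters: a model can look fine on the standard test set while still failing every shadow image in the regression set.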

Proactive ML Testing with Regression Sets

Regression testing doesn’t have to be reactive. It’s also a great proactive strategy for monitoring ML performance over time—especially in real-world deployments.

Say you’re deploying your model across multiple customer sites. How do you make sure it works equally well everywhere?

By building targeted regression datasets for key scenarios (e.g., different lighting conditions, locations, or user groups), you can track how performance holds up under each context.

Let’s look at how Tesla approaches this.

Regression Testing in Practice: The Tesla Approach

In 2020, Andrej Karpathy, then Tesla’s Director of AI, shared how the company uses large-scale regression testing to validate its Autopilot system [2].

Tesla has developed an advanced testing infrastructure that can:

  • Automatically create test sets for specific scenarios.
  • Mine edge cases from fleet-collected data.
  • Continuously evaluate system behavior at scale.

They don’t just react to bugs—they design regression sets to proactively stress-test the system.

You Don’t Need to Be Tesla

You can apply similar principles on a much smaller scale.

For example, in the Olympic runner use case, you could create mini regression datasets for:

  • Male vs. female athletes.
  • Red vs. blue running tracks.
  • Bright vs. cloudy lighting conditions.

Tracking model performance across these subsets gives you ongoing insight into how well your system generalizes, and confidence that it does.
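A small sketch of such per-subset tracking is shown below. It assumes each evaluation record has already been tagged with a slice name (for example "sunny" or "red_track"); the record format, the function names, and the 0.05 gap are hypothetical choices, and an image that belongs to several slices would simply appear once per slice.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_slice(records: Iterable[tuple[str, bool, bool]]) -> dict[str, float]:
    """Compute accuracy per slice from (slice_name, prediction, label) records."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for slice_name, prediction, label in records:
        totals[slice_name] += 1
        hits[slice_name] += int(prediction == label)
    return {name: hits[name] / totals[name] for name in totals}


def weak_slices(per_slice: dict[str, float], overall: float, max_gap: float = 0.05) -> list[str]:
    """Names of slices whose accuracy trails the overall accuracy by more than max_gap."""
    return sorted(name for name, acc in per_slice.items() if overall - acc > max_gap)
```

Logging the per-slice numbers for every model version makes it easy to see when one scenario starts to lag behind the rest.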

Final Thoughts

Regression testing in machine learning is a low-effort, high-impact strategy to build more trustworthy models. It helps ensure that progress doesn’t come at the cost of stability.

Start small, stay consistent, and watch your model’s reliability improve over time.
