Medical imaging as a serious prospect: Where are we at?

The promise these possibilities hold has put medical imaging in the lead of the race toward landing in hospitals. But that is not the end of the discussion…

Lakera Team
October 20, 2023

Numerous recent additions to the research literature on computer vision (CV) aim to solve practical problems, and companies are leveraging these advances to build CV systems for real-world applications. However, the one prospect that has held my attention, and imagination, for years is medical imaging. It offers extensive opportunities to improve patient journeys (for example, reducing the screening time for patients with skin diseases [1]) and to support physicians on challenging and time-consuming tasks (such as predicting lung cancer [2] or processing large numbers of histological slices [3]). And, as we know, it has the capacity to complete tasks that humans can't do yet, such as determining whether a patient has pancreatic cancer from a smartphone selfie of the eye [4]. The promise these possibilities hold has put medical imaging in the lead of the race toward landing in hospitals. But that is not the end of the discussion.

In this article, we discuss the challenges of building a medical imaging system, and how to test machine learning models to gain full visibility into model performance before deploying to the clinic.

The feared “prototype trap”.

While it’s exciting and encouraging to see all these medical imaging solutions being published, they don’t usually come free of challenges or risks, especially when it comes to the productionization phase. These challenges are often tedious and hard to resolve, but there are strong ethical and safety reasons why they must be resolved before going live. If these blockers persist, even the best software runs a high risk of getting stuck in the so-called “prototype trap”: the situation, known across the board, of building a great prototype but never achieving tangible success in a production setting.

I have seen it myself: moving from “rainbows and unicorns” Jupyter notebooks computing accuracy on my initial test data to suddenly finding myself in awkward situations, having to explain embarrassing errors in front of doctors while they test the app on real patients.

The methodology available in traditional software testing does not yet have an equivalent for ML models. The challenging question is how to evaluate a machine learning model and build up enough test coverage before deploying the system to production. How can machine learning unit testing help us escape the prototype trap?
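To make this concrete, here is a minimal sketch of what an ML unit test could look like in pytest style. The loaders `load_trained_model` and `load_sample_batch` are hypothetical placeholders for your own model and data code, and the 2% threshold is an arbitrary illustration; the point is that a robustness property becomes an assertion that can run in CI like any traditional software test.

```python
# A minimal sketch of an ML "unit test" in pytest style.
# load_trained_model and load_sample_batch are hypothetical placeholders.
import torch
from torchvision.transforms import GaussianBlur

def predictions(model, images):
    """Return the predicted class for a batch of images."""
    with torch.no_grad():
        return model(images).argmax(dim=1)

def test_predictions_stable_under_mild_blur():
    model = load_trained_model()    # hypothetical: the model under test
    images = load_sample_batch()    # hypothetical: a representative image batch
    blurred = GaussianBlur(kernel_size=5, sigma=1.0)(images)

    clean = predictions(model, images)
    perturbed = predictions(model, blurred)

    # Fail the build if mild blur flips more than 2% of predictions
    # (the threshold is illustrative; pick one that fits your risk profile).
    flip_rate = (clean != perturbed).float().mean().item()
    assert flip_rate <= 0.02, f"{flip_rate:.1%} of predictions flipped under mild blur"
```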

Case study: Putting medical imaging to the test.

All that glitters is not gold.

The real questions we face are: why aren't more ML systems making the step from a prototype to production, and how can we increase their chances of making it? The answer lies in machine learning model evaluation.

We at Lakera investigated a state-of-the-art open-source model that can be used as a basis for building production systems in medical imaging.  

The system under analysis was designed to detect COVID-19 infections from chest radiographs. We looked at how this model would be likely to perform if deployed as-is, and what would need to improve to take it from a prototype to production. Spoiler alert: the standard machine learning evaluation (measured in terms of aggregate metrics such as accuracy) looked good, as expected from a state-of-the-art model. Despite these seemingly good performance indicators, the model contained severe shortcomings that would have to be fixed before it reached production readiness.
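To see why an aggregate metric can be deceptive, consider a toy example; all numbers below are invented for illustration. The same set of predictions can produce a respectable overall accuracy while missing half of the positive cases, which is exactly the kind of failure that matters in a pre-diagnosis setting.

```python
# Toy illustration (invented numbers): high overall accuracy can hide
# unacceptable performance on the clinically important slice.
from sklearn.metrics import accuracy_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 1 = COVID-positive
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # the model misses half the positives

print(accuracy_score(y_true, y_pred))      # 0.8: looks respectable

# Sliced view: sensitivity (recall on positives) tells a different story.
positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
sensitivity = sum(p for _, p in positives) / len(positives)
print(sensitivity)                         # 0.5: unacceptable for triage
```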

Human experts are trained for practice… but is ML?

In radiology practice, radiologists are trained to, and used to, working around challenges such as patient motion, blur due to breathing, camera-induced noise, subtle differences between X-ray machines, overlaid text, and varying levels of contrast and exposure, among many others. Handling such cases well is key to a reliable diagnosis. However, it’s less clear how ML models handle these situations, especially as they are often not adequately represented in the initial training and test datasets used during development. Machine learning testing beyond standard metrics becomes a fundamental tool to identify problematic situations and drive next steps such as data collection.
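As a rough illustration, many of these clinical variations can be approximated with off-the-shelf image transforms. The sketch below uses torchvision, and the specific transforms and parameters are invented stand-ins, not the perturbations used in the analysis that follows.

```python
# Illustrative stand-ins for clinical image variations, using torchvision.
# Parameters are invented for illustration only.
import torch
from torchvision import transforms

# Each entry pairs a clinical phenomenon with a synthetic approximation.
clinical_corruptions = {
    "patient_motion": transforms.RandomAffine(degrees=3, translate=(0.02, 0.02)),
    "breathing_blur": transforms.GaussianBlur(kernel_size=7, sigma=(0.5, 1.5)),
    "contrast_shift": transforms.ColorJitter(contrast=0.3),
    "exposure_shift": transforms.ColorJitter(brightness=0.3),
    "sensor_noise": transforms.Lambda(
        lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1)
    ),
}
```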

Stress-testing the medical AI application against blur scenarios.

In our quest to assess the production readiness of the aforementioned model, we employed Lakera’s MLTest, our software development kit (SDK) that allows developers to find vulnerabilities in the ML models and data of CV systems before they enter operation. To stress-test the model, we used MLTest to synthetically generate augmented X-ray images and evaluated the model on them, assessing its robustness against the kinds of situations that are likely to occur in practice, like those described in the previous section. The authenticity of the generated images was verified by professionally trained radiologists selected by Humans in the Loop, confirming that the images generated by MLTest could indeed be encountered in practice.
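MLTest’s own API is not shown here, but the following sketch conveys the general shape of such a stress test: run the model on clean and perturbed versions of each test image and record how often the predicted diagnosis changes. The names `model` and `test_loader` are assumed to exist, and `clinical_corruptions` refers to the dictionary sketched above.

```python
# A generic sketch of a robustness sweep (not MLTest's API): measure how
# often each perturbation flips the model's prediction on the test set.
import torch

@torch.no_grad()
def flip_rates(model, test_loader, corruptions):
    model.eval()
    flips = {name: 0 for name in corruptions}
    total = 0
    for images, _ in test_loader:
        clean_pred = model(images).argmax(dim=1)
        total += len(images)
        for name, corrupt in corruptions.items():
            corrupted_pred = model(corrupt(images)).argmax(dim=1)
            flips[name] += (clean_pred != corrupted_pred).sum().item()
    # Fraction of test images whose diagnosis changes under each perturbation.
    return {name: n / total for name, n in flips.items()}
```

A per-corruption breakdown like this points directly at next steps: a high flip rate under, say, `breathing_blur` suggests collecting or synthesizing more blurred training examples rather than tuning the model blindly.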

Robustness issues found despite exceptional base performance.

We evaluated the model on an extensive testing suite, with model tests focused on its performance and robustness properties. The results revealed that, despite the outstanding base performance, severe robustness issues appeared in almost half of the images from the original test set. These included cases where changes barely noticeable to the naked eye led to critical failures: predictions flipped drastically, producing false positives and false negatives in the proposed pre-diagnoses. This means, for example, that a strong positive diagnosis could be deemed confidently healthy if the patient moves slightly. These are mistakes that simply should not be accepted during actual use! Overall, we discovered that the model isn’t robust to patient-induced motion, lighting changes in the room, different scanner types, and other, more elastic deformations. Note that the image generation was not done adversarially.

[Image: covid_2.png]

Conclusion: Medical imaging is nearly there, but not yet a winner.

To sum it up, we’ve seen that even state-of-the-art models have severe limitations when it comes to robustness, which can prevent them from performing well in practical situations. These vulnerabilities must be fixed during the development phase, and the way to achieve this is by performing a robustness analysis and thorough machine learning testing, as proposed above. Identifying these areas of improvement guides next steps, such as the collection of new test cases.

If you’d like to learn more about the ML testing techniques we’ve employed in our analysis, check out Lakera’s guide to testing machine learning. We’ve also looked at the robustness properties of state-of-the-art object detection models; you can find that analysis here. Or, if you want to test your own CV model with us, say hi! :-)

[1] “Using AI to help find answers to common skin conditions”, Bui, P. & Liu Y., 2021.

[2] “A promising step forward for predicting lung cancer”, Shetty, S., 2019.

[3] “Artificial intelligence and computational pathology”, Cui, M. & Zhang, D. Y., 2021.

[4] “New app uses smartphone selfies to screen for pancreatic cancer”, University of Washington, 2017.
