The Beginner’s Guide to Hallucinations in Large Language Models

As LLMs gain traction across domains, hallucinations—distortions in LLM output—pose risks of misinformation and exposure of confidential data. Delve into the causes of hallucinations and explore best practices for their mitigation.

Deval Shah
November 13, 2024

Large Language Models (LLMs) are at the forefront of technological discussions, known for their proficiency in processing and generating text that resembles human communication. They are transforming our interactions with technology. However, these models are not without their flaws. One significant issue is their tendency to produce "hallucinations," which affect their reliability.

Hallucinations in LLMs refer to the generation of content that is irrelevant, made-up, or inconsistent with the input data. This problem leads to incorrect information, challenging the trust placed in these models. Hallucinations are a critical obstacle in the development of LLMs, often arising from the training data's quality and the models' interpretative limits.

To use LLMs effectively, it's important to understand these hallucinations. Recognizing their limitations sharpens our insight into both the potential and the challenges of AI technologies. This article examines the causes of hallucinations, their impact, and ongoing efforts to curb them, aiming to improve the trustworthiness and functionality of LLMs for future applications.

Contents:

  • Understanding LLM hallucinations
  • Causes of hallucinations in LLMs 
  • Implications of hallucinations
  • Mitigating hallucinations in LLMs
  • Case studies and industry insights
  • Additional resources
  • Key takeaways


Understanding LLM Hallucinations

LLM hallucinations can be broken down into specific types, each with its unique characteristics and implications.

A clear classification helps developers and users alike identify, analyze, and address different hallucination scenarios. Such awareness is crucial for enhancing the models' accuracy and trustworthiness.

Taxonomy of Hallucinations in LLMs

Hallucinations in Large Language Models (LLMs) are categorized into factuality and faithfulness hallucinations.

Factuality Hallucination

This occurs when an LLM generates factually incorrect content. For instance, a model might claim that Charles Lindbergh was the first to walk on the moon, which is a factual error. This type of hallucination arises due to the model's limited contextual understanding and the inherent noise or errors in the training data, leading to responses that are not grounded in reality​.

Table 1 categorizes types of factuality hallucinations in LLMs with examples:

  1. Factual Inconsistency: The LLM incorrectly states Yuri Gagarin as the first person to land on the Moon (the correct answer is Neil Armstrong).
  2. Factual Fabrication: The LLM creates a fictitious narrative about unicorns in Atlantis, claiming they were documented to have existed around 10,000 BC and were associated with royalty despite no real-world evidence to support this claim.
Table 1: Examples of each category of Factuality hallucinations. Content marked in Red represents the hallucinatory output (Source)

Faithfulness Hallucination

These are instances where the model produces unfaithful content or is inconsistent with the provided source content.

For example, in the context of summarization, if an article states that the FDA approved the first Ebola vaccine in 2019, a faithfulness hallucination would include a summary claiming that the FDA rejected it (intrinsic hallucination) or that China started testing a COVID-19 vaccine (extrinsic hallucination), neither of which is mentioned in the original article​. (Source)


Table 2 presents examples of faithfulness hallucinations in Large Language Models (LLMs), where the model output deviates from the user's input or the context provided. It categorizes these hallucinations into three types:

  1. Instruction Inconsistency: The LLM ignores the specific instructions given by the user. For example, instead of translating a question into Spanish as instructed, the model provides the answer in English.
  2. Context Inconsistency: The model output includes information not present in the provided context or contradicting it. An example is the LLM claiming the Nile originates from the mountains instead of the Great Lakes region, as mentioned in the user's input.
  3. Logical Inconsistency: The model's output contains a logical error despite starting correctly. For instance, the LLM performs an arithmetic operation incorrectly in a step-by-step math solution.
Table 2: Examples of each category of Faithfulness hallucinations. Content marked in Red represents the hallucinatory output, while content marked in Blue indicates user instruction or provided context that contradicts the LLM hallucination. (Source)

Broader Scope in LLMs

The scope of hallucinations in LLMs is broader than that of task-specific models due to the diverse range of applications and the complex nature of the models.

Intrinsic hallucinations often contradict the original text or external knowledge, while extrinsic hallucinations introduce new, unverifiable information. This phenomenon is observed across various generative tasks, from summarization to dialogue generation and question answering, each posing unique challenges in maintaining accuracy and consistency.

For instance, in open-domain dialogue generation, intrinsic hallucination might involve a chatbot confusing facts or names, while extrinsic hallucination may include the bot making unverifiable claims. Similarly, in generative question answering, intrinsic hallucinations can manifest as responses that don’t align with the source material, and extrinsic hallucinations are answers containing information not found in the original documents.​

Mitigation Strategies

Mitigating hallucinations in LLMs involves a multifaceted approach: using scoring systems in which human annotators rate the level of hallucination, comparing generated content against baselines, and implementing various product design strategies.

Red teaming, where human evaluators rigorously test the model, is crucial in identifying and addressing hallucinations. Product-level recommendations like user editability, structured input/output, and user feedback mechanisms also effectively reduce the risk of hallucinations​.
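
To make the "structured input/output" recommendation concrete, here is a minimal, hypothetical sketch: the application asks the model to answer in a fixed JSON shape that includes a source field, then rejects any reply that fails validation. The schema and field names are illustrative assumptions, not a standard.

```python
import json

# Hypothetical contract: the model must return JSON with an "answer",
# a "source" it relied on, and a boolean "uncertain" flag.
REQUIRED_FIELDS = {"answer", "source", "uncertain"}

def validate_reply(raw_reply):
    """Accept the model's reply only if it matches the agreed structure."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # not valid JSON -> surface an error instead of free text
    if not REQUIRED_FIELDS.issubset(data):
        return None  # missing fields -> retry or escalate to a human
    return data

# A well-formed reply passes; free-form prose is rejected.
print(validate_reply('{"answer": "2019", "source": "FDA press release", "uncertain": false}'))
print(validate_reply("The FDA approved the vaccine at some point, I believe."))
```

Rejected replies can then be routed back to the model for a retry or surfaced to the user for editing, rather than being presented as fact.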

Understanding the taxonomy of hallucinations in LLMs and their broader scope is essential for effectively deploying these models in various applications. Continuous efforts in mitigation and refinement are necessary to enhance the reliability and accuracy of LLM outputs. We will cover mitigation strategies in detail later in this article.

**💡 Pro Tip: To gain a deeper understanding of the different types of Large Language Models (LLMs) and their functionalities, explore the comprehensive guide at Large Language Models Guide.**

Causes of Hallucinations in LLMs

The causes of hallucinations in Large Language Models (LLMs) are multifaceted and stem from various aspects of their development and deployment.

Let’s dive deep into key causes of hallucinations in LLMs, including issues related to training data, architecture, and inference strategies.

Training Data Issues

A significant factor contributing to LLM hallucinations is the nature of the training data. LLMs such as GPT, Falcon, and LLaMA undergo extensive unsupervised training on large and diverse datasets from multiple origins.

Verifying that this data is fair, unbiased, and factually correct is challenging. As these models learn to generate text, they may also pick up and replicate factual inaccuracies present in the training data.

This leads to scenarios where the models cannot distinguish between truth and fiction and may generate outputs that deviate from facts or logical reasoning​.

LLMs trained on internet-sourced datasets may include biased or incorrect information. This misinformation can propagate into the model's outputs, as the model doesn't distinguish between accurate and inaccurate data.

For instance, Bard's error regarding the James Webb Space Telescope indicates how reliance on flawed data can lead to confident but incorrect assertions.

Figure 2: Example of a training data issue that occurred when Google's Bard was asked about discoveries from the James Webb Space Telescope (Source)

Architectural and Training Objectives

Hallucinations can also arise from model architecture flaws or suboptimal training objectives.

For instance, an architecture flaw or a misaligned training objective can lead the model to produce outputs that do not align with the intended use or expected performance.

This misalignment can result in the model generating content that is either nonsensical or factually incorrect​.

Inference Stage Challenges

During the inference stage, several factors can contribute to hallucinations.

These include defective decoding strategies and the inherent randomness in the sampling methods used by the model.

Additionally, issues like insufficient context attention or the softmax bottleneck in decoding can lead to outputs that are not adequately grounded in the provided context or the training data.

Prompt Engineering

The way prompts are engineered can also influence the occurrence of hallucinations. 

The LLM might generate an incorrect or unrelated answer if a prompt lacks adequate context or is ambiguously worded.

Effective prompt engineering requires clarity and specificity to guide the model toward generating relevant and accurate responses​.
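
As an illustration (not an excerpt from any particular guide), the sketch below contrasts an ambiguous prompt with one that supplies context, constrains the format, and explicitly allows the model to say it does not know, which tends to reduce invented answers. The prompt wording is an assumption for demonstration purposes.

```python
# Ambiguous prompt: no context, no format constraint, no escape hatch.
vague_prompt = "Tell me about the approval."

# Specific prompt: grounded in a provided passage, with an explicit
# instruction to admit uncertainty rather than guess.
context = "The FDA approved the first Ebola vaccine in 2019."
specific_prompt = (
    "Answer using only the passage below. "
    "If the passage does not contain the answer, reply exactly 'I don't know.'\n\n"
    f"Passage: {context}\n\n"
    "Question: In which year was the first Ebola vaccine approved?"
)

print(vague_prompt)
print(specific_prompt)
```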

**💡 Pro Tip: Delve into effective prompt creation techniques to guide LLMs more accurately with the Prompt Engineering Guide.**

Stochastic Nature of Decoding Strategies

When generating text, LLMs use sampling strategies that can introduce randomness into the output.

For example, a high "temperature" setting can increase both creativity and the risk of hallucination, as seen when language models generate entirely new plots or ideas.

However, these stochastic methods can sometimes result in unexpected or nonsensical responses, reflecting the probabilistic nature of the model's decision-making process​​.
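
The effect of the temperature setting can be seen in a few lines of NumPy: dividing the logits by a temperature before the softmax flattens or sharpens the distribution the next token is drawn from. This is a generic sketch of temperature-scaled sampling with toy scores, not any particular model's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_token(logits, temperature):
    """Sample one token id from temperature-scaled logits."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                       # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [4.0, 2.0, 0.5, 0.1]                    # toy scores for 4 candidate tokens

for t in (0.2, 1.0, 2.0):
    token, probs = sample_token(logits, t)
    print(f"temperature={t}: probs={np.round(probs, 3)}, sampled token={token}")
```

At low temperature the distribution is sharply peaked and the top token is almost always chosen; at high temperature the distribution flattens, so low-probability tokens are sampled more often, mirroring the higher hallucination risk described above.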

Ambiguity Handling

LLMs may generate hallucinated content when faced with unclear or imprecise input. 

In the absence of explicit information, models can fill gaps with invented data, as evidenced by the instance where ChatGPT created a false accusation against a professor due to an ambiguous prompt​.

Over-Optimization for Specific Objectives

Sometimes, LLMs are optimized for certain outcomes, such as longer outputs, which can lead to verbose and irrelevant responses.

This over-optimization can cause models to stray from providing concise, accurate information to producing more content that may include hallucinations.

Addressing these factors involves improving data quality, refining model architecture, enhancing decoding strategies, and better prompt engineering to reduce the frequency and impact of hallucinations in LLMs.

| Stage | Sub-Stage | Type | Example Cause | Real-World Example |
|---|---|---|---|---|
| Data | Flawed Data Source | Misinformation and Biases | Training on incorrect data can lead to imitative falsehoods. | An LLM citing Thomas Edison as the sole inventor of the light bulb due to repeated misinformation in the training data. |
| Data | Flawed Data Source | Knowledge Boundary | Absence of up-to-date facts leads to limitations in specialized domains. | An LLM providing outdated information about recent Olympic hosts due to static knowledge from training data. |
| Training | Pre-training | Architecture Flaw | Unidirectional representation can limit contextual understanding. | An LLM generating a one-sided narrative without considering all context, leading to partial or biased content. |
| Training | Pre-training | Exposure Bias | Discrepancy between training and inference can cause cascading errors. | During inference, an LLM continuing to generate errors based on a single incorrect token it produced. |
| Training | Alignment | Capability Misalignment | Aligning LLMs with capabilities beyond their training can lead to errors. | An LLM producing content in a specialized domain without the necessary data, resulting in fabricated facts. |
| Training | Alignment | Belief Misalignment | Outputs diverge from the LLM's internal beliefs, leading to inaccuracies. | An LLM pandering to user opinions, generating content that it 'knows' is incorrect. |
| Inference | Decoding | Inherent Sampling Randomness | Randomness in token sampling can lead to less frequent but inaccurate outputs. | An LLM choosing low-probability tokens during generation, resulting in unexpected or irrelevant content. |
| Inference | Decoding | Imperfect Decoding Representation | Over-reliance on partially generated content and softmax bottleneck. | An LLM focusing too much on recent tokens or failing to capture complex word relationships, leading to faithfulness errors. |


Table 3: Summary of Hallucination Causes in Large Language Models Across Data, Training, and Inference Stages (Source)

Table 3 summarizes different types of nuanced causes of hallucinations in LLMs from outstanding research work by Lei Huang and the team. I highly recommend reading this paper as it covers hallucination causes in more detail with model output examples.

Implications of Hallucinations

LLM hallucinations can be dangerous and impactful, and some recent incidents have had disastrous outcomes.

An example of the real-world implications of hallucinations in Large Language Models (LLMs) is the legal case of Mata v. Avianca.

Here, a New York attorney used ChatGPT for legal research, leading to the inclusion of fabricated citations and quotes in a federal case. Steven Schwartz admitted he used ChatGPT to help research the brief in a client's personal injury case against Colombian airline Avianca and unknowingly included the false citations.

The case highlighted the direct consequences of relying on AI-generated content without verification and raised broader ethical and professional concerns within the legal field​. (Source)

Such incidents can significantly erode trust in AI technologies.

When LLMs produce hallucinations—outputs that are fabricated or inconsistent with facts—they risk creating misinformation.

The reliance on AI for tasks such as legal research or document review assumes the AI's outputs are reliable and trustworthy. When those outputs turn out to be hallucinations, it not only undermines the user's trust in the tool but can also lead to serious professional and legal repercussions, as it did for the attorneys in Mata v. Avianca, who faced sanctions for relying on AI-generated, non-existent case law.

​The risk extends beyond individual cases to broader societal implications. 

Misinformation stemming from AI hallucinations can cascade, influencing decision-making processes and potentially leading to cyberattacks.

In the legal sphere, such misinformation can taint the integrity of judicial proceedings, as judges and other legal professionals rely on accurate and factual case law to make informed decisions.

The Mata v. Avianca case is a cautionary tale of the imperative need for rigorous verification of AI-generated content and the importance of maintaining ethical standards in professional conduct​.

Figure 3: Consequences of LLM Hallucination

Mitigating Hallucinations in Large Language Models

The research paper “A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation” addresses the issue of hallucinations and how they affect the reliability of Large Language Models.

The authors propose a novel approach to actively detect and mitigate these hallucinations during text generation.

The approach comprises several steps.

Initially, it involves identifying potential hallucinations by leveraging the model's logit output values. This step is critical because it determines the candidates for hallucination in the generated text.

The next phase involves a validation procedure to check the correctness of the identified hallucinations. If a hallucination is confirmed, the process includes a mitigation strategy to rectify the error without introducing new hallucinations, even in cases of incorrectly detected hallucinations (false positives).
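
The paper's pipeline is more involved, but the core signal, flagging spans whose tokens were generated with low probability, can be sketched as follows. The tokens, probabilities, and threshold here are invented for illustration and are not taken from the paper.

```python
# Hypothetical per-token probabilities for a generated sentence
# (in practice these come from the model's logit outputs).
tokens = ["The", "first", "Ebola", "vaccine", "was", "approved", "in", "2016"]
probs  = [0.98, 0.95, 0.90, 0.93, 0.97, 0.88, 0.96, 0.31]

THRESHOLD = 0.5  # illustrative cutoff for "low confidence"

# Flag candidate hallucinations: tokens the model was unsure about.
candidates = [(tok, p) for tok, p in zip(tokens, probs) if p < THRESHOLD]
print("Validate these spans against a trusted source:", candidates)
```

Each flagged span would then pass through the validation and mitigation steps described above, for example checked against retrieved evidence and rewritten if it fails.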

The results of the study are promising. The detection technique achieved a recall of approximately 88%, meaning it could identify a high percentage of actual hallucinations.

The mitigation technique effectively mitigated 57.6% of these correctly detected hallucinations. An important aspect of this mitigation technique is that it does not introduce new hallucinations, a critical factor in maintaining the overall integrity of the model's output.

Furthermore, the approach was tested on GPT-3.5 (text-davinci-003) in an “article generation task” and showed significant effectiveness in reducing the rate of hallucinations from 47.5% to 14.5% on average.

This demonstrates the approach's efficacy in a practical scenario and its potential to enhance the reliability and trustworthiness of large language models, which is vital for their broader adoption in real-world applications.

**💡 Pro Tip: Learn more about enhancing the safety and efficacy of LLM applications with insights from OWASP Top 10 for Large Language Model Applications Guide.**

Figure 4: Illustration of the authors' proposed approach for addressing LLMs' hallucination problem (Source)

Exploring different methods to reduce hallucinations in LLMs is crucial.

The study underlines the importance of continued research and development in this area to ensure the factual accuracy of AI-generated content.

The methodology used in this paper, focusing on active detection and mitigation of hallucinations, is a significant contribution to this field.

It sets a precedent for future research to build upon, encouraging the exploration of various approaches to enhance the reliability and effectiveness of these advanced AI systems.

Case Studies and Industry Insights

The Knowledge Graph-based Retrofitting (KGR) method is a notable approach to mitigating hallucinations in Large Language Models.

This method, proposed by Xinyan Guan, Yanjiang Liu, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, and Le Sun, integrates LLMs with Knowledge Graphs (KGs).

It effectively addresses factual hallucination during the reasoning process by retrofitting initial draft responses of LLMs based on factual knowledge stored in KGs. 

KGR leverages LLMs to autonomously extract, select, validate, and retrofit factual statements in model-generated responses, eliminating manual intervention. This method has significantly improved LLM performance on factual QA benchmarks, particularly in complex reasoning tasks.
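
The sketch below is a heavily simplified, hypothetical rendering of that retrofitting loop, using a toy in-memory "knowledge graph"; the real KGR system uses the LLM itself to extract, select, and validate claims.

```python
# Toy knowledge graph: (subject, relation) -> object
KG = {
    ("Ebola vaccine", "first_approved_by_FDA"): "2019",
    ("Neil Armstrong", "first_person_on_moon"): "True",
}

def retrofit(claims):
    """Replace any claim that contradicts the knowledge graph with the stored fact."""
    revised = []
    for subject, relation, value in claims:
        fact = KG.get((subject, relation))
        if fact is not None and fact != value:
            revised.append((subject, relation, fact))   # retrofit with the KG fact
        else:
            revised.append((subject, relation, value))  # keep the claim as-is
    return revised

draft_claims = [("Ebola vaccine", "first_approved_by_FDA", "2016")]
print(retrofit(draft_claims))  # -> [('Ebola vaccine', 'first_approved_by_FDA', '2019')]
```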

KGR has proven to be a practical and successful approach to addressing hallucinations in Large Language Models (LLMs).

This technique and similar case studies showcase the potential of integrating LLMs with knowledge graphs to enhance accuracy and reliability in complex reasoning tasks.

KGR's autonomous refinement of LLM responses using factual data from knowledge graphs exemplifies a significant stride in mitigating hallucinations.

These advancements underline the importance and effectiveness of employing innovative strategies to ensure the factual integrity of LLM outputs.

Figure 5: Example for the claim verification and response retrofitting in KGR. The claim verification judges whether the claim aligns with searched triples and gives revision suggestions respectively. The response retrofitting incorporates the revision suggestions from all claims and gives a refined response. (Source)

**💡 Pro Tip: For insights into the challenges and solutions related to AI security in the context of LLMs, check out our article about AI security.**

Additional Resources

Here is a list of academic papers, technical resources, and real-world case studies on LLMs and AI safety for further reading:

  1. DelucionQA: Detecting Hallucinations in Domain-specific Question Answering
  2. Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI
  3. Knowledge Injection to Counter Large Language Model (LLM) Hallucination
  4. BERTScore: Evaluating Text Generation with BERT
  5. List of prior works on LLM hallucination, organized by evaluation, benchmark, enhancement, and survey - Reddit Thread 
  6. Enabling Large Language Models to Generate Text with Citations - Paper 
  7. TruthfulQA: Measuring How Models Mimic Human Falsehoods (OpenAI, University of Oxford): https://arxiv.org/pdf/2109.07958.pdf
  8. Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data (Google): https://arxiv.org/pdf/2010.05873v1.pdf

These resources provide different perspectives and insights into hallucinations in LLMs and how to mitigate them.

Key Takeaways

In the realm of Large Language Models, the phenomenon of generating plausible yet incorrect or nonsensical information, known as "hallucinations," poses a significant threat to the reliability and safety of these AI systems. This is especially concerning in areas where accuracy is paramount, such as healthcare or law.

Efforts to mitigate hallucinations are pivotal for maintaining the credibility and functionality of LLMs. Key methods for identifying and reducing these errors involve a combination of sophisticated metrics and critical human evaluations. These include:

  • Linguistic quality metrics like ROUGE and BLEU
  • Content validity metrics, such as IE-based, QA-based, and NLI-based measures
  • FActScore for checking the accuracy of individual facts

Looking to the future, the development of LLMs is steering towards greater robustness and safety, with a strong emphasis on grounding responses in verified information.

Innovative methods such as SelfCheckGPT detect hallucinations by assessing the consistency of multiple generated answers to the same query. Furthermore, techniques such as chain-of-thought prompting and Retrieval-Augmented Generation (RAG) are being explored to fortify the models' ability to provide precise and relevant information.
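
A bare-bones version of the consistency idea behind SelfCheckGPT (not its actual scoring models) is to sample several answers to the same question and measure how much they agree; an answer that diverges from the rest is suspect. The sample answers and the simple word-overlap metric below are illustrative assumptions.

```python
def token_overlap(a, b):
    """Jaccard overlap between the word sets of two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Hypothetical samples drawn from the same prompt at non-zero temperature.
samples = [
    "The FDA approved the first Ebola vaccine in 2019.",
    "The first Ebola vaccine received FDA approval in 2019.",
    "China approved the first Ebola vaccine in 2016.",
]

for i, ans in enumerate(samples):
    others = [s for j, s in enumerate(samples) if j != i]
    score = sum(token_overlap(ans, o) for o in others) / len(others)
    print(f"sample {i}: mean agreement {score:.2f}")  # low agreement -> likely hallucination
```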

Consistency and accuracy in responses are evaluated using tools such as BERTScore and Natural Language Inference.
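
For reference, computing BERTScore against a trusted source text takes only a few lines, assuming the open-source bert-score package is installed (the underlying model is downloaded on first use).

```python
from bert_score import score  # pip install bert-score

candidates = ["The FDA rejected the first Ebola vaccine in 2019."]
references = ["The FDA approved the first Ebola vaccine in 2019."]

# Higher F1 means the candidate is semantically closer to the reference.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```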

Efforts continue to improve artificial intelligence by advancing both the precision of detection systems and the quality of large language models, reflecting the AI community's strong commitment to building technology that is advanced, reliable, and trustworthy.

**💡 Pro Tip: Understand the benefits and methodology of integrating external information into LLMs through Retrieval-Augmented Generation.**
