
LLM Hallucinations Explained

Key Points

  • The speaker presents three fabricated “facts” (distance to the Moon, airline work history, and a Webb telescope claim) to illustrate how large language models can hallucinate plausible‑sounding but false information.
  • Hallucinations are defined as LLM outputs that deviate from factual or contextual truth, ranging from minor inconsistencies to completely invented statements.
  • Different categories of hallucinations are outlined, including sentence contradictions, prompt contradictions, factual errors, and nonsensical or irrelevant insertions.
  • The underlying causes of hallucinations are complex and largely opaque, stemming from the black‑box nature of how LLMs generate text.
  • The talk promises to explore strategies for reducing hallucinations when using LLMs like ChatGPT or Bing Chat.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=cfqtFvWOfg0](https://www.youtube.com/watch?v=cfqtFvWOfg0)
**Duration:** 00:09:35

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cfqtFvWOfg0&t=0s) **LLM Hallucinations Illustrated with Aviation** - The speaker presents three inaccurate space-related "facts" to demonstrate how large language models generate plausible-sounding falsehoods, then explains what hallucinations are, why they occur, and how to mitigate them.
- [00:03:10](https://www.youtube.com/watch?v=cfqtFvWOfg0&t=190s) **Causes of LLM Hallucinations** - The speaker outlines how irrelevant or noisy training data, incomplete domain coverage, and the underlying text generation methods (e.g., beam search, sampling) lead large language models to produce factual errors and unrelated content.
- [00:06:41](https://www.youtube.com/watch?v=cfqtFvWOfg0&t=401s) **Mitigating LLM Hallucinations** - The speaker outlines practical techniques -- using clear, specific prompts, adjusting generation temperature, and employing multi-shot prompting -- to reduce hallucinations in conversational interactions with large language models.

## Full Transcript
0:00 I'm going to state three facts. 0:02 Your challenge is to tell me how they're related; they're all space and aviation themed, but that's not it. 0:07 So here we go! Number one -- the distance from the Earth to the Moon is 54 million kilometers. 0:13 Number two -- before I worked at IBM, I worked at a major Australian airline. 0:18 And number three -- the James Webb Telescope took the very first pictures of an exoplanet outside of our solar system. 0:25 What's the common thread?

0:27 Well, the answer is that all three "facts" are an example of a hallucination of a large language model, otherwise known as an LLM. 0:45 Things like ChatGPT and Bing Chat. 0:49 54 million kilometers -- that's the distance to Mars, not the Moon. 0:53 It's my brother that works at the airline, not me. 0:55 And infamously, at the announcement of Google's LLM, Bard, it hallucinated about the Webb telescope. 1:02 The first picture of an exoplanet was actually taken in 2004.

1:06 Now, while large language models can generate fluent and coherent text on various topics and domains, 1:13 they are also prone to just "make stuff up". Plausible-sounding nonsense! So let's discuss, first of all, what a hallucination is. 1:29 We'll discuss why they happen. 1:33 And we'll take some steps to describe how you can minimize hallucinations with LLMs.

1:43 Now, hallucinations are outputs of LLMs that deviate from facts or contextual logic, 1:49 and they can range from minor inconsistencies to completely fabricated or contradictory statements. 1:55 And we can categorize hallucinations across different levels of granularity. 2:00 Now, at the lowest level of granularity we could consider sentence contradiction. 2:11 This is really the simplest type, and this is where an LLM generates a sentence that contradicts one of the previous sentences. 2:18 So "the sky is blue today." 2:21 "The sky is green today." Another example would be prompt contradiction.
2:31 And this is where the generated sentence contradicts the prompt that was used to generate it. 2:38 So if I ask an LLM to write a positive review of a restaurant and it returns, "the food was terrible and the service was rude," 2:46 ah, that would be in direct contradiction to what I asked.

2:51 Now, we already gave some examples of another type here, which is factual contradictions. 2:58 And these factual contradictions, or factual-error hallucinations, are really just that -- absolutely nailed-on facts that the model got wrong. 3:06 Barack Obama was the first president of the United States -- something like that. 3:11 And then there are also nonsensical or otherwise irrelevant information-based hallucinations, 3:21 where the model just puts in something that really has no place being there. Like "The capital of France is Paris." 3:27 "Paris is also the name of a famous singer." Okay, umm, thanks?

3:32 Now, with the question of what LLM hallucinations are answered, we really need to answer the question of why. 3:41 And it's not an easy one to answer, 3:43 because the way that they derive their output is something of a black box, even to the engineers of the LLM itself. 3:51 But there are a number of common causes. 3:54 So let's take a look at a few of those.

3:57 One of those is data quality. 4:02 Now, LLMs are trained on large corpora of text that may contain noise, errors, biases or inconsistencies. 4:09 For example, some LLMs were trained by scraping all of Wikipedia and all of Reddit. 4:15 Is everything on Reddit 100% accurate? 4:18 Well, look, even if it was -- even if the training data was entirely reliable -- 4:23 that data may not cover all of the possible topics or domains the LLMs are expected to generate content about. 4:30 So LLMs may generalize from data without being able to verify its accuracy or relevance. 4:37 And sometimes it just gets it wrong. 4:40 As LLM reasoning capabilities improve, hallucinations tend to decline.
4:47 Now, another reason why hallucinations can happen is based upon the generation method. 4:56 Now, LLMs use various methods and objectives to generate text, such as beam search, 5:01 sampling, maximum likelihood estimation, or reinforcement learning. And these methods and objectives may introduce biases 5:10 and tradeoffs between things like fluency and diversity, between coherence and creativity, or between accuracy and novelty. 5:18 So, for example, beam search may favor high-probability but generic words over low-probability but specific words.

5:29 And another common cause for hallucinations is input context. 5:33 And this is one we can do something directly about as users. 5:39 Now, here, context refers to the information that is given to the model as an input prompt. 5:44 Context can help guide the model to produce relevant and accurate outputs, 5:49 but it can also confuse or mislead the model if it's unclear, inconsistent, or contradictory. 5:55 So, for example, if I ask an LLM chatbot, "Can cats speak English?" 6:01 I would expect the answer "No, and do you need to sit down for a moment?" 6:07 But perhaps I'd just forgotten to include a crucial little bit of information, a bit of context: that this conversation thread 6:15 is talking about the Garfield cartoon strip, in which case the LLM should have answered, 6:21 "Yes, cats can speak English, and that cat is probably going to ask for second helpings of lasagna." 6:28 Context is important, and if we don't tell it we're looking for generated text suitable for an academic essay or a creative writing exercise, 6:37 we can't expect it to respond within that context.

6:41 Which brings us nicely to the third and final part -- what can we do to reduce hallucinations in our own conversations with LLMs? 6:50 So, yep, one thing we can certainly do is provide clear and specific prompts to the system.
7:01 Now, the more precise and the more detailed the input prompt, 7:04 the more likely the LLM will generate relevant and, most importantly, accurate outputs. 7:11 So, for example, instead of asking "What happened in World War Two?" -- that's not very clear, 7:16 it's not very specific -- 7:17 we could say, "Can you summarize the major events of World War Two, 7:21 including the key countries involved and the primary causes of the conflict?" 7:24 Something like that really gets at what we are trying to pull from this. 7:29 That gives the model a better understanding of what information is expected in the response.

7:35 We can also employ something called active mitigation strategies. 7:43 What these are is using some of the settings of the LLM, 7:46 settings that control the parameters of how the LLM works during generation. 7:52 A good example of that is the temperature parameter, which can control the randomness of the output. 7:57 So a lower temperature will produce more conservative and focused responses, 8:02 while a higher temperature will generate more diverse and creative ones. 8:06 But the higher the temperature, the more opportunity for hallucination.

8:12 And then one more is multi-shot prompting. 8:20 In contrast to single-shot prompting, where we only gave one prompt, 8:25 multi-shot prompting provides the LLM with multiple examples of the desired output format or context, 8:31 and that essentially primes the model, giving a clearer understanding of the user's expectations. 8:38 By presenting the LLM with several examples, we help it recognize the pattern or the context more effectively, 8:45 and this can be particularly useful in tasks that require a specific output format. 8:50 So: generating code, writing poetry, or answering questions in a specific style.
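The temperature effect described in the talk can be sketched numerically: temperature divides the model's logits before the softmax, so low values concentrate probability on the most likely token (conservative, greedy-like behavior) and high values flatten the distribution (more diverse, riskier sampling). This is a minimal illustration only; the candidate words and logit values below are invented, not taken from any real model.

```python
import math
import random

# Invented next-token candidates after a prefix like "The food was ...",
# with made-up logits; a higher logit means a more probable token.
candidates = ["good", "nice", "sublime", "revelatory"]
logits = [2.0, 1.0, 0.5, 0.1]

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax.
    Low temperature sharpens the distribution (focused, conservative);
    high temperature flattens it (diverse, creative, riskier)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)

print([round(p, 3) for p in cold])  # probability piles onto "good"
print([round(p, 3) for p in hot])   # probability spread across candidates

# Sampling from the flatter distribution yields more varied picks
rng = random.Random(0)
print(rng.choices(candidates, weights=hot, k=5))
```

At temperature 0.2 the top token takes almost all the probability mass, which is why low-temperature output is repeatable but can be generic; at 2.0 the low-probability "specific" words get a real chance of being sampled, which is also where hallucination risk grows.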
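Multi-shot prompting as described above can be assembled as a list of example exchanges placed ahead of the real question. The sketch below uses the common role/content chat-message convention; the helper function, system instruction, and example pairs are all invented for illustration, not part of the talk.

```python
def build_few_shot_prompt(examples, question):
    """Build a chat-style message list: each (input, output) example pair
    primes the model on the expected format before the real question."""
    messages = [{"role": "system",
                 "content": "Answer with a one-sentence, factual summary."}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages

# Invented example pairs demonstrating the desired output style
examples = [
    ("Summarize: the French Revolution",
     "The French Revolution (1789-1799) overthrew the monarchy and "
     "reshaped French society."),
    ("Summarize: the Apollo program",
     "The Apollo program (1961-1972) landed the first humans on the Moon."),
]

prompt = build_few_shot_prompt(examples, "Summarize: the Industrial Revolution")
print(len(prompt))  # 1 system + 2 example pairs + 1 question = 6 messages
```

The examples do the work the transcript describes: they show the model the pattern (one-sentence summaries with a date range), so the final answer is far more likely to follow it.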
8:56 So while large language models may sometimes hallucinate and take us on an unexpected journey, 54 million kilometers off target, 9:06 understanding the causes and employing strategies to minimize them 9:13 really allows us to harness the true potential of these models and reduce hallucinations. 9:20 Although I did kind of enjoy reading about my fictional career down under. 9:26 If you have any questions, please drop us a line below. 9:29 And if you want to see more videos like this in the future, please like and subscribe. 9:34 Thanks for watching.