Learning Library

← Back to Library

Beyond Hallucinations: AI’s Credibility Overhang

Key Points

  • The speaker discusses how early high‑profile AI hallucinations created a credibility gap, leading many people to distrust models like ChatGPT, Claude, and Gemini despite their actual reliability.
  • A lower tolerance for errors is applied to AI outputs than to human work, even when AI dramatically speeds up tasks, which fuels the perception that AI must be “perfect.”
  • The practical value of AI emerges once its usefulness outweighs the effort needed to verify its answers, indicating that the technology has passed an “event horizon” from experimental to productive.
  • While hallucinations must still be mitigated—professionals such as lawyers and doctors need to double‑check AI‑generated information—the current level of AI competence already supports real‑world applications.
  • Ongoing discussions about AI hallucinations dominate public conversation, reflecting the lingering credibility overhang that the industry must address.

Full Transcript

# Beyond Hallucinations: AI’s Credibility Overhang

**Source:** [https://www.youtube.com/watch?v=0IxUJJCBkPI](https://www.youtube.com/watch?v=0IxUJJCBkPI)
**Duration:** 00:09:18

## Sections

- [00:00:00](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=0s) **AI Hallucinations and Credibility Gap** - The speaker discusses how early high-profile AI hallucinations have created a lasting mistrust in language models, emphasizing the inflated credibility expectations applied to AI compared to human errors.
- [00:03:40](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=220s) **AI Hallucination Rates Vary by Task** - The speaker explains that hallucination frequencies can differ by up to tenfold depending on the prompt and task, and argues that careful, structured prompting and avoiding impossible queries are simple best practices for keeping AI-generated hallucinations low.
- [00:07:09](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=429s) **Human Stubbornness vs. Safer AI** - The speaker argues that critics of AI are often motivated by personal threat, asserts that AI already outperforms humans in reliability, and attributes society's reluctance to adopt safer technologies, such as autonomous vehicles, to innate human stubbornness.

## Full Transcript
[0:00] I did not want to do this. We are going to talk about hallucinations. And the reason we're going to talk about hallucinations is because I can't get people to stop talking to me about hallucinations. So, here we are. We're doing it. Look, at the end of the day, the fact that ChatGPT was released when it was, at the capability level it was released at, means that we have a massive overhang of credibility that we have to make up. ChatGPT is much more credible than people believe it is. Claude is much more credible than people believe it is. It's not really just about ChatGPT, but for the people who care about this, it's always ChatGPT, because that's the language model they know. But Gemini, same deal.

[0:46] What I'm trying to say is that when ChatGPT was released back in 2022, there were enough high-profile hallucinations that people misunderstood what AI can actually do and chalked it up to a bunch of lies. And I still hear that all the time. It's like every day I hear this, and I'm just dealing with it. I'm just going to talk about it.

[1:10] What I want to say is that we have a different bar for AI than we have for humans. For humans, if I had an unreliable researcher, frankly a human researcher who was an intern, and that intern took a week to prepare me a 40-page report, and if that intern made three mistakes in that 40-page report, I would say great, and I would love to use that report in whatever I'm working on. If an AI comes back in 30 minutes with a 40-page report and it makes three mistakes, we say it's not good enough. It needs to be perfect. Why? It's already cut the time by 100x. Why does it need to be perfect? Why does it need to be more perfect than people?
[2:04] Now, there are other reasons to say that hallucinations are not that big a deal, but I think that's the most compelling one to me, because if you want AI to do useful work, then you just have to believe that the work it can do is more useful than the time it takes to check for hallucinations. And we are well past that bar. Does that mean that hallucinations don't matter? Does that mean a lawyer should not be checking their case citations if they're using AI? Does that mean a doctor shouldn't be checking the medical reasoning of an AI? Obviously not. Obviously, we should be checking, and we should be working to reduce hallucinations.

[2:48] Great. But the fact that we are at a point now where it can clearly and obviously do useful work means that AI has crossed the event horizon. It is no longer just a plaything; it is something we can do work with. And I think, unfortunately, that credibility overhang is biting this industry in the butt, because at the end of the day, for most people who are not sitting in this YouTube circle, if I talk to them about AI, hallucinations are the first thing out of their mouth. It's the first thing they talk about. "Hey, what about hallucinations? I heard they make stuff up. I heard it lies." Honestly, it lies less than the average human does at this point. Most of them.

[3:33] The hallucination rate is, by the way, really hard to measure. I looked into this. I wrote a Substack about this if you want to check it out. If you don't, I don't care. It's a good read, though. And it goes deep in on what hallucinations are. And one of the things that I think is really interesting is that what we call the hallucination rate varies by a factor of 10 depending on the task you give it.
[3:58] The same model can come in at 1.5% and 15%. And by the way, I'm not making that up. That's roughly where ChatGPT 4.5 comes in, depending on which hallucination measure you use. Context really matters. The kind of task you give it really matters. One of the reasons why I don't worry about hallucinations personally is because I don't give AI a situation where it is likely to make up hallucinations and then blame it. I figure that's mismanaging my employee. Like, why would I do that? I don't ask AI to do things that are virtually impossible unless it imagines or hallucinates or confabulates information, because that's useless. Why would I do that? It's such a powerful tool for what it can do well. Why not specify your sources, where you want it to go look? Why not be careful in my prompting and be really clear and structured? Because it does well when I do that. That's just easier for me. So, a lot of these things that actually reduce hallucinations, it turns out they're just best practice for working with AI. I don't know, seems like we should follow best practice.

[5:04] And so, to me: our OpenAI, our Anthropic, are they working on this? Sure. Is DeepMind at Google working on this? Absolutely. Does that mean that we're going to have 100% no-hallucination models next year? I guarantee you we will not. And I also just about guarantee you it won't matter. It won't matter for real work. It's going to matter enormously for public perception, because we are trained to assume that computers must be perfect, because everything we've had in computers for 100 years, well, not 100 years, call it 60 years, has been deterministic computing.
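The prompting practices the speaker describes, specifying sources, constraining the model to them, and structuring the request, can be sketched as a prompt-building helper. This is an illustrative sketch only: the function name, prompt wording, and output format are assumptions, not anything from the talk or from a specific model provider's API.

```python
# Sketch of the "structured, source-constrained prompting" practice from the
# talk: give the model named sources, restrict it to them, and require
# citations, instead of inviting it to invent facts. All names and the prompt
# template here are illustrative assumptions.

def build_grounded_prompt(question: str, sources: dict) -> str:
    """Assemble a prompt that limits the model to the supplied sources."""
    # Label each source so the model can cite it by name.
    source_block = "\n\n".join(
        f"[{name}]\n{text}" for name, text in sources.items()
    )
    return (
        "Answer using ONLY the sources below. "
        "Cite the source name in brackets after each claim. "
        "If the sources do not contain the answer, reply 'not in sources'.\n\n"
        f"=== SOURCES ===\n{source_block}\n\n"
        f"=== QUESTION ===\n{question}"
    )

prompt = build_grounded_prompt(
    "When was the widget launched?",
    {"press-release": "Acme launched the widget on 2021-03-04."},
)
print(prompt)
```

The key design choice mirrors the talk's advice: the escape hatch ("not in sources") gives the model a permitted alternative to confabulating an answer when the task would otherwise be impossible.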
[5:46] It has been programs where, if a plus b equals c, then whatever, right? Like, it's all mathematics. It's algorithmic. Everything is determined in the program when it runs, and so we can expect perfection. And all of our movies say the same thing. None of us are ready for an AI where we taught the rocks to think and they turned out to be poetic dreamers. We're just not ready for that.

[6:14] AI doesn't inherently have a factual world model. The fact that we can talk about a 1.5% error rate in certain hallucination tests for ChatGPT 4.5 is a freaking miracle. I am astonished. These things, they dream. They come up with probabilistic tokens that they think match what you're looking for. They have no factual world model underneath. It's amazing they get anything right at all. It's kind of incredible.

[6:46] And so within that world, yeah, I do think we need to baseline on humans more. I do think we need to take seriously the fact that they do work. And I think that we need to come up with better answers as an industry for people who say all it does is lie, all it does is make stuff up. And by the way, the people who do that tend to be quite unreliable narrators themselves. I have never heard that kind of aggressive contrarian take from someone who isn't to some degree personally threatened by AI and needing to denigrate it. So there is absolutely a leading edge of change here. People who are worried about their jobs, people who are worried about what will happen to their work, are going to be more likely to denigrate AI. And do I have a study for that? I will admit frankly I don't. That is based on me having conversations with hundreds of people. It's just something I've observed. So where does that leave us?
[7:45] At the end of the day, AI is going to get to a point, in fact arguably is already crossing the line, where it is more reliable in most fields than most humans. At which point we should stop worrying so much about hallucination for AI and logically worry about hallucination for ourselves. And we're not. And the reason why we're not is pretty simple. It's the same reason why Waymo vehicles are not more popular even though they're vastly safer. It's the same reason why we haven't outlawed human driving in the US, even though, statistically speaking, in US testing, automated driving is already so much safer that it costs lives to keep human drivers on the road. And I say the US because that's where it's been tested. It's probably true everywhere else in the world, too.

[8:32] We are a stubborn, stubborn race. We are a stubborn species. We do not easily give up on something we think is true. We think humans should drive. I do not see that disappearing anytime soon, even though that kills people. We think AI hallucinates. I don't think that belief is disappearing, even though it is demonstrably, easily, obviously proved to be an unhelpful belief. But we have to try. We have to try and explain to people what really matters here. We have to do our best to educate. And this is a challenge for all of us in the industry. And I just got so tired of hearing about hallucinations that I wrote a giant Substack on it. I did this. Like, we've got to be able to