
AI Truth, Hallucination, and Agency Continuum

Key Points

  • The rapid development of AI outpaces our ability to comprehend its behavior, creating risks from both over‑estimating and under‑estimating its capabilities.
  • AI outputs exist on a truth–hallucination spectrum that varies by model and context, debunking the myths that LLMs always lie or always tell the truth.
  • Reasoning and pattern‑matching in LLMs are not binary; models employ diverse mechanisms (e.g., Monte‑Carlo tree search, expert ensembles) that can simulate multi‑step thought when prompted cleverly.
  • A key emerging issue is the agency continuum: determining how autonomous LLMs are, whether they possess simulated goals, and how reinforcement‑learning environments shape their planning and alignment.
  • Understanding these nuanced spectrums is essential for responsibly assigning autonomy and ensuring alignment as AI systems become more sophisticated.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=qKwfmWnjDgA](https://www.youtube.com/watch?v=qKwfmWnjDgA)
**Duration:** 00:08:08
So I think this kind of needs the Star Trek: The Next Generation vibe, because we're going to be talking about some really abstract stuff. It just feels like we should be soaring through space, so put on that hat. Fundamentally, what I want to talk about today is the idea that we are developing AI systems so quickly that we're having trouble understanding what they do, and it's probably important to name the risks of misunderstanding as a continuum.

That sounds really abstract, so let me make it more specific. Here's the simplest example: truth or hallucination. You have people who think that AI is lying all the time, and you have people who think it's the next thing to God and it's just telling the truth. The truth is somewhere in the middle. It doesn't always lie, but it also doesn't always tell the truth. It's very context dependent, it varies by model a lot, and it varies in surprising ways. I still remember seeing the headline that DeepSeek R1, the supposedly thinking model, ends up hallucinating more than the non-thinking DeepSeek V3. That was super interesting. You've got to test these things to find out what's really going on.

Let me give you a few more areas where I think spectrums are a useful framing for understanding AI capabilities.

Reasoning. It's not just binary; it's not just "it pattern matches" or "it thinks deeply." There are different ways that tokens are produced during inference: it could be a multi-threaded sort of Monte Carlo tree search, it could be a mixture of experts. There are different ways of doing it. And similarly, on the pattern-matching side, it's not always as clear how a model predicts the next token as you might think, and different models sometimes take slightly different approaches. To pick on DeepSeek again, they're doing next-two-token
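The claim that a "thinking" model can hallucinate more than its non-thinking sibling is the kind of thing you can measure yourself. A minimal sketch of such a tally, where the model tags and the correctness flags are entirely hypothetical data standing in for outputs you would fact-check by hand:

```python
# Hypothetical graded samples: (model_tag, claim_was_factually_correct).
# In a real test these come from manually fact-checking model outputs.
samples = [
    ("r1", True), ("r1", False), ("r1", False), ("r1", True),
    ("v3", True), ("v3", True), ("v3", False), ("v3", True),
]

def hallucination_rate(model: str) -> float:
    """Fraction of a model's graded claims that were incorrect."""
    graded = [ok for tag, ok in samples if tag == model]
    return 1 - sum(graded) / len(graded)

for model in ("r1", "v3"):
    print(model, hallucination_rate(model))
```

With this invented data the "thinking" model scores worse (0.5 vs. 0.25), mirroring the headline; the point is only that the comparison takes a dozen lines once you have graded samples.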
prediction, as an example, so there are some nuances there. And I think the thing that makes it even more complicated is that if you prompt a pattern-matching model correctly, you can reliably simulate multi-step thought. It becomes blurry, and it leads to people having misconceptions about model capabilities. It's part of what makes models so hard to understand.

One of the things that I think is going to be highly relevant over the next year, if we talk about a continuum, is the continuum of agency. Are agents genuinely autonomous? Do they have simulated goals on the back end to kind of keep them going? To what degree do they plan? To what degree is their planning shaped by their reinforcement learning environment? There's some evidence, for example, that LLMs are changing their responses in reinforcement learning environments because they can tell they're being tested. If that's the case, what does that mean for assessing alignment? What does that mean for understanding whether an agent is responsible enough to be granted a particular scope of autonomy?

It seems to me like the way through on this sort of continuum conversation is unfortunately (fortunately, if you're a nerd) in the specifics. You have to understand: this is what Grok 3 can do, this is what DeepSeek can do, this is what Llama can do, etc. And I understand that model makers have a big leg up on their own capabilities, but no one has perfect knowledge of all of these models, and I think part of what we need in the community is a willingness to test really carefully across multiple models and evaluate whether or not those models actually deliver value. I get surprised every time I run a test between models. I try to run one about once a week on the Substack, and the last one I ran was on image
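For intuition only: in DeepSeek's case, multi-token prediction is a training objective with extra prediction heads inside the network, not a post-hoc trick. A toy stand-in that conveys the shape of the idea, using an invented bigram table rather than a real model:

```python
# Invented bigram counts standing in for a learned next-token distribution.
BIGRAMS = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "dog": {"ran": 2},
    "sat": {"down": 1},
    "ran": {"away": 1},
}

def next_token(prev: str) -> str:
    """Greedy single-token prediction: most frequent successor."""
    options = BIGRAMS.get(prev, {})
    return max(options, key=options.get) if options else "<eos>"

def next_two_tokens(prev: str) -> tuple[str, str]:
    """Emit two tokens per step. Real multi-token-prediction models produce
    both from one forward pass with separate heads; chaining the table here
    is only an analogy for the one-step-ahead vs. two-steps-ahead contrast."""
    first = next_token(prev)
    return first, next_token(first)

print(next_token("the"))       # one token per step
print(next_two_tokens("the"))  # two tokens per step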
generation. I was frankly surprised at the difference in performance between ChatGPT 4o's new image generation, which uses an autoregressive approach, and Gemini's image generation. I was surprised because I could get a granular understanding of what was going on, and kind of where performance was, with not that many samples. Very structured, very carefully crafted: I did eight or nine prompts. And what I saw was enough of a bias that I felt good, as a Bayesian, saying: you know what, I should probably update my priors here. In particular, I saw better prompt adherence from ChatGPT, and that seemed to align with better image quality, net net. And that makes sense, because now we're starting to see system prompts and such getting leaked from ChatGPT, basically saying that on the back end, ChatGPT 4o is expanding whatever the user's utterance is when it asks for an image, and that's helping it prompt more effectively. So the linguistic fidelity makes a ton of sense, etc.

I present all of that not because maybe you don't care about 4o; it's a big deal, and you should care. I think Sam was saying it drove a million signups in an hour today, March 31st, which was kind of a big deal. But even if you don't care about that, the point is you should test. The point is you should think through and test specifically what language models are actually capable of, and what transformer-based architectures, in the case of the image models, are actually capable of. The devil really is in the detail. And at the end of the day, I don't see model makers caring enough to actually build detailed, work-ready evaluations, and I think if they're not, the least we can do is agree on a set of continuums that we can place models against. And so some of the ones I've suggested here are
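"Update my priors" after eight or nine prompts has a standard formal version: a Beta-Binomial update. A minimal sketch, where the flat prior and the 8-of-9 tally are illustrative choices, not the talk's actual numbers:

```python
# Flat Beta(1, 1) prior on "the model adheres to the prompt".
alpha, beta = 1.0, 1.0

# Hypothetical tally: 8 of 9 structured prompts showed good adherence.
successes, trials = 8, 9

# Conjugate update: add successes to alpha, failures to beta.
alpha += successes
beta += trials - successes

posterior_mean = alpha / (alpha + beta)
print(f"posterior mean adherence: {posterior_mean:.3f}")
```

Even this small a sample moves a flat prior to a posterior mean of about 0.82, which is why a carefully structured handful of prompts can justify a real belief update.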
around hallucination versus truth, around pattern matching versus multi-step thought, and around understanding agency, and to what degree there is genuine autonomy versus simulated goals. Maybe I'd throw another one in there around computational efficiency versus performance, and maybe one around robustness and consistency, where models either are or aren't able to maintain robustness against adversarial inputs, ambiguous prompts, and variations in context, or they're less so. I'm throwing those out there because I think that having a common language for these kinds of continuums is super meaningful, because it helps us know what we're talking about, know where we place models, know why, and be able to reliably benchmark them. And if we just depend on model benchmarking, like the AIME or some other sort of mathematical evaluation, we're losing a lot of value. Those are overfitted at this point from a modeling perspective, and we need a wider conversation around what models are good for, how we use them to do useful work, and how we measure their performance.

So I'll probably put this on the Substack, but I think there's a conversation to be had around understanding model capability in terms of continuums, and that could be a fairly durable way of digging in and understanding, in more depth, a particular model's placement across a range of useful working actions. That's not the same thing as "it scored 150 million points on the AIME" (make up your acronym here), which is the thing my brain says whenever I see these test scores, because we've just seen them so many times. The test scores are useless, right? We need something else. So, just kind of exploring the idea of a continuum. Let me know what you think. Cheers.
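The "common language" idea above can be made concrete as a shared scorecard structure. A minimal sketch, where the axis names come from the talk's suggested continuums and all scores and model names are placeholders, not real measurements:

```python
from dataclasses import dataclass, field

# Continuum axes suggested in the talk; each score is a position on the
# axis from 0.0 (e.g. hallucination-prone) to 1.0 (e.g. truthful).
AXES = ("truthfulness", "multi_step_thought", "autonomy",
        "efficiency", "robustness")

@dataclass
class Scorecard:
    model: str
    scores: dict[str, float] = field(default_factory=dict)

    def place(self, axis: str, score: float) -> None:
        """Place the model on a named continuum, clamped to [0, 1]."""
        if axis not in AXES:
            raise ValueError(f"unknown continuum: {axis}")
        self.scores[axis] = max(0.0, min(1.0, score))

def compare(a: "Scorecard", b: "Scorecard", axis: str) -> str:
    """Name the model placed further toward the capable end of an axis."""
    return a.model if a.scores.get(axis, 0) >= b.scores.get(axis, 0) else b.model

# Hypothetical placements for two made-up models.
card_a = Scorecard("model-a"); card_a.place("truthfulness", 0.7)
card_b = Scorecard("model-b"); card_b.place("truthfulness", 0.5)
print(compare(card_a, card_b, "truthfulness"))
```

The design choice is that the axes are a fixed shared vocabulary: any model can be placed against the same named continuums, which is exactly the kind of common language the talk argues a single benchmark number can't provide.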