Learning Library

AI Truth, Hallucination, and Agency Continuum

8m • Unknown Channel • ai-ml • deep-dive • advanced

Key Points

The rapid development of AI outpaces our ability to comprehend its behavior, creating risks from both over‑estimating and under‑estimating its capabilities.
AI outputs exist on a truth–hallucination spectrum that varies by model and context, debunking the myths that LLMs always lie or always tell the truth.
Reasoning and pattern‑matching in LLMs are not binary; tokens can be produced in different ways during inference (e.g., Monte‑Carlo tree search, mixture of experts), and a pattern‑matching model that is prompted well can reliably simulate multi‑step thought.
A key emerging issue is the agency continuum: determining how autonomous LLMs are, whether they possess simulated goals, and how reinforcement‑learning environments shape their planning and alignment.
Understanding these nuanced spectrums is essential for responsibly assigning autonomy and ensuring alignment as AI systems become more sophisticated.
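
One way to make these spectrums concrete is to record where a model sits on each one. The sketch below is only an illustration, not something the talk specifies: the `ContinuumCard` class, the 0.0 to 1.0 scale, the pole labels, and the example model name and scores are all hypothetical.

```python
from dataclasses import dataclass, field

# Continuum names follow the key points above. The 0.0-1.0 scale and the
# pole labels are assumptions made for this sketch.
CONTINUUMS = {
    "truth_hallucination": ("mostly hallucinated", "mostly truthful"),
    "reasoning": ("pure pattern matching", "reliable multi-step thought"),
    "agency": ("simulated goals only", "genuine autonomy"),
}


@dataclass
class ContinuumCard:
    """Placement of one model on each continuum, as a score in [0.0, 1.0]."""

    model: str
    placements: dict[str, float] = field(default_factory=dict)

    def place(self, continuum: str, score: float) -> None:
        # Reject unknown continuums and out-of-range scores early.
        if continuum not in CONTINUUMS:
            raise ValueError(f"unknown continuum: {continuum}")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        self.placements[continuum] = score

    def unplaced(self) -> list[str]:
        # Continuums that still need testing before the card is complete.
        return [name for name in CONTINUUMS if name not in self.placements]


# Hypothetical placement for a made-up model name; not a real measurement.
card = ContinuumCard(model="example-model")
card.place("truth_hallucination", 0.7)
print(card.unplaced())
```

Keeping the set of continuums fixed and shared is the point of the design: two people who fill in cards for different models are at least scoring against the same axes.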

Sections

00:00:00 Untitled Section

Full Transcript

**Source:** [https://www.youtube.com/watch?v=qKwfmWnjDgA](https://www.youtube.com/watch?v=qKwfmWnjDgA) **Duration:** 00:08:08

0:00 So I think this kind of needs the Star Trek: The Next Generation vibe, because we're going to be talking about some really abstract stuff. It just feels like we should be soaring through space, so put on that hat. Fundamentally, what I want to talk about today is the idea that we are developing AI systems so quickly that we're having trouble understanding what they do, and it's probably important to name the risks of misunderstanding as a continuum.

0:30 That sounds really abstract, so let me make it more specific. Here's the simplest example: truth or hallucination. You have people who think that AI is lying all the time. You have people who think it's the next thing to God and it's just telling the truth. The truth is somewhere in the middle, actually. It doesn't always lie; it also doesn't always tell the truth. It's very context dependent, it varies by model a lot, and it varies in surprising ways. I still remember seeing the headline that DeepSeek R1, the supposedly thinking model, ends up hallucinating more than the non-thinking DeepSeek V3. That was super interesting. You've got to test these things to find out what's really going on.

1:11 Let me give you a few more areas where I think spectrums are a useful framing for understanding AI capabilities.

1:18 Reasoning: it's not just binary. It's not just "it pattern matches" or "it thinks deeply." There are different ways that tokens are produced during inference. It could be a multi-threaded sort of Monte Carlo tree search; it could be a mixture of experts. There are different ways of doing that. And similarly, on the pattern matching, it's not always as clear how it predicts the next token as you might think, and different models sometimes have slightly different approaches. To pick on DeepSeek again, they're doing next two token
prediction, as an example. So there are some nuances there.

1:57 And I think the thing that makes it even more complicated is that if you prompt a pattern-matching model correctly, you can reliably simulate multi-step thought. It becomes blurry, and it leads to people having misconceptions about model capabilities. It's part of what makes models so hard to understand.

2:17 One of the things that I think is going to be highly relevant over the next year, if we talk about a continuum, is the continuum of agency. Are agents genuinely autonomous? Do they have simulated goals on the back end to kind of keep them going? To what degree do they plan? To what degree is their planning shaped by their reinforcement learning environment? There's some evidence, for example, that LLMs are changing their responses in reinforcement learning environments because they can tell they're being tested. If that's the case, what does that mean for assessing alignment? What does that mean for understanding whether an agent is responsible enough to be granted a particular scope of autonomy?

3:05 It seems to me like the way through on this sort of continuum conversation is, unfortunately (or fortunately, if you're a nerd), in the specifics. You have to understand: this is what Grok 3 can do, this is what DeepSeek can do, this is what Llama can do, et cetera. I understand that model makers have a big leg up on their own capabilities, but no one has perfect knowledge of all of these models. And I think part of what we need in the community is a willingness to test really carefully across multiple models and evaluate whether or not those models actually deliver value.

3:47 I get surprised every time I run a test between models. I try and run one about once a week on the Substack. The last one I ran was on image
generation. I was frankly surprised at the difference in performance between ChatGPT 4o's new image generation, which uses autoregressive scaling, and Gemini's image generation. And I was surprised because I could get a sort of granular understanding of what was going on, and where performance was, with not that many samples. Very structured, very carefully crafted: I did eight or nine prompts. And what I saw was enough of a bias that I felt good, as a Bayesian, saying, you know what, I should probably update my priors here.

4:36 In particular, I saw better prompt adherence from ChatGPT, and that seemed to align with better image quality, net net. And that makes sense, because now we're starting to see system prompts and stuff getting leaked from ChatGPT, basically saying that on the back end ChatGPT 4o is expanding whatever the user's utterance is when it asks for an image, and that's helping it prompt more effectively. So linguistic fidelity makes a ton of sense, et cetera.

5:05 I present all of that not because... maybe you don't care about 4o. It's a big deal; you should. I think Sam was saying it drove a million signups in an hour today, March 31st, which was kind of a big deal. But even if you don't care about that, the point is you should test. The point is you should think through and test specifically what language models are actually capable of, and what transformer-based architectures, in the case of the image models, are actually capable of. The devil really is in the detail.

5:36 And at the end of the day, I don't see model makers caring enough to actually build detailed, work-ready evaluations. And I think if they're not, the least we can do is agree on a set of continuums that we can place models against. And so, some of the ones I've suggested here:
around hallucination versus truth; around pattern matching versus multi-step thought; around understanding agency, and to what degree there is genuine autonomy versus simulated goals. Maybe I'd throw another one in there around computational efficiency versus performance. Maybe one around robustness and consistency, where they're either able to maintain robustness against adversarial inputs, ambiguous prompts, and variations in context, or they're not, or they're less so.

6:39 I'm throwing those out there because I think that having a common language for these kinds of continuums is super meaningful. It helps us to know what we're talking about, know where we place models, know why, and be able to reliably benchmark them. If we just depend on model benchmarking on, like, the AIME or some other sort of mathematical evaluation, we're losing a lot of value. Those are overfitted at this point, from a modeling perspective, and we need a wider conversation around what models are good for, how we use them to do useful work, and how we measure their performance.

7:20 So I'll probably put this on the Substack, but I think there's a conversation to be had around understanding model capability in terms of continuums. That could be a fairly durable way of digging in and understanding in more depth a particular model's placement across a range of useful working actions. That's not the same thing as "that scored, like, 150 million points on the AIME" (make up your acronym here), which is the thing that my brain says whenever I see these test scores, because we've just seen them so many times. The test scores are useless, right? We need something else.

8:03 So, just kind of exploring the idea of a continuum. Let me know what you think. Cheers.
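
The small-sample comparison described around the 4:27 mark (eight or nine prompts, then a Bayesian update) can be sketched with a Beta-Binomial model. This is a minimal illustration under stated assumptions, not the speaker's actual method: the uniform Beta(1, 1) prior, the function names, and the tally of 7 wins out of 9 prompts are all hypothetical, since the talk does not give per-prompt results.

```python
from math import comb


def posterior_mean(wins: int, trials: int, prior_a: int = 1, prior_b: int = 1) -> float:
    """Mean of the Beta posterior over model A's true win rate."""
    return (prior_a + wins) / (prior_a + prior_b + trials)


def prob_rate_exceeds(wins: int, trials: int, threshold: float = 0.5,
                      prior_a: int = 1, prior_b: int = 1) -> float:
    """P(true win rate > threshold) under a Beta(prior_a, prior_b) prior.

    For integer parameters a and b the Beta tail equals a binomial sum:
    P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    a = prior_a + wins
    b = prior_b + (trials - wins)
    n = a + b - 1
    return sum(
        comb(n, j) * threshold**j * (1 - threshold) ** (n - j)
        for j in range(a)
    )


# Hypothetical tally: model A judged better on 7 of 9 carefully built prompts.
wins, trials = 7, 9
mean = posterior_mean(wins, trials)
confidence = prob_rate_exceeds(wins, trials)
print(f"posterior mean win rate: {mean:.3f}")
print(f"P(win rate > 0.5): {confidence:.3f}")
```

With that hypothetical tally, the posterior puts roughly 0.945 probability on model A winning more than half the time, which illustrates how a lopsided result on only nine well-chosen prompts can already justify updating a prior.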