Six Common AI Mistakes Explained

Key Points

  • The speaker’s “AI office hours” with Fortune 500 teams repeatedly reveal six common mistakes, and the video will walk through each one with remediation advice.
  • **Projection trap:** users assume the model can infer unstated details (e.g., audience, length), leading to wrong answers; the fix is “schema‑first prompting” – explicitly define the desired output format instead of relying on vague prompts.
  • **Revision/regeneration loop:** when asked for a small tweak, the model often rewrites large sections, a problem especially acute for code but affecting all content creators; mitigating it requires tighter constraints and clear instructions about which parts may be altered.
  • The speaker emphasizes that these strategies are applied when the model “breaks,” and will provide concrete prompt examples in an accompanying Substack post.
  • Overall, precise specification of output schemas and narrowly scoped edit requests are the core tactics to avoid the most frequent AI misuse pitfalls.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=KwQpPbLEBMA](https://www.youtube.com/watch?v=KwQpPbLEBMA)
**Duration:** 00:11:24

Sections

  • [00:00:00](https://www.youtube.com/watch?v=KwQpPbLEBMA&t=0s) **The Projection Trap in AI** - Users often over-attribute capabilities to AI models, assuming hidden competencies or leaving prompts underspecified; the fix is to explicitly define the desired output schema rather than relying on vague, lengthy prompts.
  • [00:03:31](https://www.youtube.com/watch?v=KwQpPbLEBMA&t=211s) **Schema Controls and Planning Illusion** - How to use a schema to freeze most fields and target specific ones for correction, plus the "planning illusion," where complex prompts collapse into shallow one-shot responses and better task planning matters more than model quality.
  • [00:06:57](https://www.youtube.com/watch?v=KwQpPbLEBMA&t=417s) **Prompting Strategies to Mitigate Hallucinations** - Permitting "I don't know" responses, requiring confidence labels, adding verification fields, writing unambiguous condition prompts, lowering model temperature, and applying strict constraints to reduce hallucinations and output drift.
  • [00:10:52](https://www.youtube.com/watch?v=KwQpPbLEBMA&t=652s) **Fixing Recurring AI Problems** - A recap of the recurring challenges, practical solutions, and a pointer to a detailed Substack guide with prompts and examples.
I spent the last few months running informal office hours with Fortune 500 product teams, with technical writers, with consultants. I keep coming up against the same six problems when people use AI. So I'm calling this video ChatGPT office hours. We're going to go through each of the six problems that I come across over and over again, and I'm going to call out how I advise teams to fix them.

Number one: the projection trap. What's that? You are projecting capabilities onto the model that the model does not have. I see this with developers and with non-developers both. If you're a developer, this can look like assuming a competency in an agent tool call that isn't really there. If you are not a developer, it can look like writing a prompt that is underspecified, a prompt that leaves the model to infer and guess. For example, if you say in a casual prompt, "Write me a professional update about the migration," the model might assume you mean an engineering audience and that you want a deep technical status update. But you may have meant an executive audience in 150 words. That's a very simple example. People assume they can fix this by just writing very long prompts. That's not correct. The correct way to fix this is actually to specify the schema of the output. Instead of thinking prompt-first ("I need to compose a prompt"), flip it around and say, "I want an output that looks like this." That becomes the map you give to the AI. This is called schema-first prompting. It is a way to handle situations where you are routinely getting incorrect responses. It's not that you have to do this every single time.
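As a sketch of what "flipping it around" can look like in practice, here is one way to derive a prompt from a desired output shape. The field names, wording, and helper function are illustrative assumptions, not taken from the video:

```python
# Schema-first prompting: describe the output you want, then derive the
# prompt from that description instead of writing a long free-form request.
# The schema fields below are illustrative, not prescribed by the video.

OUTPUT_SCHEMA = {
    "audience": "executive",
    "format": "status update",
    "length": "150 words maximum",
    "sections": "headline, current status, risks, next steps",
}

def schema_first_prompt(task: str, schema: dict) -> str:
    """Build a prompt that leads with the desired output shape."""
    lines = [f"Task: {task}", "Produce output matching this schema exactly:"]
    for field, spec in schema.items():
        lines.append(f"- {field}: {spec}")
    lines.append("Do not add fields that are not in the schema.")
    return "\n".join(lines)

prompt = schema_first_prompt(
    "Write a professional update about the migration.", OUTPUT_SCHEMA
)
print(prompt)
```

The point is that the schema, not the prose, carries the specification: the audience and length ambiguities from the casual prompt are resolved before the model ever sees the request.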
All of these prompts and prompt tips and office hours approaches are things that you do when things break. That is why we're having this conversation. So when you are getting responses that do not work, ask yourself if you are projecting capabilities onto the model that it does not have. And if you are, think about whether you need to specify your output more precisely in the prompt. All of these are going to have examples in the Substack write-up; I will have actual printed-out prompts that you can look at. I want to get through all six of them with you here. So that's the principle.

Number two: the revision loop, or the regeneration loop. You ask for a tiny fix, and the model rewrites the whole thing or touches a bunch of things you didn't want it to touch. This happens a ton. Models have a lot of difficulty touching just one thing. One of the big updates Codex made in the last couple of weeks or months that helps with this for coders, for developers, is better inference around surgical code changes. But we're not all developers, and this is not all about code. To me, from an office hours perspective, this is a much larger issue that affects everyone who uses AI. We get into problems really quickly when we do not specify what we want changed in a regeneration loop. So when you think about regenerating a model output, what we call iterating, whether you're in the same conversational thread or going back in an agentic workflow to rework an incorrect response, you need to be absolutely surgical about what you want changed. Ideally, you want to quote the exact snippet you want changed, say, "This is what's wrong," and request only that patched section back.
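A minimal sketch of such a surgical revision request; the wording of the instructions is an illustrative assumption, not a quoted prompt from the video:

```python
def revision_prompt(snippet: str, problem: str) -> str:
    """Ask for a surgical fix: quote the exact snippet, state what is
    wrong, and request only the patched section back."""
    return (
        "Revise ONLY the quoted snippet below. Leave everything else "
        "in the document exactly as it is.\n"
        f'Snippet: "{snippet}"\n'
        f"Problem: {problem}\n"
        "Return only the corrected snippet, nothing else."
    )

print(revision_prompt(
    "The migration completed on time.",
    "The date is wrong; it slipped by one week.",
))
```

Quoting the snippet verbatim and constraining the response to the patch are the two moves that keep the model from rewriting surrounding material.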
As an advanced fix, if you are keeping a schema, which I talked about in office hours principle number one, you can actually say, "I only want this field name in your schema to be touched. Freeze all the other fields," and that can help as well. You get the ability to actually control and drive schema-level outputs in the API if you're a builder. But I find in practice the models are so trained for developers that even if you're not a developer, if you give the model a non-technical schema, like "this is the title, this is the author," it tends to respect those fields, because it's trained to respect labeling and schemas. So if you want a more advanced fix, you can label your outputs with a schema and then say, "This section of the schema is incorrect. You need to fix it." That's number two.

Number three is the planning illusion. Complex tasks will often collapse into one shot, where the model skips crucial steps in analysis or in planning. If you say, "Please analyze the churn and propose a plan," the model may just do a single blob pass: shallow causes, a weak plan. This is something we don't discuss, and we often attribute it to model quality, saying the model is bad because I got a bad response on this multi-step reasoning challenge. I want to suggest to you here in office hours that the problem is probably not the model at this point. It is probably the way you are handling the planning step. A better approach, a baseline approach, is to break the challenge into stages with explicit outputs and validation gates at each stage: "Do stage one only, where you review all of the incoming data in this way, along these axes, then report back with this output and stop." Then force a step-by-step progression.
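The staged progression can be sketched as a simple gate loop. Here `run_model` is a placeholder for whatever chat or API call you actually use, and the stages and gate checks are illustrative assumptions:

```python
# Staged prompting with validation gates: run one stage at a time and
# check each stage's output before moving on, instead of letting the
# model collapse the whole task into a single shallow pass.

def run_model(prompt: str) -> str:
    # Placeholder: in real use this would call your model or agent.
    return f"[model output for: {prompt[:40]}...]"

STAGES = [
    ("Stage 1: list the data sources relevant to churn. Stop after listing.",
     lambda out: len(out) > 0),
    ("Stage 2: identify the top three churn causes from stage 1. Stop.",
     lambda out: len(out) > 0),
    ("Stage 3: propose a plan addressing each cause from stage 2.",
     lambda out: len(out) > 0),
]

results = []
for stage_prompt, gate in STAGES:
    out = run_model(stage_prompt)
    if not gate(out):  # validation gate: halt rather than drift onward
        raise ValueError(f"Stage failed validation: {stage_prompt}")
    results.append(out)
```

In real use the gate functions would check something substantive (did stage 1 actually list sources?), and a failed gate would trigger a retry of that stage only.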
That's a very simple thing. You can enforce it with tool calls, agentically, in the API. You can also do it in the chat. You'll notice I'm deliberately saying we face the same problems as developers and non-developers. That is one of my core beliefs about AI: we are not that different. We're all using the same models. The advanced fix here is to start to specify tools and contracts for tools. That gets very technical in the API, but you can do it in the chat in plain English. You can say, "I want you to use the internet search tool for this," or "I want to constrain your internet search to only academic sites." You can do other things as well, but that's a simple way of describing what a tool call and a contract would be. You're basically saying: to do this reasoning step, you will need these inputs and these tools. "Please use Python," or "Please analyze this with your PowerPoint creation skill and create the PowerPoint," whatever it is. The point is that by defining the tools and what you want the model to call, you leave it less room to walk off and do a single-blob, weak-pass reasoning effort. Instead, you specify enough of the effort and the sequence along the way that you get a strong reasoning outcome. This is different from saying "think hard." This is different from saying the models don't reason. The models do reason, but if you care about how they do so, and about the reasoning quality, you need to invest in this so you don't get stuck in the planning illusion. And that's number three.

Number four is the confidence illusion. We also call that hallucination. You get fluent answers with mismatched or non-existent citations. Look, the simplest baseline fix is to permit the model to say, "I don't know,"
and to require confidence labels, where the model needs to name its level of confidence. You can also ask for a claims-to-verify list. If you want to go farther, you can have a verification-fields array in a schema, where you force the schema in the API to give a statement, a confidence level, a source, and a verification status. You can do this with plain-English prompting. I find it tends to look a lot like chain-of-verification prompting in practice, where you name the conditions under which the prompt can move forward, and only then do you allow the model to print a response. When you're tackling hallucinations in particular, you do want to get specific with the conditions for the confidence level. If you're saying, "Only print an answer when your confidence is high," are you clear with the model about what high confidence looks like? Are you extremely unambiguous about that? This is an area where even a little ambiguity in your prompt can push the model to resolve things on its own, and that can lead to hallucinations.

Office hours problem number five is the drift problem, or consistency issues: same inputs, different outputs. You can have generated tags or categories that drift across runs. You can have selection criteria applied inconsistently. The baseline fix is to turn the model temperature down if you're in the API and to set absolute constraints. If you are in the chat, the best thing you can do is be extremely obsessive about token-level clarity and consistency with your inputs, and also be extremely specific about the rules by which the model processes what you want it to do.
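One way to make those processing rules explicit is to spell them out as a fixed, numbered sequence at the top of the prompt. A minimal sketch, where the rules and the row-tagging task are illustrative assumptions:

```python
# Consistency through explicit rules: enumerate the exact processing
# steps so the model has no room to improvise between runs. In the API,
# you would pair a prompt like this with a low temperature (e.g. 0).

RULES = [
    "Read every row of the input table before writing anything.",
    "Assign each row exactly one tag from: churn-risk, healthy, unknown.",
    "If a row is missing the 'last_active' field, tag it unknown.",
    "Output one line per row in the form: <row_id>,<tag>. No commentary.",
]

def consistency_prompt(task: str, rules: list) -> str:
    """Prefix a task with numbered, absolute processing rules."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, start=1))
    return f"{task}\nFollow these rules exactly, in order:\n{numbered}"

print(consistency_prompt("Tag each customer row in the attached table.", RULES))
```

Notice that every rule closes off a decision the model would otherwise make differently from run to run: the tag vocabulary, the missing-data case, and the output format are all fixed in advance.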
If you want to take an Excel sheet and process it into a Google Doc, the prompt you're using (maybe it's not in the API; maybe it's in an n8n workflow; maybe you're actually pasting it into ChatGPT) needs to be extremely specific about the sequence of steps the model needs to go through. Otherwise, you get into a position where you are inviting the model to make up steps along the way, and that invites drift. It invites a lack of consistency. Your job with this consistency issue is to take away all ambiguity and opportunity for creativity from the model and to make the task as buttoned-up and linear as you possibly can.

The sixth one that I get a lot is the cognitive bandwidth trap: too much context. More context can make outputs worse, but people don't realize that when they're early on in ChatGPT, and they think they can just throw everything into a million-token context window and get good responses. The fix is really clean context loading. If you are in the API, you can specify required versus optional context. If you are not in the API, it is your brain, people. You need to think about the slice of context you are uploading, and you need to assume you would do better to paste the two pages of the 20-page brief that you need edited or reviewed rather than the full 20-page brief, unless there's a real good reason to give the model all of that context. Default to as little context as is humanly possible to get the model to do the task. This is really confusing, because people keep looking at these exploding context windows and saying, "Oh my god, this is amazing." Well, it is amazing, but if you want clean, consistent outputs, overloading the model with context is going to be a problem.
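Picking the slice of a long brief to load can itself be made mechanical. A sketch under stated assumptions: the brief structure is hypothetical, and keyword overlap is a deliberately crude stand-in for however you actually judge relevance:

```python
# Clean context loading: select only the sections of a long document
# that match the task, instead of pasting the whole thing into the
# context window. Keyword overlap is a crude, illustrative filter.

def select_context(sections: dict, task_keywords: set) -> str:
    """Return only sections whose text mentions a task keyword."""
    picked = [
        text for title, text in sections.items()
        if task_keywords & set(text.lower().split())
    ]
    return "\n\n".join(picked)

brief = {
    "Background": "History of the project since 2019.",
    "Migration status": "The database migration is 80% complete.",
    "Budget": "Spend is tracking under forecast.",
}
context = select_context(brief, {"migration", "database"})
# Only the "Migration status" section is loaded; the other two stay out.
```

Even this crude filter enforces the core habit: context is something you clean for, not something you accumulate.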
So my challenge to you is to start to think about context as something that you clean for, not something that you accumulate. The more you think of it as a pile you accumulate, the more you invite the model to use dirty context.

So there you go. I know I hit hard. These are the six most common problems that I see in real life. I also comb through the OpenAI forum, because I wanted to check: am I just getting a sample-size issue? I'm not. I am seeing these over and over in the OpenAI forums as well. They seem to be problems that both developers and non-developers face, and I wanted to pretend we were in office hours together and just give you an honest talk about what I see and how you actually fix it. If you want to dive deeper, there's a ton more in the write-up on Substack. There'll be prompts and really easy, specific examples so you can see what I mean in terms of how to fix this stuff. Good luck. These models are much more capable than we think, and you're not alone when you're facing problems. That's the other thing I want you to take away.