
Four Core Moves for Prompting

Key Points

  • The speaker is consolidating a year’s worth of prompt guides into a structured course that offers a beginner‑friendly pathway, an advanced track, and a “jump‑in” option for experienced users.
  • Prompting is framed as briefing a contractor: you must clearly define the desired deliverable’s shape, format, and constraints to get consistent, useful results.
  • The first core move is to specify the output shape (e.g., word count, bullet points, tables, checklists) so the AI knows exactly what form the answer should take and avoids unwanted filler or style quirks.
  • The second move emphasizes providing just enough labeled context to guide the AI without overwhelming or confusing it, ensuring the response stays on target.


# Four Core Moves for Prompting

**Source:** [https://www.youtube.com/watch?v=UhyxDdHuM0A](https://www.youtube.com/watch?v=UhyxDdHuM0A)
**Duration:** 00:30:14

## Sections

- [00:00:00](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=0s) **Untitled Section**
- [00:03:33](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=213s) **Guidelines for Fact-Based AI Prompts**: Advice on supplying labeled factual inputs and instructing the AI to respond only with those facts, or say "unknown" when unsupported, to minimize hallucination.
- [00:06:56](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=416s) **Prompting for Concise, Self-Checked Answers**: How to direct a language model to truncate its internal reasoning, provide a brief answer with a top recommendation and justification, and perform a quick quality check before delivering the response.
- [00:10:07](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=607s) **Guiding AI Output Structure**: How explicitly defining the desired output format, criteria, and level of detail in prompts enables the model to produce customized decision tables and incident summaries, while allowing flexibility to omit or adjust elements based on the user's priorities.
- [00:13:42](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=822s) **Prompt Engineering: Prioritize Pipelines**: Effective prompting starts with designing the full retrieval, tool, and memory pipeline as a first-class object, since prompts behave differently across environments; all input context should be treated as a trust-bounded supply chain to ensure safety and reliability.
- [00:17:08](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1028s) **Scaling Prompt Design Principles**: Clear contracts and output schemas act as scalable, fractal prompts, using entropy settings like temperature and top-p plus constraints to shape a model's probability distribution.
- [00:21:01](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1261s) **Treat Prompts Like Production Code**: Prompts require the same rigorous testing, monitoring, versioning, and rollback processes as production code, accounting for wide user distributions and embracing multiple models for robust performance.
- [00:24:09](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1449s) **Governance Over Heroic Prompting**: Effective AI prompt production relies on simplicity, versioning, testing, and built-in safety through structured governance rather than ad-hoc heroic efforts.
- [00:27:27](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1647s) **Managing LLM Memory and Enforcement**: How product choices shape LLM context windows, why retrieval-augmented architectures are needed to handle memory, and why automated output checks outperform relying on human vigilance.

## Full Transcript
0:00 So, I have never done this. I went back through all of the prompt guides I've written over the last year plus, collated them, pulled them together, and I'm putting them into a comprehensive prompting course. But that's not all. I'm also separating it out so that it's got a really easy beginner's pathway into prompting. I'm going to lay out the first steps of that here in the video, then a more advanced guide for when you're ready. And for folks who want to jump right in, there's lots there.

0:36 So this video is going to be very simple. We're going to go through the four moves that I want to call out as the heart of prompting, if you're still trying to wrap your mind around what prompting is and why it matters. Then in the second part of the video, I'm going to give a little teaser trailer for the folks who are interested in advanced prompting techniques, and I'm going to talk about the patterns that I uncovered digging through hundreds of pages of prompting notes and guides that I have written over the last few months. What is it that stands out? What is it that's consistent? I found some really interesting things.

1:12 So let's start with the beginner-friendly, accessible moves. Four big moves. The big idea here is very simple. Instead of thinking of having a casual chat with an artificial intelligence, instead of thinking of just chatting with ChatGPT, I want you to think of it like you are briefing a contractor on exactly what you need. And you have four simple moves to do that. This actually maps really well to talking with a contractor. You will get more consistent and useful results if you start with, for example, move one.
1:39 Tell your contractor, tell the AI, what shape you want the deliverable, the thing it's going to write, to be in. Think of it like ordering at a restaurant. If you are not specific about what you want on the plate, if you don't say, "Hey, I don't want pastrami on the sandwich," well, you're going to get pastrami on the sandwich.

2:02 So instead of saying "write something about X," tell the AI exactly what format you need. As examples of the kind of output shape you can request: "Give me one paragraph between 110 and 130 words. I don't want any headings." It'll do it. "Write exactly five bullet points, one sentence each." It'll do it. "Create a simple five-row comparison table." It'll do it. "Make a checklist with only six items." You get the idea. You can be extremely specific.

2:34 This is actually a way to contain some of the more annoying response patterns that some of these AIs have. For example, if you're tired of Claude telling you all the time how right you are, and "absolutely," you can just say: no LLM bumpers, go away, I just want your response. If you're tired of ChatGPT-5 writing super long responses, you can say, "You've got to give me the answer in 150 words or less." There are ways you can have more control here than you might think.

3:05 Move two: give just enough context to your contractor, or AI. Not too much. This is like giving someone a recipe with all the ingredients they need and not too much. It's basically like a HelloFresh box, if you've ever used it. It ships you the recipe and it ships you all the ingredients. Not any more; just the ingredients you need.

3:30 Tell the AI to use only the specific facts that you are providing. Label them so that you can track where they came from, if you really want to be sure that it got it right.
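Move one, specifying the shape, becomes mechanical once you assemble prompts in code. A minimal sketch in Python; the helper name and wording are mine, not from the video:

```python
def shaped_prompt(task: str, shape: str) -> str:
    """Attach an explicit output-shape instruction to a task description."""
    # Stating the shape outright is the whole move: no implied formats.
    return f"{task}\n\nOutput shape: {shape}"

prompt = shaped_prompt(
    "Summarize last quarter's churn numbers for the leadership team.",
    "one paragraph, 110-130 words, no headings",
)
```

The same helper works for bullets, tables, or checklists; only the `shape` string changes.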
3:45 So, for example, if you were trying to get it to generate a response that's really clean and clear, you can say: "Here are some facts. Fact one: our customer churn increased from 3% to 5% last quarter. Fact two: most of our cancellations are from small accounts. Fact three: our competitor just launched a cheaper monthly plan. Use only these facts. If something isn't covered here, please say 'unknown' instead of guessing."

4:06 This is helpful because you are less likely (not guaranteed, not foolproof, but less likely) to have the AI make things up when you explicitly tell it that it has space to say "unknown." There are lots of other ways to do this that I've suggested in the past. You can say "inference" as a label. You can suggest that it ask you questions instead of trying to make stuff up. Really, your choice there depends on the degree to which you want to give it more information to fill in those gaps, versus you've given it all it has and it has to make the best assessment it can based on the information, because that's all you've got. So you have to decide: how much information are you giving it? How much clarification do you want to offer it?

4:55 Fundamentally, though, the idea is: give it the context it needs. Don't give it too much. If you want to be really clear about what's in the box, you can label that context. You don't have to. You can also just list the facts and say, "These are facts. I want you to use these facts. Please do not guess." But please use another alternative method as well, because with just "do not guess," the model wants to be complete and helpful. So you have to give it another way to be complete and helpful.
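The labeled-facts pattern above is easy to generate programmatically. A minimal sketch; the helper name is hypothetical:

```python
def facts_block(facts: list[str]) -> str:
    """Number each fact so claims in the answer can be traced to a source."""
    lines = [f"Fact {i}: {fact}" for i, fact in enumerate(facts, start=1)]
    # Give the model an explicit escape hatch instead of forcing a guess.
    lines.append(
        "Use only these facts. If something isn't covered here, "
        "say 'unknown' instead of guessing."
    )
    return "\n".join(lines)

block = facts_block([
    "Customer churn increased from 3% to 5% last quarter.",
    "Most cancellations come from small accounts.",
    "A competitor just launched a cheaper monthly plan.",
])
```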
5:23 And so saying "label it unknown; that would be much more helpful" is a way to let the model track into that helpfulness vector it has, which is very strong.

5:30 Okay, move number three: give your contractor or AI a simple behind-the-scenes plan. It's like giving someone driving directions. You're telling them the route, but you don't need them to narrate every turn. So you can suggest to the AI: these are some steps you can take, but you don't have to tell me about them. That gives the AI some guidance on the tool calling, some guidance on the process it uses, and that is actually very helpful.

6:01 As an example: "Silently follow these steps: list the options, compare them, recommend only one, then show me the final comparison." Or: "In your head, prioritize by importance, add time estimates, assign owners; just show me the final agenda." You see how these are slightly different wordings, but the same idea is behind them. You don't actually need to see the thinking process here. The whole "please show me your thinking" was an artifact of a pre-reasoning-model era. We have reasoning models now. They will do reasoning. All you're doing is giving them some railroad tracks to guide that reasoning process and keep them moving in the direction you want. And that's helpful.

6:45 By the way, if the AI comes back and shows you the thinking process and you find that unhelpful, you can just say: you know what, please don't show me the plan, just show me the result. And if it gets complicated and comes back with a really long answer, either you didn't specify the result, or you can say: please keep your internal thinking to a shorter timeline. I don't have time forever.
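The silent-plan move can be templated the same way as the others. A sketch, assuming a hypothetical helper of my own naming:

```python
def silent_plan_prompt(task: str, steps: list[str], deliverable: str) -> str:
    """Hand the model a behind-the-scenes plan without asking it to narrate."""
    plan = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    # The plan guides the reasoning; only the deliverable is shown to you.
    return (
        f"{task}\n\nSilently follow these steps:\n{plan}\n"
        f"Do not show the steps. Show me only: {deliverable}"
    )

p = silent_plan_prompt(
    "Draft our launch meeting agenda.",
    ["List the options", "Compare them", "Recommend only one"],
    "the final agenda",
)
```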
7:10 And this is something that we're seeing more and more in some of the interfaces that are coming out. If you look at Perplexity, you can click "get the answer now." Very similar idea: you're basically telling it, cut short whatever you're doing and come back with a response, because I actually don't care that much about how in-depth your thinking is for this problem. And you, as the human, get to decide that. You get to say: yeah, I don't care that much.

7:31 And this is a really interesting one: if you go through this whole thing, do the behind-the-scenes plan, the silent plan, and there's no clear conclusion, just add "end with your best recommendation." Why? Because sometimes it's going to come through and, again, you have to give the model some help to be helpful. It wants to be helpful. Give it a sense of what to do in that ambiguous situation: pick the best one, tell me why.

7:57 Okay, move four. It is helpful to add a quick quality check, even in a short prompt. It's like having the model proofread itself a little bit before it comes back to you. It's like asking the contractor to check their work. Here are some examples of how to phrase it: "Before showing me your response, check: are there actually five bullets? If not, fix it." "Verify every claim has a fact number," or "verify each claim is based on the facts I gave you, or that it says 'unknown'." "Confirm that this paragraph is between 110 and 130 words. Check your work."

8:33 Now, you might think, "Well, those are really silly examples you're giving me, Nate," but they're memorable and you can use them however you want. You can specify the output shape that you want. You can specify how to check. You don't have to check the word length if that's not important to you.
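The same checks the video suggests asking the model to run can also be verified deterministically on your side before you accept an answer. A minimal sketch; the function names are mine:

```python
def check_bullets(text: str, expected: int) -> bool:
    """Count '- ' bullet lines and compare against the requested shape."""
    bullets = [ln for ln in text.splitlines() if ln.lstrip().startswith("- ")]
    return len(bullets) == expected

def check_word_range(text: str, low: int, high: int) -> bool:
    """Confirm the text lands inside the requested word-count range."""
    return low <= len(text.split()) <= high
```

Running these after the model's own self-check catches the cases where it claims compliance but misses.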
8:48 You can check for content and topics. The point is: ask for a check. It can help the model to do a second pass and address formatting issues. It can help address hallucinations and mitigate them to some extent. It can help with missing elements. It can help with incorrect length. It can help with complexity issues if it's too complicated. You just have to ask it to check its work in line with the original goal that you gave it.

9:14 So let's look at some complete examples. Here are three complete examples. You have an email outline, and all you're doing is saying, "I want you to outline an email to..." and you insert the audience and the topic. You want it formatted as an email with five bullets, and you're very clear about what's in there: there's a hook, there's why it matters, there's two points and an ask. That's five bullets. It knows how much it has. And you want it checked: if it doesn't have five bullets, it has to fix it.

9:44 You can get an explainer prompt that works this way, too. The task is to explain some concept, maybe LLM token architecture, as if to a smart 12-year-old. You have three parts with headings: what it is, an example, and gotchas or misunderstandings. You have limits: it has to be less than 140 words. And check that all three headings are present, because that's what you want to verify.

10:05 A decision helper: compare four options for this decision I'm facing. Compare on price, setup, learning curve, ongoing effort. You could really define it any way you want. This is the shape of the output: it's a four-row table.
10:16 And end with your recommendation: a one-sentence pick and why. Please check that the table has rows and columns. Pretty intuitive, right?

10:26 One of the things you'll notice is that in each of these I am giving the model space to process how it wants. That is not always true. It does underline one of the core principles here: if you are communicating your intent really, really clearly, you can choose to drop some of these. You can choose to say, you know what, the shape doesn't matter. You can choose to say, "I don't care about the check." And that depends on you deciding what matters and what doesn't for the work that you're trying to do.

11:01 Let me give you a few more examples. Now, these examples are just a little bit more advanced, and I want you to notice that we've decided to be more specific. We've decided to have some silent process here. An incident snapshot: let's say something goes wrong at work, and you want to describe what happened and what went wrong. You include some notes here that you label. You specify what the shape of the output is, for executives. You spell out the process, because you really do care that it actually looks through the facts and quantifies the scope; these are things you need it to do to know that it looked at all the notes. And then you specify that there is an owner. It has to check that there's an owner, and it has to be clear if something is unknown, so that you're not overclaiming.

11:44 What you're trying to do here is essentially take a fairly complex, normally human task and put it into a short prompt that you can get a very reliable initial pass from for a high-stakes action. So it makes sense that each of these things is specified in the same way.
Uh let's say you want 12:04an action plan from a highv value 12:05meeting, right? You have a shape. You 12:07have a checklist with boxes that you 12:09want for your action plan. You have 12:11notes that you've included. Your silent 12:13process is really around extracting the 12:15tasks, ranking them, making sure that 12:16you get the right tasks. And you want to 12:18make sure that you have exactly seven 12:21boxes and that every line has an owner 12:23and a due date, and that you're not 12:26duplicating tasks. So, that's what you 12:28want to check. You get the idea here, 12:29right? You can do a more complex 12:31assessment for higher value tasks. And 12:33you can see here the the example from 12:34the title for a for a video post. It's 12:36very similar. I won't go into it in 12:38depth. The point is this format is 12:41flexible enough to get somewhat more 12:43specific with a higher value task or it 12:46can be lighter like I showed you 12:48initially. Now, we are going to do a 12:51little bit of a jump forward and we're 12:53going to talk about some of the 12:55prompting patterns that I learned when I 12:58was digging through my own notes and 12:59exploring how prompting has evolved in 13:012025. These are things that are 13:02non-obvious that I don't see discussed 13:05online. So, hold on to your hats. All 13:08right. What are the nonobvious, 13:11underdised principles that came out to 13:13me as I sat here and I was putting 13:15together these guides, these this 13:17in-depth prompting guide that I've been 13:19putting together for you. Number one, 13:22the unit of design. We think of it as 13:24the prompt, but the more I stared at it, 13:27the unit of design is the pipeline, not 13:29the prompt. So we think about prompts as 13:33like things we tell the chatbot, but 13:35really even if you're in the chatbot, 13:37the prompt is living inside a structure. 13:40It's living inside an architecture. 
13:42 So there are things like retrieval, or maybe tool calls, or memory, or evaluation. When you treat each of these as isolated artifacts, the reliability of the prompt falls apart. If you build the pipeline first, then you can write prompts to fit. A lot of my work when I'm building prompts, helping people move from beginner to intermediate to advanced stages of prompting, is really understanding the pipeline they're operating in. For many people, the default pipeline is just the ChatGPT user interface. It's all defined; whatever you have, it's in the interface there. If you're building your own, all of that stuff is up for dialing and adjustment. In those circumstances, you are designing the prompt after you design the pipeline. The pipeline is the first thing you think about. And this is part of what makes prompting so difficult to teach, because if I give you a prompt insight, it may work really well in a particular pipeline environment but not as well in another one. So think about your pipeline as a first-class object.

14:46 Second thing that came out to me: context is a supply chain with trust boundaries. Every token that you feed the model comes from somewhere, even in a short prompt: user input, docs, the web. It is really important that you think of that context as having trust boundaries. If you care about things like safety, security, and prompt injection (which you should, by the way), if you care about reliability and avoiding hallucinated or inaccurate responses, the way you do that is by making sure you understand that you have trusted resources, somewhat less trusted resources, and unsafe or untrusted resources.
15:37 Now, if you're building a complex pipeline, you may actually directly label that stuff. If you're building a high-value prompt, you may directly label that. I do that sometimes. I'm like, "These notes are very reliable. These notes are not reliable." If you are in the middle of a casual prompt, you may just label it in the middle of the chat. But regardless, you need to recognize that when you are working with context, you're basically loading in a supply chain of tokens. And the more you can indicate "trust this, don't trust this," the more likely you are to get higher-quality, higher-reliability responses out.

16:17 Third one, and I have mentioned this before, though it has been a while: contracts really matter. Contracts are basically a fancy way of saying format matters and prose comes second. If you are going to be specific about the outputs that you expect, if you are going to frame an interaction as "we are forming an agreement together, we are forming a contract together," that is going to go a long way. Because, just as I said in the first part of this video, you're working with a contractor. That's the mental model. You want to have a contract with your contractor, and you want to make sure the outputs are specified.

16:56 Now, some people go overboard here, and I have seen the hype around the internet where they say JSON is the only way to prompt. That's not well supported by the documentation. Certainly models can read JSON, but they can read lots of other things too. If anything, the way to think about it is that you want clarity: clarity on your expected format and outputs. That in turn helps improve reliability, because the model knows what you want.
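One way to keep a contract explicit without going all-in on JSON is to hold it as data and render it as plain prose in the prompt. A sketch under my own naming assumptions:

```python
# Illustrative contract: the keys and values are examples, not a real spec.
CONTRACT = {
    "format": "four-row comparison table",
    "columns": "option, price, setup, learning curve",
    "closing": "one-sentence recommendation with a reason",
}

def contract_to_prompt(contract: dict) -> str:
    """Render the contract as an explicit, readable list of expectations."""
    lines = ["Output contract:"]
    for key, value in contract.items():
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)

text = contract_to_prompt(CONTRACT)
```

Keeping the contract as data also means the same object can drive automated output checks later.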
17:21 So contracts are just a way of encoding your intent. If you have a more complex pipeline, your contracts get suitably more complex. But this idea scales; all of these insights are designed to scale. One of my hypotheses, if you want a little sidebar from Nate, is that prompting is fractal. You have a simple version, but it's fractally related: it's essentially the same thing in miniature as a more advanced version of prompting. And a lot of the insights I found are fractal; they scale.

17:51 Number four: entropy is a design variable. I know, sit down, have some coffee, we're getting deep in here. If you understand prompting, you understand that you can shape things like temperature and top-p (if you're a beginner, don't worry about that; it's preset in ChatGPT; if you're more advanced, you can set it in the API), and that constrains the probability distribution. But you can also use constraints, examples, and output schemas, and those further narrow the distribution. You're still shaping the distribution. So the larger insight here is that your entire goal is to shape the probability mass. That's what you're doing with constraints, with examples, with context, with output schemas, with temperature, with top-p. You're always shaping the probability mass. You are using entropy as a design variable. You're not just making it more creative; you're actually shaping the probability mass of the outcome.

18:48 Now, that may seem really abstract, it may seem theoretical, but I think having correct mental models helps us decide where to apply leverage.
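To make "shaping the probability mass" concrete, here is a small self-contained sketch of how temperature sharpens or flattens a softmax distribution, and how top-p then keeps only the head of it. The logits are toy numbers, not from any real model:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature below 1 sharpens the distribution; above 1 flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # shift for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_keep(probs, p=0.9):
    """Return indices of the smallest set of tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

logits = [2.0, 1.0, 0.2, 0.1]             # toy token scores
sharp = softmax(logits, temperature=0.5)  # mass concentrates on the top token
flat = softmax(logits, temperature=2.0)   # mass spreads across tokens
```

With the sharper distribution, top-p at 0.9 keeps fewer candidate tokens than with the flatter one: the two knobs compose, which is why it pays to think about where the leverage is.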
18:58 This is definitely an advanced concept, but if you understand that temperature, top-p, constraints, examples, and output schemas are all related, it helps you understand where to apply leverage in the system. You can say, "Well, the output schema is probably going to be more effective on probability mass here than temperature, because X or Y." So take that, think about it, go for a walk, think about it some more, and it may start to click when you're troubleshooting advanced prompts.

19:26 Number five: scaffolding often matters more than just horsepower. If you're in the API, you can just burn tokens on stuff, and that can be helpful. But if you can give the model techniques, things that I cover in the advanced prompting guide on the Substack, like least-to-most, tree of thought, and stepwise plans, you're essentially giving the model a way to reduce cognitive load. You're giving it something that helps with error accumulation, and you're adding structure: a scaffolding for how it thinks. That scaffolding makes it more token-efficient.

20:04 So the little meme that I keep in mind is that scaffolding beats horsepower. Have good scaffolding. And yes, horsepower is inevitable; there's a reason we talk about burning tokens as a way to solve problems. But in a world where we need to be token-efficient, which is especially true in production prompting environments, have good scaffolding. Scaffolding matters. I definitely get into that in the advanced prompting section, so if you want to dive in, there's a lot there.

20:30 Number six: a shifting distribution is enough to break your best prompt.
20:37 If prompts are tuned on a handful of examples, they will drift in the wild when they face a wild, un-normed distribution. That is a fancy way of saying: if you tune your prompts and decide "this prompt works well in the studio, in the environment," and then you release it into the wild and it's dealing with consumers asking all kinds of things, that's going to break your best prompt.

20:59 This is another way of saying (and I have seen the debates on evals) that you absolutely need to take quality seriously in production. You need the ability to test it, to monitor it, to evaluate it, to roll it back, and to version it: to treat the prompt like code. The wild distribution is good enough to break the best prompt, and the way to address that is to treat prompts like production code that needs sustained investment to be optimized over time. You can't assume you can just release it because it tested well.

21:34 And again, the mental model matters here. If you think about a wild distribution of everyone sending queries against your chatbot, that's going to matter. That's going to break it. You can understand why it breaks, and why that's different from a lab-grown distribution, which tends to have narrower tails. That in turn can shape your evals: your evals should push your distribution as wide as you can. Part of your goal is to get your lab-grown distribution to stretch and actually be more effective.

22:02 Number seven: model pluralism is a feature, not a bug. Different models really do have distinct personalities and strengths. You feel it more the more you work with them.
22:14 It really, really matters that you not build your architectures assuming only one model, or assuming one model will do it all, or assuming that you only need one model for now, and so on. This is one of those things that marks the boundary between a beginner view of AI and a more advanced view. The farther along you are in building with AI, the more you recognize that you are building with multiple models. People have Claude and ChatGPT and Gemini for this and that, and they'll recite off the top of their head all the use cases they have and which models they use. If you're building production pipelines, you'll know which versions of those models you're going to use and why. Model pluralism is not only a feature, it is the future. You should expect to have a pluralistic model environment. And the more advanced you are, the more models you're going to be using, not because you love complexity, but because you love efficiency and you can pick the right model for the right task.

23:09 Insight number eight: the farther you get with prompting, the more you recognize that economics are a first-class constraint. Satya Nadella talked about dollar cost per token per watt as the equation for the next ten years, and he was right. Token budgets matter. Latency matters. Fallback logic matters. The more you look at architectures, it's less about picking cool tools and more about making sure that you have a reliable, scalable, efficient architecture: thinking about how loads actually hit models, which models handle which loads, and how models pass and make tool calls at scale. Those all start to matter. What is the cost of the model making those choices? Why do we need it to make that choice?
23:55 Is there a simpler way we can design this? This is actually one of the great examples where I see strong engineering instincts from humans handily beating LLMs at the moment. Humans are really good at designing efficiently engineered systems, and models tend to be good at adding complexity. When you are trying to design a system that keeps economics in mind, you have to design for simplicity first.

24:22 Number nine: governance beats heroics. "Heroics" is a fancy way of saying people are out in production desperately trying to do their very best, trying to write the best prompt by next Thursday, and so on. No. Have good governance. Have versioning. You can A/B test changes. You can evaluate with rubrics. You can log things. Your prompt library ends up being intellectual property. Manage it as if it were code. Govern it like it was code. This is a different way of talking about the same thing I talked about earlier. "Distribution shift will break your best prompt" was really about how you care about testing and evaluating. This is really about the governance structure that keeps production prompts going. Governance beats heroics. Just like we say in engineering circles: don't play hero ball. If it is up to you to play hero ball to sustain this code in production, it's not good code. Similarly, if it's up to you to sustain this prompt in production, it's not a good prompt. You need good governance first.

25:20 Number ten: safety gets designed in, at the core. It's not added on. Constitutional, system-level rules (which I talked about in the advanced guide), refusal styles, how the system handles ambiguity, how you address jailbreak attempts and prompt injection attacks, output moderation.
That's all part of the spec, right? That's not an afterthought. You think about it from the beginning, because you have to assume that models are unsafe by default, and the only way you make them safe is by designing them to be safe from the get-go. Again, this is one of those things that really distinguishes advanced prompters from beginners. Beginners operate in a chatbot, and the chatbot is pretty nerfed, as we'd say: it's whatever ChatGPT gives you, whatever Claude gives you. You can bounce off those walls, but you're not going to get very far. If you're designing your own system, it's very different. You are responsible for all of that safety work, and you have to take it seriously.

Number eleven: memory is not a toggle. Memory is a product choice: deciding what persists, where it's stored, how it's summarized and validated, what the model remembers. You have to assume that the context window is nothing, and that memory, or statefulness, is something you design into the architecture of the system; context windows are just a way to generate a particular response. People who assume the context window will work, or that the model remembers, will find it doesn't work. Even a very large context window is only useful for that one response. Keep in mind that models are reinforcement learned; they're trained on basically single-response patterns. Whatever you give it, the ideal output from that model is one response. Now, some people will go longer, but this key idea explains why beginners are often surprised when they continue a chat, get 30 or 40 turns in, and the model seems to forget. That's just how it works.
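Here is a sketch of what deliberately architecting memory can look like, assuming a simple store-plus-rolling-summary design. The summarizer below is a placeholder; a real system might call a cheap model there, and the durable store would be a database rather than a Python list.

```python
class ConversationMemory:
    """Deliberate memory: persist every turn, keep a bounded working set,
    and fold older turns into a summary instead of trusting the
    context window to remember."""

    def __init__(self, window: int = 4):
        self.window = window   # how many raw turns to keep verbatim
        self.all_turns = []    # durable store (a DB in practice)
        self.summary = ""      # running summary of evicted turns

    def add_turn(self, role: str, text: str) -> None:
        self.all_turns.append((role, text))

    def context_for_next_call(self) -> str:
        recent = self.all_turns[-self.window:]
        older = self.all_turns[:-self.window]
        if older:
            # Placeholder summarization; a real system might call a
            # cheap LLM here to compress the evicted turns.
            self.summary = f"[summary of {len(older)} earlier turns]"
        lines = [f"{role}: {text}" for role, text in recent]
        return "\n".join(([self.summary] if self.summary else []) + lines)
```

The point is that persistence and summarization are explicit product decisions in the code, not a hope that the context window remembers.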
The product choice that ChatGPT and other model makers have made is to have a rolling context window in certain situations, so the model will keep talking, roll the context window, and forget the initial material. Claude, notably, has made a different choice: it just ends the chat. That also frustrates people. There is no good answer here. The only way that people who build advanced prompt architectures get this to work is by deliberately architecting what is stored, what is summarized and validated, and how it's retrieved. This is where we get into the discussion of RAG, retrieval-augmented generation, along with chunking data and other techniques. Memory is a product choice. Don't assume memory lives in the context window.

The twelfth and last insight: automated enforcement beats human vigilance. Do not trust that the model will follow your rules. Install automated checks. In larger systems, you actually have separate LLMs that check for specific elements of the output and report back. And they're cheaper; they're often dumber LLMs, and they check for specific things. Is it in a bulleted format? Does it reflect the style guide? Whatever it is, you want automated enforcement checks, because that beats the best human diligence. It beats the best human evals. The more you can build enforcement of the schema and enforcement of the output into the system, the more it's just part of the system. You're not playing hero ball. It's just good engineering.

And that's really where I want to leave you. As I stared at these insights, and I'm not kidding, it was like 500 pages of prompt material I've written.
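The automated-enforcement idea from insight twelve can be sketched as a deterministic output gate that runs before anything ships. The three rules below (bulleted format, a word budget, a banned-phrase list) are invented examples of the kind of checks a style guide might demand, not a standard:

```python
import re

def enforce_output_rules(text: str) -> list[str]:
    """Cheap deterministic checks that run before any human (or LLM
    judge) sees the output; each failed rule is reported by name."""
    failures = []
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # Rule 1: every non-empty line must be a bullet.
    if not all(ln.lstrip().startswith("- ") for ln in lines):
        failures.append("not_bulleted")
    # Rule 2: stay inside the length budget.
    if len(text.split()) > 120:
        failures.append("over_word_budget")
    # Rule 3: no filler the (hypothetical) style guide bans.
    if re.search(r"\b(as an AI|I cannot)\b", text, re.IGNORECASE):
        failures.append("banned_phrases")
    return failures
```

An empty failure list means the output can ship; otherwise the pipeline retries, repairs, or escalates to a stronger model, with no human vigilance in the loop.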
I've been pretty prolific. And I keep looking at this and thinking: really, what we're doing is pulling so many of the principles of good software engineering forward and talking about them in the context of artificial intelligence system design. The principles are not new. Governance beats heroics: that's not new; Google knew that a long time ago. But we have different ways to apply it in the AI age, and people forget it because they think you can just talk to the AI and make it do anything. So it's worth repeating, worth reiterating. So there you go: those are the 12 things I noticed. I hope you dig in, and I hope you enjoy the advanced guide. If you're still here listening to these 12, you probably want the advanced guide. It's like a hundred-something pages; it's really in-depth. I had fun. I basically took everything I learned over the last six or eight months, and people kept saying, "Nate, Nate, I'm tired of looking at all these guides. Can you just give me one thing?" This is the one thing. This is the guide: the soup-to-nuts, A-to-Z complete guide to prompt engineering as it exists today, in September 2025.