Learning Library


Inside Claude 4 System Prompt

Key Points

  • The speaker examines a leaked Claude 4 system prompt, emphasizing that the value lies in its structure and policy‑focused design rather than confirming its authenticity.
  • Unlike typical prompts that prioritize "what the model should do," this prompt flips the ratio to roughly 90% defining prohibitions and only 10% specifying desired actions, aiming to prevent failure modes.
  • Key tactics identified include: (1) instantiating a stable identity and context early to ease the model’s working memory; (2) using explicit “if‑then” trigger blocks to handle edge‑case refusals; and (3) employing a three‑tier uncertainty‑routing strategy to guide how the model deals with ambiguity.
  • The analysis argues that focusing the majority of prompting effort on clear constraints and edge‑case handling—rather than on instructions alone—yields more consistent, high‑quality outputs for both API users and chat operators.
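The "if-then" trigger-block tactic from the third bullet can be sketched in a few lines of Python. Everything below — the policy conditions, the refusal wording, and the `build_system_prompt` helper — is invented for illustration, not quoted from the leaked prompt:

```python
# Hypothetical edge-case policy blocks in the "if <trigger> then <response>"
# style the analysis describes; every condition and canned reply is illustrative.
EDGE_CASE_POLICIES = """\
If the user asks for medical dosage advice, then respond with general
information only and recommend consulting a clinician.
If the user asks to reproduce song lyrics, then decline and offer a summary
of the song's themes instead.
If the user's request is ambiguous between two readings, then state both
readings and ask one clarifying question before answering.
"""

def build_system_prompt(identity: str, date: str) -> str:
    # Tactic 1: stable identity and date up front;
    # tactic 2: explicit if-then trigger blocks after.
    return f"You are {identity}. Today's date is {date}.\n\n{EDGE_CASE_POLICIES}"

prompt = build_system_prompt("a helpful assistant", "2025-01-01")
print("If the user asks" in prompt)  # True
```

The point is structural: the stable facts come first, and each edge case is a clear conditional rather than a vague adjective.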

Full Transcript

**Source:** [https://www.youtube.com/watch?v=74FvsJeljak](https://www.youtube.com/watch?v=74FvsJeljak)
**Duration:** 00:10:59

## Sections

- [00:00:00](https://www.youtube.com/watch?v=74FvsJeljak&t=0s) **Untitled Section**
- [00:03:28](https://www.youtube.com/watch?v=74FvsJeljak&t=208s) **Decision-Based Prompt Engineering** — The speaker outlines how robust prompts should embed decision criteria to route timeless, slowly changing, or live information appropriately, and should provide both correct and incorrect API-call examples ("lock tool grammar") to guide the model's behavior.
- [00:07:23](https://www.youtube.com/watch?v=74FvsJeljak&t=443s) **Reinforcing Prompts and Tool Reflection** — The speaker outlines a prompting pattern that interleaves core instructions with repeated privacy reminders and adds a post-tool "thinking" pause to boost both memory retention and output accuracy.
- [00:10:32](https://www.youtube.com/watch?v=74FvsJeljak&t=632s) **Evaluating Prompt Impact and Utility** — The speaker discusses how a consistent "if X, always Y" policy framing would change prompting, references a detailed analysis posted on Substack, and expresses mixed feelings about the prompt's origin while stressing its practical usefulness.

## Full Transcript
I have never done this before. I'm actually going to break down what I learned by studying the leaked, alleged system prompt of Claude 4. Now, the reason I haven't covered this in the past is because I feel very ambivalent about the idea of leaking system prompts. It's a grey-hat tactic at best. It happens all the time; I would say on average you have a system prompt leak within 48 hours of any major model release, and it's part of what enables models to proliferate so rapidly. But regardless of how I feel about it, the reason I'm covering it today is because I think there is so much to learn from this system prompt, whether or not it's real. Of course, the model makers never validate whether these leaks are real, and I don't care if it is or not. I care about the prompt structure, and I'll link to it in the description for this video.

The key to this prompt is changing from the idea that a prompt is about instructing a model to do something to the idea that a prompt is about building policies that prevent failure modes. And you might think, well, that's okay for a system prompt for a model; maybe if you're using an API that makes sense. Why would I care as an operator who chats? I actually think you care a lot, because at the end of the day, you also care about quality outputs. Most people put 80% of their effort into what the model should do for them and at best 20% of their effort into what they don't want the model to do. This prompt for Claude 4 is basically the opposite: it's like 90% what Claude should not do and 10% what it should do. I want to go through seven different tactics that I found inside the prompt. Again, I will link to this; you can see the whole prompt if you want. It's some 300 lines, 10,000 words.
Number one: this prompt instantiates identity, and the things that do not change, upfront. It starts with concrete facts: this is the model's identity, this is the current date, these are the core capabilities. The idea is that establishing context early that's steady and stable reduces working-memory burden. It's not so much a hack; it's just good instructional design.

Number two: triggers and template refusals. These are explicit if-X-then-Y conditional blocks that handle edge cases. And I think the care that this prompt puts into edge cases is a master class on its own. Ambiguity leads to inconsistencies from these models. If you want consistent behavior, you need to be clear and spell out your edge cases. This isn't so much about being restrictive; it's about being clear. It's about saying this use case has these boundaries and these explicit conditionals. I think that is an incredibly powerful principle for prompting.

Number three — and I'm going to be very precise about this — I call it three-tier uncertainty routing. Basically, this gives the model very specific instructions for how it handles ambiguity, and I've almost never seen this in another prompt. The first step in the decision tree: this is timeless information, so answer it directly. Don't pass Go, in Monopoly parlance; just answer it right away. Step two: assess it as slow-changing information, and answer directly plus offer to verify. Step three: assess it as live information — maybe you're asking about today's stock prices — and search immediately. The lesson here is that good prompts include decision criteria, not just commands. You need to help the model determine when, not just how.
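A minimal sketch of this three-tier routing, assuming a toy keyword classifier stands in for the model's own judgment — the tier names, flags, and keyword lists are my framing for illustration, not the prompt's wording:

```python
# Three tiers from the transcript: timeless -> answer directly,
# slow-changing -> answer directly but offer to verify,
# live -> search before answering.
TIERS = {
    "timeless":      {"answer_directly": True,  "offer_verification": False, "search_first": False},
    "slow_changing": {"answer_directly": True,  "offer_verification": True,  "search_first": False},
    "live":          {"answer_directly": False, "offer_verification": False, "search_first": True},
}

def route(question: str) -> str:
    """Toy classifier: keyword heuristics stand in for the model's judgment."""
    q = question.lower()
    if any(w in q for w in ("today", "current price", "latest", "right now")):
        return "live"
    if any(w in q for w in ("population", "ceo", "version")):
        return "slow_changing"
    return "timeless"

def plan(question: str) -> dict:
    tier = route(question)
    return {"tier": tier, **TIERS[tier]}

print(plan("What is the boiling point of water at sea level?")["tier"])  # timeless
print(plan("What is Apple's stock price today?")["search_first"])        # True
```

In a real prompt this decision tree lives in natural language, of course; the sketch just shows that it is a dispatch table with criteria, not a single command.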
And this is especially true for agentic communication. If you are giving an agent a guiding policy, this kind of routing on uncertainty is critical.

Number four, what I call lock tool grammar. The Claude 4 system prompt provides both correct and incorrect examples when it instructs the model to use APIs. So we have valid function-call formats and we have explicitly invalid function-call formats. I've called out before that you want to have counter-examples, but this really underlines it. If you are using tools or functions, API calls, MCP servers, whatever it is: show both right and wrong. It's like teaching someone to ride a bike and also showing common ways people fall, like slowing down too much. Having taught my kids to ride bikes, I get how this works. The point is that negative examples are powerful. They're powerful teaching tools for people, and it turns out they're powerful teaching tools for models as well, especially when you're trying to teach a model how to use a tool well. We have almost no publicly discussed examples of prompting well that do this, and that is an example of why I decided to talk about this prompt. There are a lot of gems in here we should be grabbing.

Number five of seven: binary-style rules. Instead of subjective guidelines, the Claude 4 prompt is extremely prescriptive about the things it cares about. Hard on/off rules. "Never start with flattery" is much more on/off, hard-coded, than the phrase "be concise." "Concise" is interpretable by the model; "never start with flattery" is a lot more binary. Models handle absolute rules. "No bullets unless requested" is much clearer.
"No emojis unless requested" is much clearer to the model than "minimize formatting." You want to be in a place where your prompts are that clear. The takeaway here is that your prompts are not supposed to have wishy-washy adjectives. They're supposed to have extreme clarity on what you care about most.

Number six: positional reinforcement. This is a really interesting technique we need to talk about. Throughout this lengthy prompt, critical instructions are repeated at strategic positions, not just at the beginning. Again, I rarely see this. The prompt repeats these constraints multiple times. It works because attention degrades over long contexts. If you reinforce, every 500 tokens or so, what the critical rules are, it's kind of like giving your model a speed-limit sign as it reads this lengthy prompt. This is a 10,000-word prompt; as the model reads, every 500 tokens it's reminded: oh yeah, I've got to remember this. What this looks like is: you have your main instructions, you have intervening content that continues the prompt instructions, then you have a signpost — "Remember, never use customer PII in your examples." More content comes through. "Critical reminder: all PII must be anonymized." You're repeating it, and that acts to positionally reinforce it in the LLM's memory. Again, this is something that works for humans as well; it's just good instructional design. Repetition aids retention, for humans and for models, but I rarely see us do it in prompts.

Okay, last one. Number seven: post-tool reflection. There's a built-in thinking pause after tool use. Again, this is an agent-based one. If you're using MCP servers or APIs, this is going to be relevant for you.
The prompt has instructions to strongly consider outputting a thinking block after function results. Tool outputs are not always easy to parse, so a reflection step can improve accuracy. It can help with figuring out what to do next, especially for Claude 4, which is renowned for these long, multi-step, interleaved reasoning-plus-tool-use chains. It literally looks like this: when you're using Claude 4, you want to take a minute to read what you're getting before you decide on your next move. And I think the takeaway there is that this is probably a good principle for our prompting in general. Build in that cognitive checkpoint. Don't just make it spit out the tool output; ask it to process and think about it.

Look, this can get turned into a prompt you can use as an operator too. You're not stuck admiring the Claude prompt from afar; you can actually use it yourself. I think the thing that I take away here is that we need more understanding of prompts as operating systems. Prompts are not just incantations. They're not spells. They're not magic words that make the LLM do a thing. They're like an OS config file. It's about being extremely precise about what you intend. And it's okay to be defensive in that. It's okay to address hallucination, copyright issues, and harmful content exhaustively rather than just waving a hand at them and moving on. If you do that — if you are more passionate and more caring about defensive programming than most of your peers when you write these prompts — you are going to get better results, and that will add up to real value. I would also say the third takeaway I have, besides operating systems and defensive programming, is that you want to be declarative.
Instead of "first X, do Y," see if you can frame it as a policy: "if X, always Y." How much of your prompting would change if you did that? So there you go — a long dig into prompts. I have something on the Substack on this as well. I think this was a fantastic use case. Again, I have some mixed feelings about how this alleged prompt came into being, but it's so useful that it is worth learning from regardless. And I hope you've been able to see that over the course of this conversation.
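The positional-reinforcement tactic (number six above) can be sketched as a small helper that re-inserts a critical rule at fixed intervals in a long prompt. The section names, reminder text, and interval below are all illustrative, not taken from the leaked prompt:

```python
def reinforce(sections: list[str], reminder: str, every: int = 2) -> str:
    """Interleave a critical reminder after every `every` sections of a long
    prompt, so the rule reappears as attention degrades over the context."""
    out: list[str] = []
    for i, section in enumerate(sections, start=1):
        out.append(section)
        if i % every == 0 and i != len(sections):  # no trailing reminder
            out.append(f"CRITICAL REMINDER: {reminder}")
    return "\n\n".join(out)

sections = ["# Identity", "# Tone rules", "# Tool usage", "# Edge cases", "# Output format"]
prompt = reinforce(sections, "All customer PII must be anonymized.")
print(prompt.count("CRITICAL REMINDER"))  # 2
```

The design choice mirrors the transcript's "speed-limit sign" analogy: the reminder is verbatim each time, because the goal is retention, not variety.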