
Prompt Engineering: Contracts for Reliable LLMs

Key Points

  • Prompt engineering once shone as a specialty for coaxing LLMs, but as models get better at understanding intent, the role has shifted toward ensuring reliable, predictable outputs.
  • Because LLMs generate tokens probabilistically, small changes in wording or parameters can produce wildly different results, which is acceptable in chat but problematic for software that expects exact formats.
  • Using an LLM to convert free‑form bug reports into strict JSON illustrates the risk: occasional deviations like extra text, renamed fields, or malformed JSON can break downstream systems.
  • Reliable LLM integration now requires a defined output contract, automated validation with retry loops, and observability (e.g., tracing prompts to responses) to guarantee consistency.
  • Frameworks such as LangChain and other prompt‑engineering tools help orchestrate these pipelines by structuring pre‑ and post‑model steps, enforcing contracts, and providing the necessary monitoring.

Full Transcript

# Prompt Engineering: Contracts for Reliable LLMs

**Source:** [https://www.youtube.com/watch?v=cgVppD6paYo](https://www.youtube.com/watch?v=cgVppD6paYo)
**Duration:** 00:09:57

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cgVppD6paYo&t=0s) **Prompt Engineering’s Unpredictable Reality** - As LLMs become smarter, their probabilistic token sampling makes even well‑crafted prompts—like those that force strict JSON bug‑report formatting—yield inconsistent outputs, turning the once‑glamorous prompt‑engineer role into a reliability challenge.
- [00:03:17](https://www.youtube.com/watch?v=cgVppD6paYo&t=197s) **LangChain Triage-to-JSON Pipeline** - The speaker explains how LangChain’s composable “runnable” steps—prompt template, chat model, and validation—transform user bug text into a validated JSON response for a triage application.
- [00:06:45](https://www.youtube.com/watch?v=cgVppD6paYo&t=405s) **Prompt Declaration Language Overview** - The speaker explains PDL as a YAML‑based declarative specification that defines the desired output shape, model calls, typing, and control structures for LLM workflows, which an interpreter executes top‑down to assemble context, enforce types, and produce results.

## Full Transcript
Do you remember when prompt engineer was the hot new profession? Prompt engineers could whisper the right combination of magic words to a large language model to get it to do things that regular folks issuing regular prompts simply couldn't. Well, as LLMs got smarter and better at understanding intent, the title of prompt engineer has lost some of its shine. But the fact remains that the output of an LLM is not predictable. LLMs don't behave like deterministic functions, the way most other things in computing do. They're actually probabilistic: each token is sampled from a distribution conditioned on everything that came before. Change the wording a little bit, add another example, or change the temperature, and you can end up with a different response. In chat, maybe that's fine. In software, it could be a bit of a bug factory.

So let me give you an example of what I mean. Let's use an LLM to structure bug reports. I'll supply the bug report in free-form text, and I want the LLM to return strict JSON with this shape: a summary string, a severity string of either low, medium, or high, and then a series of steps. Now, if I use a chatbot interface or an API call to invoke an LLM, I can include some instructions in my prompt. It might be something like: you are a triage assistant, return JSON with this format, here's the first bug report. That goes to the LLM, and well, it might do it. In fact, most of the time it probably will. But every now and again, an LLM might not quite follow the path. It might not return just the JSON at all. Or perhaps it wraps the JSON in a friendly sentence like, "Sure, here's the reformatted report." Or maybe it drifts off schema, so it renames summary as synopsis.
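The strict shape being asked for can be pinned down as a concrete example document. The report content below is invented for illustration:

```python
import json

# An example document matching the contract from the transcript:
# a summary string, a severity of "low" | "medium" | "high",
# and a list of reproduction steps. (The report text is made up.)
example_report = {
    "summary": "Login button unresponsive on mobile Safari",
    "severity": "high",
    "steps": [
        "Open the site in mobile Safari",
        "Tap the login button",
        "Observe that nothing happens",
    ],
}

print(json.dumps(example_report, indent=2))
```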
Well, when software is expecting precise JSON like this, in a precise format, and it gets all these variances, that's when things start to break. So, to somebody working to incorporate LLM output into software, prompt engineering actually means a few very specific things. One of those is a contract, where the shape of the output is decided up front, like which keys and enums to use. It also means defining a control loop to validate every response against the contract and, if it fails, to automatically retry with tighter instructions or a constrained decode. And it also means being observable: observability, for example tracing, so you can see exactly which prompts produced what, so changes don't ship unless the numbers say they're safe. Today there are tools that can help with that form of prompt engineering, and we're going to take a look at two: PDL, but first of all, LangChain.

LangChain is an open-source framework for building LLM apps with a pipeline of composable steps. You define what happens before and after a model call, not just the words that you send to it. So let's use our triage-to-JSON example. At the very top of this example, we need something to send to the model. We're going to have our user bug text as our input, and we're going to send that into an element called a prompt template, which will receive the user bug text. Now, in LangChain, each box here is a runnable. That's a step that takes some input, does something, and then outputs a result. The prompt template runnable packages the instruction, the prompt, something like: you're a triage assistant, output JSON only, here's the shape of the JSON. It combines that with the user bug text, and the same template is reused each time so the wording stays consistent. Now that gets sent to a chat model, the actual LLM itself.
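As a rough stand-in for that prompt-template step, plain Python string formatting shows the idea. This is not LangChain's actual API, and the template text is invented; it just illustrates how fixed instructions get combined with the user's bug text:

```python
# A plain-Python stand-in for the prompt-template runnable.
# The fixed instructions are combined with the user's bug text,
# and the same template is reused so the wording stays consistent.
TEMPLATE = (
    "You are a triage assistant. Output JSON only, with this shape:\n"
    '{{"summary": <string>, "severity": "low"|"medium"|"high", '
    '"steps": [<string>, ...]}}\n\n'
    "Bug report:\n{bug_text}"
)

def render_prompt(bug_text: str) -> str:
    """Fill the user's free-form bug text into the fixed template."""
    return TEMPLATE.format(bug_text=bug_text)

prompt = render_prompt("App crashes when I rotate the screen.")
print(prompt)
```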
So this is a chat model runnable, and it will call an LLM and produce a response. Let's call this response the candidate JSON that we've received back from the LLM as a text string. Next, we go to another runnable. This one is called the validate runnable, and it checks the candidate response against our schema, checking things like whether the keys are present, whether the steps entry is a non-empty list, and so forth. If that passes validation, then it ends up getting sent to our overall application. This is the application that is going to receive all of this JSON that we actually want to produce. That's if it works. If it fails, we go down a different path, to another runnable called retry-or-repair. This is the fail path. This is a runnable that can retry, sending another model message with some firmer instructions, or it can repair, making a small fix like stripping out the extra prose that came with the JSON. Now, if that passes validation, then it's all good. It's off to the application we go. If that still doesn't work, then we take what's called a fallback path, and there, for example, we might try a stricter model. And whether it passes on the first try, or we need to do a retry and repair, or we need to go through to the fallback, eventually the app receives clean JSON. We keep the traces and the metrics around as well, so we can spot regressions and improve over time. So that's LangChain. What about PDL?

PDL is Prompt Declaration Language. PDL is a declarative spec for LLM workflows, and the core idea is that most LLM interactions are really about producing data. So you declare the shape of that data and the steps to produce it, and you do that all within a single file, a YAML file to be precise. Then a PDL interpreter runs that file. It assembles context, it calls models and tools, it enforces types, and it emits results.
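To make that concrete, a triage program in this style might look something like the sketch below. This is an illustration reconstructed from the transcript's description, not verified PDL source; the block keywords, the `${ ... }` variable syntax, and the model identifier are all assumptions:

```yaml
# Illustrative sketch only: keywords and the model id are assumptions,
# reconstructed from the transcript's description of PDL.
description: Bug-report triage to strict JSON
defs:
  bug_text:
    read: bug_report.txt          # read the free-form report from a file
text:
- "You are a triage assistant. Return JSON only.\n"
- "Bug report: ${ bug_text }\n"
- model: example/triage-model     # hypothetical model identifier
  spec: { summary: str, severity: str, steps: [str] }   # typed output contract
```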
So with the PDL in this file here, we've got our prompt, which defines the thing that we're going to call. We've got the contract, which is what we want this to actually produce. And then we've got the control loop. They all live in this one file, this YAML file.

Now, a bit more about the PDL spec itself. The top-level text is an ordered list, where each item can be either a literal string or a block that calls out to a model. PDL runs this list top down, so strings are appended to the running output and to a background context. When it hits a model block, the default input is everything that's been produced so far, unless you provide an explicit input. You can declare types in PDL as well, types for input and output on model steps, so the interpreter does type checking and fails on shape violations. Control is explicit: we have things like conditionals for control, and we can also add in loops for control. Typical programs also use defs; those are used to read data, like reading data from a file or from standard in. Then there is a final data section to collect the named results you want to emit. Tracing and a live explorer let you inspect each block's inputs and outputs and the exact context that was sent to the model.

So, basically, we have LangChain, which we've talked about today, and what LangChain really is is something that can be considered code first. It's a code-first pipeline with runnables, and you wire those runnables together. PDL, on the other hand, is really spec first: everything lives in one YAML file, where the prompt and the types and the control flow live together and are executed by an interpreter. And together, tools like these are really becoming the grown-up toolbox that is turning all of this prompt whispering into real software engineering.
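As a closing illustration, the control loop both tools implement, validate, then retry or repair, then fall back, is small enough to sketch in plain Python. This is a sketch of the pattern only, not either library's API; `call_model` and `fallback_model` are placeholders for real model calls:

```python
import json

def validate(candidate: str):
    """Check a candidate response against the contract: required keys,
    a valid severity enum, and a non-empty steps list.
    Returns the parsed object, or None on any violation."""
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != {"summary", "severity", "steps"}:
        return None
    if obj["severity"] not in ("low", "medium", "high"):
        return None
    if not isinstance(obj["steps"], list) or not obj["steps"]:
        return None
    return obj

def repair(candidate: str) -> str:
    """Small fix: strip any extra prose wrapped around the JSON object."""
    start, end = candidate.find("{"), candidate.rfind("}")
    return candidate[start:end + 1] if 0 <= start < end else candidate

def triage(call_model, fallback_model, max_attempts: int = 2):
    """Happy path first; on failure, retry with firmer instructions
    (also trying a repair); finally fall back to a stricter model."""
    for attempt in range(max_attempts):
        candidate = call_model(strict=(attempt > 0))
        result = validate(candidate) or validate(repair(candidate))
        if result is not None:
            return result
    return validate(fallback_model())
```

A model that wraps its JSON in a friendly sentence, for example, fails the first validate but passes after repair, so the application still receives clean JSON.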