
Prompt Engineering: Contracts for Reliable LLMs

Key Points

  • Prompt engineering once shone as a specialty for coaxing LLMs, but as models get better at understanding intent, the role has shifted toward ensuring reliable, predictable outputs.
  • Because LLMs generate tokens probabilistically, small changes in wording or parameters can produce wildly different results, which is acceptable in chat but problematic for software that expects exact formats.
  • Using an LLM to convert free‑form bug reports into strict JSON illustrates the risk: occasional deviations like extra text, renamed fields, or malformed JSON can break downstream systems.
  • Reliable LLM integration now requires a defined output contract, automated validation with retry loops, and observability (e.g., tracing prompts to responses) to guarantee consistency.
  • Frameworks such as LangChain and other prompt‑engineering tools help orchestrate these pipelines by structuring pre‑ and post‑model steps, enforcing contracts, and providing the necessary monitoring.

Full Transcript

# Prompt Engineering: Contracts for Reliable LLMs

**Source:** [https://www.youtube.com/watch?v=cgVppD6paYo](https://www.youtube.com/watch?v=cgVppD6paYo)
**Duration:** 00:09:57

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cgVppD6paYo&t=0s) **Prompt Engineering’s Unpredictable Reality** - As LLMs become smarter, their probabilistic token sampling makes even well‑crafted prompts—like those that force strict JSON bug‑report formatting—yield inconsistent outputs, turning the once‑glamorous prompt‑engineer role into a reliability challenge.
- [00:03:17](https://www.youtube.com/watch?v=cgVppD6paYo&t=197s) **LangChain Triage-to-JSON Pipeline** - The speaker explains how LangChain’s composable “runnable” steps—prompt template, chat model, and validation—transform user bug text into a validated JSON response for a triage application.
- [00:06:45](https://www.youtube.com/watch?v=cgVppD6paYo&t=405s) **Prompt Declaration Language Overview** - The speaker explains PDL as a YAML‑based declarative specification that defines the desired output shape, model calls, typing, and control structures for LLM workflows, which an interpreter executes top‑down to assemble context, enforce types, and produce results.

## Full Transcript
Do you remember when prompt engineer was the hot new profession? Prompt engineers could whisper the right combination of magic words to a large language model to get it to do things that regular folks issuing regular prompts simply couldn't. Well, as LLMs got smarter and better at understanding intent, the title of prompt engineer has lost some of its shine. But the fact remains that the output of an LLM is not predictable. LLMs don't behave like deterministic functions, the way most other things in computing do. They're actually probabilistic: each token is sampled from a distribution conditioned on everything that came before. Change the wording a little bit, add another example, or change the temperature, and you can end up with a different response. In chat, maybe that's fine. In software, it could be a bit of a bug factory.

So let me give you an example of what I mean. Let's use an LLM to structure bug reports. I'll supply the bug report in free-form text, and I want the LLM to return strict JSON with this shape: a summary string, a severity string of either low, medium, or high, and then a series of steps. Now, if I use a chatbot interface or an API call to invoke an LLM, I can include some instructions in my prompt. It might be something like: you are a triage assistant, return JSON with this format, here's the first bug report. That goes to the LLM, and well, it might do it. In fact, most of the time it probably will. But every now and again, an LLM might not quite follow the path. It might not return just the JSON at all. Or perhaps it wraps the JSON in a friendly sentence like, "Sure, here's the reformatted report." Or maybe it drifts off schema, so it renames summary as synopsis.
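The strict shape being asked for can be pinned down as a concrete example document. The report content below is invented for illustration:

```python
import json

# An example document matching the contract from the transcript:
# a summary string, a severity of "low" | "medium" | "high",
# and a list of reproduction steps. (The report text is made up.)
example_report = {
    "summary": "Login button unresponsive on mobile Safari",
    "severity": "high",
    "steps": [
        "Open the site in mobile Safari",
        "Tap the login button",
        "Observe that nothing happens",
    ],
}

print(json.dumps(example_report, indent=2))
```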
Well, when software is expecting precise JSON like this, in a precise format, and it gets all these variances, that's when things start to break. So, to somebody working to incorporate LLM output into software, prompt engineering actually means a few very specific things. One of those is a contract, where the shape of the output is decided up front, like which keys and enums to use. It also means defining a control loop to validate every response against the contract and, if it fails, to automatically retry with tighter instructions or a constrained decode. And it also means being observable: observability, for example tracing, so you can see exactly which prompts produced what, so changes don't ship unless the numbers say they're safe. Today there are tools that can help with that form of prompt engineering, and we're going to take a look at two: PDL, but first of all, LangChain.

LangChain is an open-source framework for building LLM apps with a pipeline of composable steps. You define what happens before and after a model call, not just the words that you send to it. So let's use our triage-to-JSON example. At the very top of this example, we need something to send to the model. We're going to have our user bug text as our input, and we're going to send that into an element called a prompt template, which will receive the user bug text. Now, in LangChain, each box here is a runnable. That's a step that takes some input, does something, and then outputs a result. The prompt template runnable packages the instruction, the prompt, something like: you're a triage assistant, output JSON only, here's the shape of the JSON. It combines that with the user bug text, and the same template is reused each time so the wording stays consistent. Now that gets sent to a chat model, the actual LLM itself.
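As a rough stand-in for that prompt-template step, plain Python string formatting shows the idea. This is not LangChain's actual API, and the template text is invented; it just illustrates how fixed instructions get combined with the user's bug text:

```python
# A plain-Python stand-in for the prompt-template runnable.
# The fixed instructions are combined with the user's bug text,
# and the same template is reused so the wording stays consistent.
TEMPLATE = (
    "You are a triage assistant. Output JSON only, with this shape:\n"
    '{{"summary": <string>, "severity": "low"|"medium"|"high", '
    '"steps": [<string>, ...]}}\n\n'
    "Bug report:\n{bug_text}"
)

def render_prompt(bug_text: str) -> str:
    """Fill the user's free-form bug text into the fixed template."""
    return TEMPLATE.format(bug_text=bug_text)

prompt = render_prompt("App crashes when I rotate the screen.")
print(prompt)
```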
So this is a chat model runnable, and it will call an LLM and produce a response. Let's call this response the candidate JSON that we've received back from the LLM as a text string. Next, we go to another runnable. This one is called the validate runnable, and it checks the candidate response against our schema, checking things like whether the keys are present, whether the steps entry is a non-empty list, and so forth. If that passes validation, then it ends up getting sent to our overall application. This is the application that is going to receive all of this JSON that we actually want to produce. That's if it works. If it fails, we go down a different path, to another runnable called retry-or-repair. This is the fail path. This is a runnable that can retry, sending another model message with some firmer instructions, or it can repair, making a small fix like stripping out the extra prose that came with the JSON. Now, if that passes validation, then it's all good. It's off to the application we go. If that still doesn't work, then we take what's called a fallback path, and there, for example, we might try a stricter model. And whether it passes on the first try, or we need to do a retry and repair, or we need to go through to the fallback, eventually the app receives clean JSON. We keep the traces and the metrics around as well, so we can spot regressions and improve over time. So that's LangChain. What about PDL?

PDL is Prompt Declaration Language. PDL is a declarative spec for LLM workflows, and the core idea is that most LLM interactions are really about producing data. So you declare the shape of that data and the steps to produce it, and you do that all within a single file, a YAML file to be precise. Then a PDL interpreter runs that file. It assembles context, it calls models and tools, it enforces types, and it emits results.
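To make that concrete, a triage program in this style might look something like the sketch below. This is an illustration reconstructed from the transcript's description, not verified PDL source; the block keywords, the `${ ... }` variable syntax, and the model identifier are all assumptions:

```yaml
# Illustrative sketch only: keywords and the model id are assumptions,
# reconstructed from the transcript's description of PDL.
description: Bug-report triage to strict JSON
defs:
  bug_text:
    read: bug_report.txt          # read the free-form report from a file
text:
- "You are a triage assistant. Return JSON only.\n"
- "Bug report: ${ bug_text }\n"
- model: example/triage-model     # hypothetical model identifier
  spec: { summary: str, severity: str, steps: [str] }   # typed output contract
```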
So with the PDL in this file here, we've got our prompt, which defines the thing that we're going to call. We've got the contract, which is what we want this to actually produce. And then we've got the control loop. They all live in this one file, this YAML file.

Now, a bit more about the PDL spec itself. The top-level text is an ordered list, where each item can be either a literal string or a block that calls out to a model. PDL runs this list top down, so strings are appended to the running output and to a background context. When it hits a model block, the default input is everything that's been produced so far, unless you provide an explicit input. You can declare types in PDL as well, types for input and output on model steps, so the interpreter does type checking and fails on shape violations. Control is explicit: we have things like conditionals for control, and we can also add in loops for control. Typical programs also use defs; those are used to read data, like reading data from a file or from standard in. Then there is a final data section to collect the named results you want to emit. Tracing and a live explorer let you inspect each block's inputs and outputs and the exact context that was sent to the model.

So, basically, we have LangChain, which we've talked about today, and what LangChain really is is something that can be considered code first. It's a code-first pipeline with runnables, and you wire those runnables together. PDL, on the other hand, is really spec first: everything lives in one YAML file, where the prompt and the types and the control flow live together and are executed by an interpreter. And together, tools like these are really becoming the grown-up toolbox that is turning all of this prompt whispering into real software engineering.
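As a closing illustration, the control loop both tools implement, validate, then retry or repair, then fall back, is small enough to sketch in plain Python. This is a sketch of the pattern only, not either library's API; `call_model` and `fallback_model` are placeholders for real model calls:

```python
import json

def validate(candidate: str):
    """Check a candidate response against the contract: required keys,
    a valid severity enum, and a non-empty steps list.
    Returns the parsed object, or None on any violation."""
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != {"summary", "severity", "steps"}:
        return None
    if obj["severity"] not in ("low", "medium", "high"):
        return None
    if not isinstance(obj["steps"], list) or not obj["steps"]:
        return None
    return obj

def repair(candidate: str) -> str:
    """Small fix: strip any extra prose wrapped around the JSON object."""
    start, end = candidate.find("{"), candidate.rfind("}")
    return candidate[start:end + 1] if 0 <= start < end else candidate

def triage(call_model, fallback_model, max_attempts: int = 2):
    """Happy path first; on failure, retry with firmer instructions
    (also trying a repair); finally fall back to a stricter model."""
    for attempt in range(max_attempts):
        candidate = call_model(strict=(attempt > 0))
        result = validate(candidate) or validate(repair(candidate))
        if result is not None:
            return result
    return validate(fallback_model())
```

A model that wraps its JSON in a friendly sentence, for example, fails the first validate but passes after repair, so the application still receives clean JSON.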