When Single Prompt Fails: Agentic Workflows
Key Points
- When a single prompt to even the largest LLM fails, the speaker switches to an agentic workflow that chains multiple LLM calls.
- The example task involves checking a list of grocery items that were omitted from an order, verifying that each omission has a valid explanation, and flagging any missing or inadequate notes.
- The biggest LLM could not reliably identify edge‑case explanations (e.g., vague reasons like “meh”), so a single‑prompt solution proved insufficient.
- The agentic workflow decomposes the problem into sequential steps: extract items and explanations, evaluate the validity of each reason, compare the results to the original list, and generate a formatted text report.
- This modular, multi‑prompt approach illustrates how breaking a complex task into smaller LLM operations can achieve results that a single prompt cannot.
Sections
- From Single Prompt to Agents - The speaker describes moving from scaling up language models to employing an agentic workflow after the largest LLM fails to solve a grocery‑order discrepancy task.
- Multi‑Prompt Workflow for LLM Tasks - The speaker describes breaking a complex operation into separate extraction, validation, comparison, and generation prompts—showing that using multiple focused prompts succeeds where a single, all‑in‑one prompt caused the LLM to fail.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=bwvfdFWR1RI](https://www.youtube.com/watch?v=bwvfdFWR1RI)
**Duration:** 00:06:19

Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=bwvfdFWR1RI&t=0s) From Single Prompt to Agents
- [00:03:15](https://www.youtube.com/watch?v=bwvfdFWR1RI&t=195s) Multi-Prompt Workflow for LLM Tasks
So, normally when I build a prompt using an LLM, I start with a smaller LLM. If I can't get that to work, I move to a slightly larger one, and if I can't get that to work, I move to a bigger one, and a bigger one, and a bigger one. But what happens if you're using the biggest LLM you have available and you still can't get it to work? One option is to use an agentic workflow instead of a single prompt to a single LLM. This actually happened to me recently. I was working on a problem using the biggest, baddest LLM I had available, and I could not solve it with a single prompt. So what we're going to do in this video is walk through that problem, and I'll show you how I shifted from a single LLM call to an agentic workflow. So the
problem seemed pretty straightforward at the beginning. I was given two pieces of information. The first piece of information was a list of items that were not included in the order. Imagine you've submitted a grocery request to a grocery store, and an employee goes through and does your shopping for you; the first piece of information was the items that were not included in the order. The second piece of information was notes from the employee: if he's not able to find an item, he's supposed to mention that in the notes.
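As a sketch, the two inputs the speaker describes could be represented like this (the variable names and the sample items and notes are invented for illustration; the video only names ham and cheese):

```python
# Hypothetical representation of the two inputs from the video.
# Column B: items that were not included in the order.
items_not_included = ["ham", "cheese"]

# Column A: free-text notes from the employee explaining omissions.
employee_notes = "Ham: old. Cheese: meh."
```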
So the request made of me was to look at the items in column B, the items that were not included in the order, and make sure that there was an explanation in column A. And if there was not an explanation, put that into a text file that would look something like this: Cheese - No explanation. I was able to get it to work for the most part, but I had issues with edge cases. For example, let's say the explanation here for cheese is "meh." Okay, that's really not a good reason not to get the cheese.
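One way to isolate this validity question as its own focused prompt, as the speaker later does, might look like the sketch below. The prompt wording and function name are invented; the video does not show the actual prompt text.

```python
# Hypothetical builder for a validity-check prompt (wording invented).
def build_validity_prompt(item: str, reason: str) -> str:
    return (
        f"An employee could not include '{item}' in a grocery order and "
        f"gave this explanation: '{reason}'.\n"
        "Is this a valid explanation for omitting the item? Answer yes or no."
    )

# Example: an LLM given this prompt would be asked to judge "meh".
print(build_validity_prompt("cheese", "meh"))
```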
And because the reasons in this column had to be valid, I sometimes had trouble: I could identify the reasons, but I couldn't really tell whether they were valid or not. So again, because I wasn't able to solve this problem with a single prompt to the largest LLM, I moved to an agentic workflow. Here's what my agentic workflow basically looked like.
So for the first prompt, I had one prompt that extracted the items and their explanations. The second prompt took the information from the first prompt and determined whether or not those reasons were valid. The third prompt, which probably didn't have to be a prompt, but you could use a prompt or an LLM to do it, compared the second prompt's output to column B. And then the fourth would just be to output the text. So again, maybe you do this in three prompts, maybe you do it in four. Maybe some of these actually aren't prompts to an LLM but some other text function. That's really not the point here. The point is that I tried to do this with a single prompt and I couldn't get it to work, and this worked. And if you look at these prompts individually, they're really doing different things. The first one is more of an extraction function: it's extracting information from the text. The second one is more of a classification: it's looking at the reasons and trying to determine whether or not each reason is valid. The third one is just doing a comparison, so I guess that's kind of a classification too. And then the output of the text, that's more of a generation. It didn't seem that way at first, but because we were trying to do all these different types of functions inside one prompt, I think the original LLM, trying to do this in one big bite, was getting confused. So let's take this and go back to our original example to see how it would work in the workflow. So P1 here would look at this information and extract something like this: Ham: Old. Cheese: Meh. The second prompt would look at that information and determine that, you know, "meh" is not really a good reason not to fill an order, so it would output something like this.
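As a sketch, P2's validity judgments could be represented in a simple structure like this (the format is invented; the video only shows them on screen):

```python
# Hypothetical representation of P2's output: one validity flag per item.
# "old" is accepted as a valid reason; "meh" is not.
p2_output = {"ham": True, "cheese": False}
```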
The third prompt would compare this to column B, and we would end up with something that looked like this, because again, cheese is the item in here without a valid explanation for not being filled. And then the fourth would basically output this in the final format. So again, sometimes the biggest, baddest LLM is not going to do everything, and sometimes you have to break the problem into several steps and use multiple LLM calls, multiple prompts, or multiple functions to get to where you need to be. And doing that is called an agentic workflow.
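The four-step workflow described above can be sketched as a chain of small functions. This is a minimal, hedged illustration: every name is invented, and each step stands in for what would really be a separate LLM prompt (the stand-in logic is a naive parser and a hard-coded list of vague reasons, not the speaker's actual prompts).

```python
# Sketch of the four-step agentic workflow (all names and logic invented).
# In a real workflow, each pN_* function would issue its own LLM call.

def p1_extract(notes: str) -> dict[str, str]:
    # Prompt 1 (extraction): pull item -> explanation pairs out of the notes.
    pairs = {}
    for part in notes.split("."):
        if ":" in part:
            item, reason = part.split(":", 1)
            pairs[item.strip().lower()] = reason.strip().lower()
    return pairs

def p2_validate(pairs: dict[str, str]) -> dict[str, bool]:
    # Prompt 2 (classification): judge whether each explanation is valid.
    vague = {"meh", ""}  # invented stand-in for an LLM's judgment
    return {item: reason not in vague for item, reason in pairs.items()}

def p3_compare(valid: dict[str, bool], omitted: list[str]) -> list[str]:
    # Prompt 3 (comparison; could be a plain function): items in column B
    # that lack a valid explanation.
    return [item for item in omitted if not valid.get(item, False)]

def p4_report(flagged: list[str]) -> str:
    # Prompt 4 (generation): produce the final text report.
    return "\n".join(f"{item.title()} - No explanation" for item in flagged)

# Chain the steps: extraction -> validation -> comparison -> generation.
notes = "Ham: old. Cheese: meh."
column_b = ["ham", "cheese"]
report = p4_report(p3_compare(p2_validate(p1_extract(notes)), column_b))
print(report)  # → Cheese - No explanation
```

Swapping the stand-in bodies for real LLM calls keeps the same structure: each step gets one focused prompt, which is the point of the video.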