Learning Library

← Back to Library

When Single Prompt Fails: Agentic Workflows

Key Points

  • When a single prompt to even the largest LLM fails, the speaker switches to an agentic workflow that chains multiple LLM calls.
  • The example task involves checking a list of grocery items that were omitted from an order, verifying that each omission has a valid explanation, and flagging any missing or inadequate notes.
  • The biggest LLM could not reliably identify edge‑case explanations (e.g., vague reasons like “meh”), so a single‑prompt solution proved insufficient.
  • The agentic workflow decomposes the problem into sequential steps: extract items and explanations, evaluate the validity of each reason, compare the results to the original list, and generate a formatted text report.
  • This modular, multi‑prompt approach illustrates how breaking a complex task into smaller LLM operations can achieve results that a single prompt cannot.

Full Transcript

# When Single Prompt Fails: Agentic Workflows

**Source:** [https://www.youtube.com/watch?v=bwvfdFWR1RI](https://www.youtube.com/watch?v=bwvfdFWR1RI)
**Duration:** 00:06:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=bwvfdFWR1RI&t=0s) **From Single Prompt to Agents** - The speaker describes moving from scaling up language models to employing an agentic workflow after the largest LLM fails to solve a grocery-order discrepancy task.
- [00:03:15](https://www.youtube.com/watch?v=bwvfdFWR1RI&t=195s) **Multi-Prompt Workflow for LLM Tasks** - The speaker describes breaking a complex operation into separate extraction, validation, comparison, and generation prompts, showing that multiple focused prompts succeed where a single, all-in-one prompt caused the LLM to fail.

## Full Transcript
[0:00] Normally, when I build a prompt using an LLM, I start with a smaller model. If I can't get that to work, I move to a slightly larger one, and then a bigger one, and a bigger one. But what happens if you're using the biggest LLM you have available and you still can't get it to work? One option is to use an agentic workflow instead of a single prompt to a single LLM.

[0:30] This actually happened to me recently. I was working on a problem using the biggest, baddest LLM I had available, and I could not solve it with a single prompt. So in this video we're going to walk through that problem, and I'll show you how I shifted from a single LLM call to an agentic workflow.

[0:52] The problem seemed pretty straightforward at the beginning. I was given two pieces of information. The first was a list of items that were not included in an order. Imagine you've submitted a grocery request to a grocery store, and an employee does your shopping for you. The second piece of information was the employee's notes: if the employee isn't able to find an item, he's supposed to mention that in the notes.

[1:42] The request made of me was to look at the items in column B, the items that were not included in the order, and make sure that each one had an explanation in column A. If there was no explanation, I was to put that item into a text file, something like "Cheese - No explanation."

[2:08] I was able to get this to work for the most part, but I had issues with edge cases. For example, let's say the explanation for cheese is "meh."
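The setup described so far can be sketched as plain data plus a naive check. Everything here is a hypothetical stand-in (the item names, the `notes` structure, and `naive_report` are illustrative, not from the video), and it deliberately ignores the validity question that breaks the single-prompt approach:

```python
# Hypothetical recreation of the two inputs: the items that were not
# included in the order, and any explanation recorded for each one.
missing_items = ["Ham", "Cheese", "Milk"]
notes = {
    "Ham": "Old",    # explanation present
    "Cheese": None,  # no explanation recorded
    # "Milk" has no entry at all
}

def naive_report(items, notes):
    """Flag every missing item that lacks any explanation at all."""
    lines = []
    for item in items:
        if not notes.get(item):
            lines.append(f"{item} - No explanation")
    return "\n".join(lines)

print(naive_report(missing_items, notes))
# Cheese - No explanation
# Milk - No explanation
```

This catches absent explanations, but it cannot tell a valid reason ("Old") from a vague one ("meh"), which is exactly the edge case the transcript turns to next.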
Okay, that's really not a good reason to skip the cheese. Because the reasons had to be valid, I could identify the reasons, but I couldn't reliably tell whether they were valid or not. So, since I wasn't able to solve this problem with a single prompt to the largest LLM, I moved to an agentic workflow.

[2:46] Here's what my agentic workflow basically looked like. The first prompt extracted the items from column B along with their explanations. The second prompt took the output of the first and determined whether or not those reasons were valid. The third step (it probably didn't have to be a prompt, though you could use an LLM for it) compared the second prompt's output to column B. And the fourth step output the text file. So maybe you do this in three prompts, maybe four; maybe some of these aren't LLM prompts at all but some other text function. That's really not the point. The point is that I tried to do this with a single prompt and couldn't get it to work, and this worked.

[4:14] If you look at these prompts individually, they're really doing different things. The first is an extraction function: it's pulling information out of the text. The second is a classification: it looks at each reason and tries to determine whether it's valid. The third is a comparison, which is also a kind of classification. And the text output is a generation step.
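The four-step workflow the speaker describes can be sketched as chained functions. This is a minimal sketch under stated assumptions: a deterministic parser stands in for the extraction prompt, and a rule-based `simple_judge` stands in for the LLM call that judges validity; in practice those two would be prompts to a model:

```python
def p1_extract(notes_text):
    """Prompt 1 (extraction): pull item/explanation pairs from free text.
    A deterministic parser stands in for the LLM call here."""
    pairs = {}
    for part in notes_text.split(","):
        if ":" in part:
            item, reason = part.split(":", 1)
            pairs[item.strip()] = reason.strip()
    return pairs

def p2_validate(pairs, judge):
    """Prompt 2 (classification): decide whether each reason is valid.
    `judge` is a pluggable callable -- in practice, an LLM call."""
    return {item: judge(reason) for item, reason in pairs.items()}

def p3_compare(missing_items, validity):
    """Step 3 (comparison): missing items without a valid explanation.
    Could be a prompt, but plain code suffices."""
    return [i for i in missing_items if not validity.get(i, False)]

def p4_report(flagged):
    """Step 4 (generation): format the final text output."""
    return "\n".join(f"{item} - No explanation" for item in flagged)

# Rule-based stand-in for the LLM judge: reject vague one-word shrugs.
def simple_judge(reason):
    return reason.lower() not in {"meh", "eh", ""}

notes_text = "Ham: Old, Cheese: meh"
missing = ["Ham", "Cheese"]
validity = p2_validate(p1_extract(notes_text), simple_judge)
report = p4_report(p3_compare(missing, validity))
print(report)  # Cheese - No explanation
```

Each function mirrors one prompt type from the transcript (extraction, classification, comparison, generation), and chaining them reproduces the division of labor that the single big prompt failed to handle.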
It didn't seem that way at first, but because we were trying to do all of these different types of functions inside one prompt, I think the original LLM, trying to take it all in one big bite, was getting confused.

[4:56] So let's take this back to our original example and see how it works in the workflow. P1 looks at the notes and extracts something like "Ham: Old, Cheese: meh." The second prompt looks at that output and determines that "meh" is not really a good reason not to fill an order. The third prompt compares that result to column B, and we end up with cheese flagged, because cheese is the item without a valid explanation for not being filled. And then the fourth step outputs this in the final format.

[5:59] So again, sometimes the biggest, baddest LLM is not going to do everything. Sometimes you have to break the problem into several steps and use multiple LLM calls, multiple prompts, or multiple functions to get to where you need to be. And doing that is called an agentic workflow.
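As the speaker notes, the comparison step doesn't have to be an LLM call at all. A minimal sketch, assuming hypothetical item names, shows it as a plain set difference between the missing items and the items whose explanations were judged valid:

```python
# The comparison step needs no LLM: subtract the validly-explained items
# from the missing items to get the ones that must be flagged.
missing_items = {"Ham", "Cheese"}
validly_explained = {"Ham"}  # output of the validation prompt
flagged = sorted(missing_items - validly_explained)
print(flagged)  # ['Cheese']
```

Swapping ordinary code in for steps that don't need a model is part of the appeal of the multi-step design: each stage can use the cheapest tool that does the job.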