

Model Selection: Focus on Tasks

Key Points

  • Instead of asking “which model should I use for my workflow,” focus on the specific atomic task you need to accomplish.
  • Tasks are the tiny “Lego bricks” within a workflow, and identifying them lets you match the right model to the right piece.
  • Honest assessment of data messiness, required steps, and output format is essential for achieving reliability, speed, and accuracy.
  • The explosion of available models and pricing structures in 2025 makes model selection increasingly complex, especially for broader, vague assignments.
  • While modern models can handle large, messy jobs, predictable, repeatable, high‑quality results still depend on breaking workflows into clear micro‑tasks and choosing models accordingly.

Source: [https://www.youtube.com/watch?v=-5zFZznthw0](https://www.youtube.com/watch?v=-5zFZznthw0)
Duration: 00:10:10

Sections

  • [00:00:00](https://www.youtube.com/watch?v=-5zFZznthw0&t=0s) **Choosing Models at Task Level**: Nate argues that instead of asking which model fits an entire workflow, you should break the workflow into atomic tasks and select models for each piece based on data messiness, steps, and output requirements.
  • [00:03:21](https://www.youtube.com/watch?v=-5zFZznthw0&t=201s) **Breaking Workflows into Atomic Tasks**: The speaker emphasizes that successful AI automation relies on dissecting a process into its smallest, irreducible tasks and assigning the most suitable LLM to each, instead of relying on a single model to handle an entire multi‑step workflow.
  • [00:07:14](https://www.youtube.com/watch?v=-5zFZznthw0&t=434s) **Exponential ROI from Multi‑Model AI**: The speaker explains that serious AI work demands investing in multiple specialized models, resulting in an exponential return on investment rather than a simple, linear increase.

Full Transcript
I get asked all the time, "But Nate, which model would you use for this or that workflow?" That's really the wrong question, and I want to spend this video talking to you about asking the right question so you can get farther on the AI work you're doing. Don't ask which model you should use for your workflow. Instead, think at the atomic level of the task: ask which model should be used for your task. And if you put up your hands at this point and say, "Nate, I'm asking the same thing, a task is a workflow," no, it's not. Tasks are bits of a workflow. They're like Lego bricks inside a workflow. The reason I'm insisting on that level of detail is that if we're not honest about the individual pieces inside our workflows, we're not going to be able to pick the right model for the job. If you want reliability, speed, accuracy, and the right model for the task, you have to be honest about how messy your data is, how many steps the task requires, and what the final output needs to look like.

Most people just want to be told the answer, and that's why their automations fail. I'm sorry, there's not a shortcut. People keep asking me, "Will people still have jobs?" This is an example of where we're going to have jobs, because "which model should I use?" is a really hard question. And I'll be honest with you, it's getting harder in 2025. Do you know why? Because there are more and more models to choose from, more and more levels of intelligence, more and more unit economics to factor in. Even if you're a consumer and you don't care about cost per token, there are more consumer models to choose from. You can choose Kimi K2 Thinking or Claude or ChatGPT or Gemini or Grok, you name it.
The real problem for most people is that they have difficulty getting to a level of clarity about what they plan to do with the work they assign the LLM. The workflow is too big a unit of work. Most workflows consist of a dozen different micro-tasks, and people tend to want to assign the entire thing to the AI. Now, I have to give credit to the model makers here. They are doing their level best to give us models that can take that level of vague assignment and make it work. They're working really hard on that. I saw huge progress with Claude Opus 4.5 in particular, because it can take a big messy task like "make a deck out of this mess" and just work away until it gets done. Same thing with vibe coding in Opus 4.5: it just works away and knocks down bugs until it's done. So we are getting to the point where, for some consumer applications, if you just hand the model a bunch of stuff, it will produce something at the end of a workflow, which is in itself kind of amazing.

But if you want predictability, repeatability, high quality, and high consistency, then what you need is to think in terms of the task. So I'm going to give you some examples. These tasks are very, very common; they occur across multiple workflows. And the more you can see workflows as composed of literal Lego bricks (they're interchangeable pieces, tasks we repeat over and over with different inputs), the better off you're going to be at finding the right model. Cleaning data: that's a great example of a Lego brick. Finding context: another great Lego brick. Inferring missing pieces from a pattern: that's a Lego brick. Reasoning: that's a Lego brick.
Transforming format from A to B, checking correctness, producing an artifact, handing data off to the next step, making a plan to get something done: these are all individual tasks. So when someone says "my workflow," if we actually want to automate it, I ask myself which of several LLMs we need to get involved, because you might pick one LLM for cleaning the data and a different one for reasoning. You often don't need a very fancy model for cleaning data unless the data is really dirty. And this is why people keep trying to throw a single agent at a 14-step process and then wonder why it stalls, why it loops, why it hallucinates. A model is not going to magically fix a badly scoped unit of work. A model will not repair something and make it work if you didn't scope it correctly to begin with. So I think the unit has to be the task. If we want to win the workflow from an AI automation perspective, if we want AI automations that work, it starts with understanding our tasks.

So ask yourself, if you're doing a particular piece of work (and this, by the way, is not just for AI engineers; it's for anybody): what is the real sequence of irreducible atomic units of work here? What am I doing when I write a product requirements document? Well, I have to synthesize information from 50 different customer stories. Then I have to study the current UI and extract an understanding of where the feature would go. Then I have to think of three different directions the feature could take, based on the three different insights I've gotten from the customer stories. And then I have to align that with the roadmap. You see what I mean, right?
These are all individual atomic units of work for just one flow around writing a PRD. And the trick, if you're picking models, is that I find it's better to pick the model that goes with that unit of work. So if we play that back: I would use Gemini 3 right now to synthesize those customer stories I was talking about; it's especially good at synthesizing video. I would use Gemini with Nano Banana to study the UI and identify places and ways to put the new feature in. I would probably use ChatGPT 5.1 in thinking mode or pro mode to think about the relationship between the roadmap and the proposed idea. And I would probably use Opus 4.5 to construct the PRD document once all of those inputs are in place. I could use other tools too, right? ChatPRD exists for a reason, and it's a great tool. You can use specialized tools in some of these instances. But if you want to get more fluent at AI, more fluent at model selection, it starts with understanding the task.

So I broke out that PRD so you can see how I'm taking apart the task and picking a particular model for each piece based on my fingertip feel for the models. And if you want to know how I get to a fingertip feel for the models, the simple answer is that it goes right back to the task. It goes back to giving the model a job. I know that I trust Gemini 3 with customer stories because I've tried the same job with Claude, with Grok, with ChatGPT, and with Kimi, and I know that Gemini does a better job. It synthesizes in a way that allows me to read and understand it clearly, reads the whole messy context, and does a fine job with it. I have a fingertip feel for Gemini in that particular area.
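The task-level routing in the PRD example can be sketched in code. This is an illustrative sketch only, not a real implementation: `call_model` is a hypothetical stand-in for whatever LLM client you actually use, and the model-identifier strings are placeholders echoing the names mentioned in the talk.

```python
# Illustrative sketch: route each atomic PRD task to its own model.
# `call_model` and the model-name strings are hypothetical placeholders;
# swap in your real client and model identifiers.

def call_model(model: str, instruction: str, context: str) -> str:
    """Stand-in for a real LLM API call."""
    return f"[{model}] {instruction}: {context[:40]}"

def run_prd_pipeline(stories: str, ui_notes: str, roadmap: str) -> str:
    # Task 1: synthesize the customer stories (the talk picks Gemini 3 here).
    synthesis = call_model("gemini-3", "synthesize customer stories", stories)
    # Task 2: study the UI and find where the feature fits.
    placement = call_model("gemini-3-nano-banana", "map feature placement", ui_notes)
    # Task 3: reason about the fit against the roadmap (a thinking-mode model).
    alignment = call_model("gpt-5.1-thinking", "align with roadmap", roadmap)
    # Task 4: assemble the final PRD artifact from the upstream outputs.
    return call_model(
        "claude-opus-4.5",
        "draft PRD",
        "\n".join([synthesis, placement, alignment]),
    )
```

The point is structural: each call is one Lego brick with its own model choice, so any single brick can be swapped out or re-run without touching the rest of the pipeline.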
That fingertip feel comes from practice, and it comes from deliberate exposure across models. This gets back, by the way, to a little tip for when you think about budgeting for models and how much you're willing to pay. When people ask me, "Nate, what is the model for this workflow?", behind that question they're often asking, "Nate, where do I spend my 20 bucks? Where do I spend the money I'm choosing to invest in AI? And can I just pick one?" And as much as I wish the answer was yes, I don't think it is. The answer is not "pick this one model and it will just work for you." If you're doing casual work with AI, you can forget everything I just said, because you can pick any model, pay your 20 bucks, and it will just work for you. If you're doing serious work with AI, that answer does not hold, because you need the specialty characteristics of particular models to do the serious work.

And increasingly, if you look at return on investment for whatever amount you're paying for AI, I see it and feel it (and others in AI feel it too) as an exponential return-on-investment curve. In other words, you get X return on 20 bucks a month for one model and you're happy. If you invest more, and you know how to use it and push on it, and you're doing what I'm describing, picking the task and pushing as hard as you can, paying for two models with a higher budget, it ends up being a hundred bucks a month in some cases, maybe as high as 300 bucks a month. You're paying more, a lot more, but your return on investment feels not linearly higher; it is exponentially higher. And this is what I wish I could convey.
The reason the fancy plans work is that people who invest in them get disproportionately more value. If you get 2x the value for investing in the 20-buck plan, you're going to get 10x the value, if you know how to use it, for investing in the fancy plan, because the limits are higher and the intelligence access is better. And I'm not going to deny it: there's absolutely a correlation effect. The people who are willing to pay more are typically the people who know how to use the AI better, and that is another massive driver.

So if you are asking me, "Nate, which model do I choose?", I'm going to come back and say: as much as you can, as much as you are willing to, if you want to lean in on AI fluency, think about it in terms of the task. Then think about how much model power you can apply to that particular task and which model specializes in it. And if you want to know how to get to a fingertip feel, I've written up a lot on Substack about this and which models I pick; I even have a prompt on picking the right model for a task between Gemini, ChatGPT, and Claude. So there are resources I've got for you, but I don't want to hide the ball: you also need to practice. You need to touch the models a lot, touch as many different models as you can, give them real work, compare the differences, and use your honest judgment to say "this sucks, this doesn't suck, this sucks less, this is worth doing." That's how you get, very rapidly, to a sense of which model you'd use for any given task, as in the PRD example I described. I hope this is helpful. I wish the answer were as easy as me recording 30 seconds of video and saying "always use ChatGPT." It is not. That is just not truthful.
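The "give them real work and compare" habit can be sketched as a tiny harness: run the same atomic task through several models and record your own verdict next to each output. Again a sketch under stated assumptions: `ask` is a hypothetical adapter around your real model clients, and the model names are placeholders.

```python
# Minimal comparison-harness sketch. `ask` is a hypothetical adapter
# around your real model clients; model names are placeholders.

def ask(model: str, task: str) -> str:
    """Stand-in for a real API call to one model."""
    return f"{model} -> {task}"

def compare(task: str, models: list[str]) -> dict[str, str]:
    """Run one task across several models so outputs can be judged side by side."""
    return {m: ask(m, task) for m in models}

outputs = compare(
    "Synthesize these 50 customer stories into five themes",
    ["gemini-3", "claude-opus-4.5", "gpt-5.1"],
)
for model, answer in outputs.items():
    # Read each answer and record your verdict ("sucks" / "sucks less" /
    # "worth doing"); those notes are the fingertip feel the talk describes.
    print(model, "=>", answer[:60])
```

The harness itself is trivial on purpose; the value is in the accumulated per-task verdicts, which is what the talk means by building a fingertip feel per task type rather than per workflow.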
And so I hope this honest answer is helpful if you're trying to figure out which model to use, and how to think about which model to use for