Deep Research: AI’s Hot New Feature

Key Points

  • The episode welcomes three experts: Kate Soule on KV cache management, Volkmar Uhlig on indices and vector databases, and Shobhit Varshney on quantum computing’s intersection with AI.
  • A rapid rollout of “deep research” features across major AI platforms (Google Gemini, ChatGPT, Perplexity, Grok) is highlighted as the current competitive focal point.
  • This surge is traced back to breakthroughs like DeepSeek’s R1 model, which showcased advanced reasoning and spurred rivals to launch comparable deep‑research capabilities.
  • The show will also cover additional hot topics such as rumors about OpenAI’s upcoming inference chip, the emergence of small vision models, and a new job listing for an AI agent.
  • The discussion frames “deep research” as the latest AI differentiator that companies are racing to implement to demonstrate superior reasoning power.

Sections

  • [00:00:00](https://www.youtube.com/watch?v=y4gm_4UFT28&t=0s) **AI Deep Research Feature Survey** - The hosts introduce guests and discuss how major AI services (Google Gemini, ChatGPT, Perplexity, and Grok) are rolling out similarly named "deep research" capabilities.
  • [00:03:13](https://www.youtube.com/watch?v=y4gm_4UFT28&t=193s) **AI Research Agent Planning Process** - How AI tools like Google and ChatGPT first create a research plan, clarify ambiguous queries, and then crawl and extract relevant web content to emulate a human analyst's workflow.
  • [00:06:16](https://www.youtube.com/watch?v=y4gm_4UFT28&t=376s) **Enterprise Layers and Deep Research Skepticism** - The participants debate whether tools like Deep Research truly innovate beyond existing search and multi-step reasoning components, emphasizing the need for enterprise-grade solutions and questioning the technology's real technical merit.
  • [00:09:20](https://www.youtube.com/watch?v=y4gm_4UFT28&t=560s) **Challenges in AI Research Evaluation** - The speaker outlines how hard it is to measure and benchmark deep research deliverables from AI firms (which lack clear validation metrics, unlike code or math), while noting the growing competition among giants such as Google, OpenAI, Perplexity, and newcomer Grok.
  • [00:12:23](https://www.youtube.com/watch?v=y4gm_4UFT28&t=743s) **OpenAI's Push for Inference Chips** - The speaker explains why OpenAI is prioritizing custom inference hardware, its partnership with Broadcom, and how inference hardware needs differ from those for model training.
  • [00:15:26](https://www.youtube.com/watch?v=y4gm_4UFT28&t=926s) **Controlling Chip Supply for AI** - The speaker explains that owning data centers and co-designing custom chips through exclusive manufacturer partnerships can drastically cut AI costs and may pivot sales pitches toward emphasizing hardware-driven performance advantages.
  • [00:18:31](https://www.youtube.com/watch?v=y4gm_4UFT28&t=1111s) **Hardware-Optimized Inference at Scale** - The speaker explains how shifting to specialized inference stacks (e.g., Amazon, Nvidia, OpenAI) reduces costs and makes high-volume AI applications viable, while noting potential risks of hardware-dependent models for open-source ecosystems.
  • [00:21:37](https://www.youtube.com/watch?v=y4gm_4UFT28&t=1297s) **Scaling Distributed AI Inference** - Panelists discuss how fiber-linked data centers are delivering billion-user, transformer-based AI inference at scale, reflect on the evolving definition of AI from deep learning to large language models, and note the rising competition in vision model development.
  • [00:24:42](https://www.youtube.com/watch?v=y4gm_4UFT28&t=1482s) **Granite Vision Model for Document AI** - The speaker describes Granite's optimized image understanding for PDFs, dashboards, and screenshots, its enterprise document-centric use cases and multimodal RAG plans, and notes upcoming releases while comparing it to competing VLMs such as Qwen and Pixtral.
  • [00:27:46](https://www.youtube.com/watch?v=y4gm_4UFT28&t=1666s) **Edge AI Enables Secure High-Volume Processing** - The speaker explains that newer, smaller AI models can run on edge devices to provide advanced semantic understanding of images and video, allowing secure, on-premise analysis for defense, manufacturing, and massive document-processing workloads while cutting cloud-streaming costs.
  • [00:30:48](https://www.youtube.com/watch?v=y4gm_4UFT28&t=1848s) **Dynamic Edge-Cloud Processing** - The speaker describes a system where devices use lightweight edge models to gauge question complexity, keeping simple tasks on-device while offloading harder, bandwidth-intensive video processing to the cloud, foreseeing specialized edge models for industrial use cases.
  • [00:33:53](https://www.youtube.com/watch?v=y4gm_4UFT28&t=2033s) **AI Agents as Job Candidates** - The speakers discuss a tongue-in-cheek job posting that invited only AI agents, debating whether such listings signal a near-future where companies regularly hire autonomous agents to perform tasks traditionally done by humans.
  • [00:37:15](https://www.youtube.com/watch?v=y4gm_4UFT28&t=2235s) **Golden Thread: Ask-to-Task Automation** - The speaker argues that the next competitive edge for enterprises lies in building a unified "ask-to-task" pipeline that translates user queries into orchestrated LLM-driven actions, enabling higher-level planner agents to automate end-to-end workflows.
  • [00:40:16](https://www.youtube.com/watch?v=y4gm_4UFT28&t=2416s) **AI Compute Marketplace Concept** - The speaker proposes a platform that treats specialized AI model outputs as micro-tasks, routing work through a centralized queue and using agent interfaces to deliver results from proprietary data sources without exposing the raw data.
  • [00:43:24](https://www.youtube.com/watch?v=y4gm_4UFT28&t=2604s) **Formalizing Agent Communication Protocols** - The speaker argues for replacing ad-hoc natural-language interactions with well-defined API contracts and software-engineering practices to ensure reliable, auditable, and efficient large-scale agent deployments.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=y4gm_4UFT28](https://www.youtube.com/watch?v=y4gm_4UFT28) **Duration:** 00:45:44
0:00 What was the last thing that you had to do deep research on? 0:03 Kate Soule is Director of Technical Product Management for Granite. 0:06 Uh, Kate, welcome back to the show. 0:07 What have you been researching? 0:08 I've been researching KV cache management. 0:11 Volkmar Uhlig is Vice President, AI Infrastructure Portfolio Lead. 0:15 Volkmar, welcome back to the show. 0:16 What have you been looking into? 0:17 Indices and vector databases. 0:19 And last but not least is Shobhit Varshney, Senior Partner, Consulting on 0:21 AI for US, Canada, and Latin America. 0:24 Uh, Shobhit, welcome to the show. 0:25 What have you been looking into? 0:26 Quantum computing, especially how it intersects with AI. 0:29 All right, all that and more on today's Mixture of Experts. 0:37 I'm Tim Hwang, and welcome to Mixture of Experts. 0:39 Each week, MoE distills the biggest stories in the world 0:42 of artificial intelligence and gets you what you need to know. 0:45 As always, we have a lot to cover. 0:46 We're going to talk a little bit about rumors on OpenAI's inference chip. 0:50 We're going to talk about small vision models. 0:51 We're going to talk about a job listing for an AI agent, but first, I really 0:55 want to talk about Deep Research. 0:58 Um, and it's kind of a funny phrase to use because, uh, it 1:01 seems like nowadays, everybody has a feature called "deep research." 1:04 Um, Google Gemini has a deep research function. 1:07 ChatGPT announced the deep research feature. 1:10 Even Perplexity announced a deep research feature and, not to be left 1:13 out, Grok has also launched a feature as of late called DeepSearch. 1:18 And these are all kind of features where you can do a query to a model 1:22 and get back what is effectively a very long research report that's kind 1:25 of in depth, um, and you know, this all really literally happened, I think, in 1:30 the last month, two months or so. Um, I guess, Kate, maybe I'll start with you. 1:34 Why is everybody suddenly launching a deep research feature? 1:37 Um, what are they trying to do and why is it suddenly all very competitive? 1:40 Why is, why is deep research the new hot thing? 1:42 Yeah. 1:42 So I think it's helpful to understand some of the broader context and when 1:46 all of these features were released. 1:48 So, you know, back in January, DeepSeek came out with their R1 model demonstrating 1:53 crazy reasoning capabilities. Uh, OpenAI, you know, maybe in a bit of a response, 1:59 a way to show that they're, you know, also innovating on reasoning and doing 2:03 a lot of work in the space, pretty much launched, as a fast follow from what I've 2:08 been able to tell, their deep research capability, which leverages the 2:12 o3 reasoning model behind the scenes. 2:14 And I think ever since that model came out, we've been seeing a lot of, 2:17 uh, following in the market and a lot of other companies come out and, yes, 2:23 create their version just to try and follow that broader trend 2:27 and focus on reasoning models that have really taken the world by storm. 2:30 Yeah, for sure. 2:31 And Shobhit, maybe I'll bring you in here. 2:32 You know, one of the questions I have is like, how do you win in this competition? 2:36 You know, it's like, it's like we're suddenly living in a world where there's 2:39 like four or five search engines. 2:40 And it's like, you know, it's the early two thousands again. 2:43 And it's kind of really a question about, like, if you see any kind 2:45 of differentiation between all the companies and how they're trying 2:48 to win on this particular feature.
2:50 Um, so I think Google came up with this first in December, followed by OpenAI 2:54 and then followed by, uh, Perplexity and now, uh, with, uh, with Grok-3. 2:58 Uh, the overall intention is, given a complex ask, I want you to go research 3:04 this across a whole multitude of different websites and then try to 3:07 cluster them in different topics, and then, when you find something, 3:10 you go find other things that are relevant, just the way humans do this. 3:14 So you're trying to replicate the way a human would have otherwise opened up 20 3:17 different browsers, and try to assimilate all of that into a topic and research, right? 3:22 Now, when companies like Google are doing this, they have a really good 3:25 understanding of how web pages are structured and how things are semantically 3:28 connected, and so on and so forth. 3:30 So most of these deep research models will start first by creating a plan. 3:35 They understand your query and they'll have a plan generated. 3:38 In the case of Google, I can go and hit edit and I can go change the plan if I need to. 3:41 In the case of ChatGPT, it'll go ask some follow up questions. 3:44 So you can go modify that and understand that here's how we're going to execute. 3:48 There are certain queries that need a little bit of disambiguation. 3:51 Do you mean X versus Y? 3:53 Like, if I look for "Transformer," is that the movie or is that the 3:55 model, things of that nature. 3:57 In certain cases, you may need to narrow the field a little bit and go very, 4:00 very deep in a particular topic, right? 4:02 So you're first establishing, 4:03 here's the goals and here's the research plan. 4:06 That's what a good research analyst would have done for you. 4:08 Then it fires off and starts to go crawl the web and starts to find all the 4:12 websites that are relevant, then extracts everything out of it and says, hey, 4:16 I found that this particular website was talking about something new. For example, 4:20 we were looking at quantum last night: 4:21 Microsoft released some new, a new model with some new matter altogether. 4:27 So now all of a sudden you have a new topic that's coming up that I 4:29 did not specify in my initial search. 4:31 So that'll then spawn additional queries and so forth. 4:34 So it's going and crawling all of that, bringing all the content back. 4:37 So your question was around, how do you win in this? 4:41 On the more B2C side, me as an end customer, the person who can connect 4:47 the dots between the websites and the content the best is likely going to win. 4:52 Speed matters to an extent, whether it takes three minutes or four minutes. 4:56 At the end of the day, I'm going to have to come back to this tab anyway. 4:58 So I think I'm okay on the speed and the latency part. 5:01 But understanding, grounding it, and being able to give the right citations. 5:05 At least in my personal experience using Perplexity, 5:08 I've been using Pro for a while. 5:09 Perplexity Pro Deep Research was hallucinating a little more than what I'm getting 5:14 with Gemini, and so far OpenAI has been the best, for at least my experience 5:18 on the topics that I researched. 5:20 But there'll be some nuances of how you ground your content; you 5:24 get all the responses back and then do visual interpretation 5:28 of all those different websites. 5:30 In the enterprise space... this is untapped right now. 5:33 In an enterprise, when somebody gets a question saying: Why was my claim denied? 5:38 Or if I want to say: Can I travel to X? 5:40 And what will be my ATM charge? 5:42 And what kind of benefits do I have?
5:44 There's a very unique set of documents that need to be looked 5:47 at, where a human researcher goes and looks at multiple systems. 5:51 It's a combination of actually logging into a third party SaaS, figuring 5:55 out information there, reading some documents, so on and so forth. 5:58 We have not quite crossed the chasm yet on deep research 6:01 coming to the enterprise space. 6:03 I don't see a single vendor out there that's enabling us to go 6:06 add enterprise data to it, be able to model how the reasoning steps 6:09 work based on the topic at hand. 6:12 I think in 2025, towards the end, we'll start to see 6:14 these models getting more open. 6:16 There should be other layers on top that bring it into the enterprise space. 6:19 I think that's the company that's going to make billions of dollars, 6:22 versus somebody who's doing a B2C, uh, view. 6:25 Yeah, I mean, I think that's kind of the interesting thing, is like, 6:27 I feel like the use cases that I've seen online so far have been people 6:30 who have pretty niche needs, right? 6:32 Like, I think it's like, you know, researchers or kind of bloggers 6:36 that need, you know, studies. 6:37 Um, Volkmar, maybe I'll bring you in. 6:39 I feel like over the last few episodes, you've increasingly become like the 6:42 loud skeptic on the MoE expert panel. 6:45 Um, are you kind of impressed by stuff like Deep Research? 6:48 Do you use it at all? 6:48 Like, it does feel to me that there's a bunch of problems. 6:51 And I guess the question is like, also whether or not it's like 6:53 technically that impressive either. 6:55 It's like kind of just a combination of existing components. 6:57 Is that the right way of thinking about it? 6:59 I think it's a, it's a really interesting approach. 7:01 I think it's incremental. 7:03 Um, you know, we already had the ability to search. 7:06 Now what we are doing is, you know, we are just extending 7:09 the search, you know, scope. 7:11 Uh, I think we already had the first iteration of, you 7:14 know, go out, make a plan. 7:17 Um, now, I think, and, you know, like, uh, do longer reasoning outside of the model. 7:23 I think what, what's changing now is that we are saying, you know, 7:26 we go multi step reasoning and multi step document retrieval and 7:31 extending, you know, the knowledge. 7:32 I think the larger, uh, context window sizes allow it to do that. 7:37 So that's one of the things; you know, if you just have a 7:41 4k context window, you cannot do that. 7:43 If you now have, you know, 128k, you can throw lots and lots of documents at it 7:47 and you can start reasoning about it. 7:49 So I think we are at this, um, junction that the needed data is available. 7:56 So, I mean, this is also the other thing, right? 7:57 So, OpenAI started having, you know, a copy of the Internet 8:01 accessible in a vector database. 8:03 So, you needed search capabilities, you needed the long context window sizes, and 8:07 you need to have the multi step reasoning. 8:09 And so, all these things now are at a point where they are individually stable. 8:14 And now we are getting into, okay, what can I build out of this? 8:18 So, I think it's a really interesting, uh, application and it, I think, shows the 8:22 direction where we are heading, right? 8:25 It's like multi minute, you know, processing with answers. 8:29 Um, I think it also shows that we are at a point that, with the models, we are willing 8:35 to not just babysit the models every, you know, hundred characters, uh, 8:40 but we are letting it run for a while.
8:42 And, and, you know, the quality of the models is high enough that they don't 8:45 just go off into a tangent. 8:47 Totally. 8:47 Yeah, it does feel like that's kind of like almost like the biggest thing is like 8:50 less a technical thing and more just like a sociological thing, is that like we just 8:53 now have enough trust in these systems that we're willing to let them run like 8:56 this, which is like pretty interesting. 8:58 So Tim, one of the challenges you see in deep research is you 9:02 don't have a verifiable output to compare accuracy against. 9:06 And we struggle with this even in organizations. 9:08 So when you come back with a deep document on, say I'm looking at what 9:11 are the, uh, milk regulations in Europe versus India versus the U.S., right? 9:16 I don't know what good looks like, so it's difficult for you to verify the output. 9:20 And a lot of these companies are struggling with the evaluations around 9:23 these deep research, uh, files, right? 9:25 There's some, uh, things that I can calculate, like how many, 9:28 how many paths did you create? 9:29 How long did you think? 9:30 How many websites did you hit? 9:32 And so forth. 9:33 But there's not a good measure even in our real world; if I hire two different 9:37 research companies to go research a particular topic, they will come back 9:41 with different documents, but I won't have a good validation routine around that. 9:45 So I think it's an order of magnitude tougher problem than, say, you are 9:48 trying to write code or do some math, where I can deterministically tell you 9:51 whether the answer is correct or not. 9:53 Yeah, no, I think the, like, evals question on this is, like, very hard. 9:57 It's like, how do you do good benchmarks on this kind of feature 10:00 becomes, like, very tricky very quickly. 10:01 Yeah, maybe a final question. 10:02 Um, Kate, maybe I'll turn to you: um, you know, the four companies we have 10:07 here, we have Google and OpenAI, right? 10:09 Giants, you know, titans of the space. 10:11 We have Perplexity, who has really spent a lot of time working on search. 10:14 So it's no surprise that they would do this. 10:16 What's kind of interesting here is, is Grok, which really has only hit the 10:19 scene fairly recently, um, and yet is kind of launching these features 10:24 that are very much kind of at parity. 10:26 And I don't know how you read that. 10:28 I mean, you know, there's one point of view, which is, well, the space 10:30 is more competitive than ever. 10:31 Anyone can just kind of get in and launch these cutting edge features. 10:35 I think there's also a view, which is, well, you know, Grok is just 10:37 executing in an incredible way, but curious how you read that. 10:40 It's like, is it easier and easier to launch some of these state 10:42 of the art features, um, with teams that are way smaller? 10:45 It's one of the questions I have. 10:47 We're benefiting from having so much of the innovation starting 10:51 to be put into the open source, 10:53 which is allowing, you know, a rising tide to, to float all boats. 10:56 It's allowing less traditional players to enter the market. 11:00 Uh, and we're seeing just, you know, a really rich ecosystem emerge from it. 11:03 So it's exciting to see what, uh, you know, Grok and others can come out with. 11:09 And as we talk about, you know, Deep Search and how that relates to Deep 11:13 Research, you know, again, I really think right now, deep research is one of the 11:19 more practical use cases for reasoning.
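
The plan, clarify, crawl, and synthesize loop the panel describes can be pictured as a short program. A minimal sketch, assuming hypothetical `llm`, `search`, and `fetch` helpers rather than any vendor's actual API:

```python
from typing import List

def llm(prompt: str) -> str: ...                       # placeholder: your model call
def search(query: str, top_k: int) -> List[str]: ...   # placeholder: web search
def fetch(url: str) -> str: ...                        # placeholder: page download

def deep_research(question: str, max_rounds: int = 3) -> str:
    # 1. Plan: break the ask into sub-questions (Google lets you edit this
    #    plan; ChatGPT asks clarifying questions instead).
    plan = llm(f"Break this research question into sub-questions:\n{question}")
    queries = [q for q in plan.splitlines() if q.strip()]
    notes = []
    for _ in range(max_rounds):
        follow_ups: List[str] = []
        for query in queries:
            # 2. Crawl: fetch the most relevant pages for each sub-question.
            for url in search(query, top_k=5):
                summary = llm(f"Extract facts relevant to {query!r}:\n{fetch(url)}")
                notes.append((url, summary))
                # 3. Expand: new topics found on a page spawn further queries.
                more = llm(f"List new sub-questions raised by:\n{summary}")
                follow_ups.extend(q for q in more.splitlines() if q.strip())
        queries = follow_ups
        if not queries:
            break
    # 4. Synthesize: write one long, cited report from the gathered notes.
    sources = "\n".join(f"[{url}] {note}" for url, note in notes)
    return llm(f"Write a cited report answering {question!r} using:\n{sources}")
```

As Volkmar notes, nothing here is individually new; what makes it viable is long context windows plus models trustworthy enough to run unsupervised for several minutes.
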
11:21 Uh, if we're all innovating on reasoning and we're seeing a lot 11:24 of that work in the open source, a lot of the benchmarks are on math. 11:27 Like, I don't know that that is the killer use case that, you know, is why 11:32 I pay for a bunch of reasoning tokens. 11:33 But research is certainly an area where we're seeing some benefit. 11:37 And, you know, I think this is just one of those 11:40 early use cases that we've identified where there's some clear 11:42 demonstrable value that the reasoning is bringing. 11:46 And so that's why we're seeing, as new models come out, they're 11:49 also coming out in parallel with a deep research type of capability. 11:59 Well, I want to move us on to our next topic. 12:01 Um, it's a story that I feel like we do every few episodes, to be totally honest 12:05 with you, which is every few weeks or months there's always rumors that OpenAI 12:10 is working on its own chip. And the story this time was kind of a leak that, you know, 12:16 OpenAI was kind of readying sort of an inference design with TSMC, which is kind 12:20 of one of the lead, um, kind of chip fabs. 12:23 And, uh, I think I wanted to kind of use it as a hook to talk a 12:26 little bit and check in, I think, on sort of the state of like OpenAI's 12:30 competition in the hardware space. 12:32 And, you know, Volkmar, I guess you're the natural person to 12:34 turn to for this sort of thing. 12:36 You know, it's sort of interesting to me that at least what has been 12:38 reported in the news is that OpenAI is investing first in inference chips. 12:43 And I guess for our listeners, do you want to explain just like why this 12:46 would be such a big priority for them? 12:47 Because this is a very big bet they want to make. 12:49 Um, and I guess the question is like what you think they, what you 12:52 believe their upside to be, uh, in investing in this sort of thing versus 12:56 using, you know, the established, uh, companies that are out there. 12:59 OpenAI is building the chip not by themselves, right? 13:03 They are partnering with Broadcom, and Broadcom is one of the 13:07 giants in chip manufacturing. 13:09 So that's, that's expected; I mean, they had to pick a partner if they 13:13 don't want to become a chip company. 13:14 And I don't feel that OpenAI, you know, wants to, wants to get into that market 13:20 as a, as a primary business model. 13:22 Now, if you look at, uh, training versus inferencing, the 13:28 requirements are very different. 13:30 Um, so in training, you know, if you, if you build a training cluster, 13:35 it's a lot about, I mean, you have the basic GPUs, but then a good 13:39 chunk of the money goes into the networking infrastructure and goes into 13:43 the storage system and having, you know, effectively a high performance computing 13:48 system. 13:48 So it's like, very, like, if you look at all the HPC people, 13:52 they all went into, into AI now, building these training clusters. 13:56 So that, that's a, a critical category of, you know, system design. 14:00 Then you go into inferencing, and that is usually a much smaller problem. 14:06 Now we have very large models and they don't fit on a single GPU. 14:09 Uh, but often, you know, you're on, like, maybe eight, 14:13 kind of, at the maximum. 14:15 If you have a really, really large model, which is not necessarily what 14:18 you're, you know, using to do inferencing to an end customer, but maybe, you 14:23 know, for model verification for yourself, then you may go to 16 GPUs. 14:26 So let's say two boxes, but you're not going much beyond that.
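
Volkmar's "maybe eight, sixteen GPUs" comes down to memory arithmetic. A back-of-the-envelope sketch, assuming FP16 weights, 80 GB cards, and roughly 30% headroom for KV cache and activations (illustrative figures, not vendor specs):

```python
import math

def gpus_needed(params_b: float, bytes_per_param: float = 2,
                gpu_mem_gb: float = 80, overhead: float = 1.3) -> int:
    """Rough GPU count for serving: weights at FP16 (2 bytes/param)
    plus ~30% headroom for KV cache and activations."""
    weight_gb = params_b * bytes_per_param   # e.g. 70B params x 2 B = 140 GB
    return math.ceil(weight_gb * overhead / gpu_mem_gb)

print(gpus_needed(70))    # -> 3   (fits comfortably in one box)
print(gpus_needed(405))   # -> 14  (roughly two 8-GPU boxes, the "16" case)
```
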
14:30 And so, um, if you now look from a consumption perspective, you know, 14:34 training, this, we talked a lot about these huge training machines, but in the 14:39 end, when you're scaling it to consumers, like the ratios of consumer hardware or 14:46 consumer consumption capacity you need to put down is orders of, or at least 14:51 an order of magnitude larger, right? 14:53 And at the very beginning, everything, all the investment went into, 14:56 uh, into training because, you know, we need to make the model. 14:59 Now we have the model, now we want to use it. 15:01 And now the growth is actually on the inferencing side. 15:04 So it's a natural conclusion, 15:06 um, from an OpenAI perspective, to control your destiny. 15:10 Now, the easiest one is, you know, you read the, uh, the 15:13 profit statements of NVIDIA, and their margins are around, you know, 68, 69%. 15:19 And so, yeah, they're doing well and 15:21 it shows in the stock price. 15:23 So if you want to take a larger chunk of that revenue, 15:27 um, and of the profits, then, you know, you partner with a chip 15:30 manufacturer, you get an exclusive deal. 15:32 I'm sure, you know, like, NVIDIA and OpenAI, they have very specific 15:37 deals where OpenAI probably pays less than the rest of the world. 15:40 But still, it helps if you can control your supply chain for the product you're 15:44 building one step further down, 15:47 you know, or two steps further. 15:48 And the first step is like, okay, I own my data center. 15:50 The second step is I, I control the chip. 15:53 Uh, now you can actually do chip manufacturing and you can design 15:56 a chip for your particular model. 15:57 So you can co-design a model for the chip, and that's where you can probably 16:01 get another three, four X in, in cost reduction, and I think OpenAI, now 16:06 at the scale they're operating at, is doing the natural thing for any company, 16:10 which is just, you know, controlling your cost. 16:12 Sure, but do you want to talk a little bit about how this might impact kind 16:14 of like the market for AI services? Because it strikes me that in the 16:16 past, you know, the way we've sold AI is that we go to customers and we say, 16:21 look at this brand new shiny model. 16:23 Look at all the things that it can do, um, work with us. 16:26 Um, and presumably part of the pitch that OpenAI has here in the future will 16:30 be, well, it's also running on our chips. 16:32 And as a result, things are way fast, like faster, or like way more performant. 16:37 And kind of curious if you think that's going to shape sort of the 16:39 sales pitch in this space, like sort of moving from selling the underlying 16:43 infrastructure to like being the kind of primary focus versus the model per 16:48 se, which I think we're seeing is just becoming more and more open source. 16:51 Yeah, absolutely. 16:52 And, uh, TSMC, for the people who don't know, is the 800-pound gorilla. 16:56 Like, they have like 65 percent plus of 16:58 the marketplace, you know, 17:00 and the world comes to a grinding halt if something happens to TSMC. And we're 17:05 talking about chips: my, my Tesla door handle has two sensors, right? 17:09 Imagine thousands of sensors across the entire car. 17:11 All of those are, uh, coming in from TSMC. Outside of, 17:14 uh, like, outside of Samsung, 17:16 I don't think there's any other brand that, that comes in at more 17:19 than 10% of the marketplace. 17:20 So TSMC is super critical. 17:22 Everybody's designing the chips, but TSMC is the heart of the 17:25 entire industry at this point.
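
The economics Volkmar sketches can be made concrete. Assuming roughly a 69% gross margin and taking the midpoint of his "three, four X" co-design claim (both figures illustrative, not financial data):

```python
# At a ~69% gross margin, a GPU's list price is about 1 / (1 - 0.69) of
# its build cost, so owning the silicon captures roughly that multiple;
# chip/model co-design adds the further gain mentioned in the episode.
margin = 0.69
list_price_multiple = 1 / (1 - margin)        # ~3.2x over build cost
codesign_gain = 3.5                           # midpoint of "three, four X"
total = list_price_multiple * codesign_gain   # ~11x potential cost reduction
print(f"~{list_price_multiple:.1f}x from owning silicon, "
      f"~{total:.0f}x combined with co-design")
```
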
17:26 Now, uh, if you look at, uh, Amazon, that's a good, uh, a good analogy. 17:32 Amazon has its own inference chips. 17:34 They have built their own Nova models that are super optimized for their own chips. 17:38 So that combination; Anthropic is going to be using a lot 17:41 of the Amazon chips as well. 17:42 So when you optimize the architecture of the hardware to work with the architecture 17:47 of the software itself, that does magic. 17:49 The total cost and the throughput that you can get, the latency 17:53 decreases, the cost of delivering that comes down significantly. 17:56 So in the enterprise world, uh, when we, when you go do these 17:59 large projects with clients, you're looking at high volume use cases. 18:03 For example, if I take a Llama model, the Llama 3 model could 18:07 be running on, on Azure, AWS. 18:09 It could be running on NVIDIA. 18:11 It could be running on, on Watson. 18:13 When you look at, uh, the inference stack, if you take that Llama model and you make 18:18 a NIM out of it and you put it on NVIDIA directly, now for that 18:22 particular model, they have NIM-ified it. 18:24 It's going to run at 5x more throughput. 18:26 You may pay 5 percent extra cost, but now you have 5x more throughput. 18:31 So that, uh, that brings up use cases where you're doing some 18:35 inferences at massive scale. 18:37 So as an example, if you're doing, say, fraud detection and you're looking 18:40 at invoices coming in, or you have an image that you're looking at and 18:43 you want to do that today at scale, 18:45 we've been using classical computing techniques for doing those today, 18:49 because the volume is very high. 18:51 You need a very quick latency. 18:53 You need to do this millions of times every day. 18:56 So the cost would add up really very quickly. 18:59 So this whole shift towards the Amazon inference stack with their own 19:02 models, and Nvidia with NIMs on top, 19:04 this is the trend that OpenAI is following as well. 19:07 So for high volume use cases, the cost of doing this would go down 19:10 and be very effectively running in production at scale. 19:14 So from an enterprise perspective, the use cases don't change. 19:18 But now we start to go after the high volume ones where 19:21 earlier the ROI didn't exist. 19:22 So I'm generally very excited about people looking at inference and optimizing 19:27 it so I can take more AI to my clients and infuse that into more processes 19:33 at scale and deliver higher ROI. 19:34 Yeah, for sure. 19:36 Kate, do you think, um, one thing that kind of occurs to me is... Does, do 19:39 these structural changes create any, like, dangers, I think, for open source? 19:43 So kind of what I'm thinking a little bit about is in a world where you've 19:46 got models, but they run way better if you're using a particular kind of 19:50 hardware, or they, they can only run on a particular kind of hardware. 19:53 It actually changes kind of the dynamics of open source, where I think the 19:57 dream of open source is you can take your model, you can run it everywhere. 19:59 And that builds the largest possible open source community around our models. 20:03 Um, do you worry at all about kind of this hardware fragmentation as like 20:06 things get more sort of specialized and kind of like really optimized 20:09 for particular families of models? 20:11 I'm not sure that I worry so much.
20:13 I mean, the model around open source has always been there are open source 20:17 versions of technology and then there are optimized, enterprise supported 20:21 versions that, you know, end up being what gets deployed, right? 20:25 And so we always need to have a bit of that 20:27 balance, um, as a whole. 20:30 So, you know, it's not something that immediately keeps me up at night, but I do 20:34 think there's another really interesting thing that's going on here: that, you know, 20:38 80 percent of the reason why, you know, anybody is doing this is probably exactly 20:42 what, you know, Volkmar mentioned in terms of controlling costs, but I think there's 20:47 an, uh, interesting part that also reflects that how these models 20:51 are trained is changing. 20:53 And we're seeing a much larger emphasis on techniques like reinforcement 20:57 learning, which require a huge amount of inference of really big models. 21:02 And so being able to control your inference costs no longer is just 21:05 being able to serve your models at a lower cost point to customers 21:08 and run larger and longer jobs. 21:11 It also is now a critical part of training, so much so that you could easily start to 21:16 see reinforcement learning costs starting to outweigh the cost of pre-training. 21:20 Yeah, that's wild to consider. 21:21 I haven't thought about that. 21:22 And I think, uh, Tim, this is in line with what Google has been doing forever, right? 21:26 Their tensor processing units, the TPUs, are just so well designed. 21:31 They are, they do an amazing job at doing this distributed across multiple data centers. 21:35 They don't need to build one big cluster. 21:37 They're able to do this in distributed centers and connect them with very 21:40 high speed fiber optic cables to do 21:42 inferencing at scale. 21:44 They have multiple products that are billion users plus every day, right? 21:49 They've been deploying these AI models, deep learning models, 21:51 transformer based models at scale at an insane pace across the world. 21:56 So you'll see more and more of this: 21:58 inference-time-optimized models, they're delivering great 22:01 ROI at the right cost point. 22:03 I'm very excited about this space. 22:05 Yeah, for sure. 22:06 Yeah, I was also going to comment, I feel like MoE is one of the few 22:08 podcasts you can go on where a panelist literally does the chef kiss for GPUs. 22:18 I'm going to move us on to our next topic. 22:20 Um, you know, there's a joke that I always used to make when I was kind 22:23 of, um, working at Google, where we'd present, you know, oh, this is AI. 22:27 And when we say AI, there's lots of different techniques, but really what 22:30 we're talking about is deep learning. 22:31 Right. 22:32 And this was, you know, a decade plus ago now. 22:34 Um, and I kind of feel like we've actually done a very similar thing now where we 22:37 say, oh, well, when we say AI, we mean 22:40 large language models. 22:41 Um, but it actually just turns out there's like lots and lots 22:43 of things happening in AI. 22:44 And I think one of the most interesting things that's been 22:47 popping off lately is kind of competition over vision models, right? 22:50 Which I think, you know, have gotten short shrift, even though there's lots 22:53 of exciting things happening there, but just because the LLMs have kind 22:56 of taken up so much of the space. 22:58 Um, and Kate, it's ideal for you to be on the show for this episode, because 23:01 I understand Granite is out with a number of new small vision models.
23:05 And so first, do you want to kind of walk us through that and what's been launched? 23:08 And then I kind of more generally want to talk about 23:10 like how this space is evolving. 23:12 Absolutely. 23:13 So a couple of things. First, uh, VLM, a vision language model, 23:18 it's a little bit different than what folks might be familiar with 23:21 if they played with some of the earlier, like, Stable Diffusion models 23:24 and a lot of the image generation models that we've seen to date. 23:28 A VLM, which are these smaller models that are starting to get more popular, 23:32 is all about image understanding. 23:35 So it's an image and a prompt that gets sent as an input. 23:39 Text is normally then returned as an output, versus, you know, the, some of 23:44 the original really popular DALL-E and other models where, um, you start with 23:48 a prompt and you end up with an image. 23:50 And the way these models work is you take a standard large language model, 23:55 often one that's already trained to do language tasks, and you do some additional 23:59 training, uh, to add a component on top that allows for an image to be, you 24:05 know, basically expressed as an embedding that gets fed to your language model in 24:10 addition to the embedding from the prompt, and that information is used together 24:13 by that language model to return the, the response. 24:16 Um, so these are really, uh, becoming popular. 24:19 We just saw a bunch of models drop. 24:21 Uh, Granite released, two weeks ago, our vision preview. 24:25 Uh, and the full model is coming, uh, next week. 24:29 So keep an eye out on the IBM Granite Hugging Face page. 24:33 Uh, our model is only 2 billion parameters. 24:35 It's really small. 24:36 You can run it locally. 24:37 And what we're really excited about is we've taken a very specific approach 24:40 focusing on document understanding tasks. 24:42 So think of 24:43 images from the perspective of a chart in a PDF or a poorly scanned PDF 24:49 document, um, or a GUI or a dashboard where you, like, take a screenshot 24:54 and put that into the chat box and start asking questions about it. 24:57 So, you know, uh, Granite can do all sorts of general vision 25:00 understanding tasks, but we've really 25:01 optimised the performance around this document understanding, uh, thinking 25:05 through, from our, you know, our enterprise customer perspective, that's going to be 25:08 where there's a lot of really valuable use cases, particularly as we look at some of 25:13 our other projects that are going on in this space, like Docling, uh, and more 25:17 broadly looking at use cases around areas like multimodal RAG. So, yeah, so Granite, 25:22 uh, preview released two weeks ago, the full version is coming out next week. 25:26 We just saw Qwen, uh, release, I think, earlier today or early last night, 25:30 their, uh, family of VLMs ranging from 3 billion to, I think, 25:35 72 billion parameters in size. 25:37 And there's just a lot of other, uh, work going on in the space. 25:39 You know, Pixtral, for example, is a common one that's been 25:42 out for a little while. 25:44 And we expect to see this type of capability only grow. 25:46 Shobhit, do you want to give a little bit of a picture of kind of how the 25:49 competition around this is evolving? 25:50 I think, again, it's kind of like, you know, almost like a little 25:52 bit like deep research, right? 25:54 Which is, well, we've got this kind of interesting use case, and now 25:56 people are trying to figure out where in the market it really belongs.
26:00 For VLMs, it also seems similar, right? Which is like, suddenly you have 26:02 this class of small vision models. 26:04 What are enterprise people wanting to use it for? 26:06 Absolutely. 26:07 We've been working on vision models with clients for a while now. 26:10 Earlier, a lot of the heavy lifting used to be 26:14 done on a server in the cloud. 26:16 So for example, if I take a Gemini 26:18 1.5 Pro model, it just chews through a whole video and can understand 26:21 exactly what happened and has a really good understanding of 26:24 what's, uh, what's going on. 26:25 Those are very big, large models. 26:28 There are a lot of use cases that we've been delivering for clients. 26:30 As an example, large consumer goods company, distribution company: you have 26:35 things around planograms, where you walk into a store and you want to make sure 26:38 the shelf has everything the right way. 26:40 There are consumer goods companies where the label behind a particular 26:45 product has to be, uh, relatively compliant to each region, right? 26:49 If it's food versus, uh, some dresses and stuff. 26:52 Then there are certain use cases around, uh, describing what's in the catalog. 26:56 So for example, a large electronics manufacturer or a clothing apparel 27:00 company or retailer, they would take images of what people are trying to sell. 27:04 So when you upload a product, you want to describe that product. 27:07 If you look at a big furniture store, when you take a piece of furniture, you need 27:11 to create a lot of metadata so that it shows up in the search and stuff, right? 27:14 Usually all of those tasks were very human driven. 27:17 Now we're at a point where the image models, as Kate was saying, they have 27:21 evolved quite a bit, the VLMs, and they have a better understanding. 27:24 Earlier they were able to just identify what's in this particular image. 27:28 It could do some, some correlations and say, this looks like a 27:32 cat, this looks like a dog. 27:33 Now it has evolved quite a bit. 27:35 So for example, one of my clients, we have a camera that points at all the 27:41 counters and you can see and tell which counter is more busy, because it's also 27:44 doing people counting on the fly, right? 27:46 So it understands which product is getting more popular. 27:49 It has a better understanding of temporal context. 27:51 If I give you a few screenshots or video, it understands what's 27:55 happening in the video, right? 27:56 So from frame 2 to frame 19, what changed, what's the delta? 28:00 So it's trying to understand that even better. 28:02 So OCR was the first wave of use cases that we found. Now we're getting into 28:06 more and more semantic understanding of what's happening in the overall 28:09 picture. That starts unlocking even more use cases, and to Kate's point, the 28:13 models are getting much, much smaller now. That allows us to do two things. 28:18 One, I can now run these on a device 28:21 while the person is running around. 28:23 So the person in the field in a manufacturing facility can take 28:26 a picture of something, can have a small camera that's running, 28:28 and things are running on device. 28:30 This was supremely important for security. 28:33 A lot of these use cases, 28:34 you don't want the images to be streamed out for security reasons. 28:38 You want to run these things near or on prem. 28:40 We're looking at defence use cases, drones running around in territories 28:43 where you don't have control over the cellular network and stuff like that. 28:47 All of those required us to do smaller models on the edge.
28:50 The second category of things it's unlocking for us is high volume use cases. 28:54 So, for example, the document processing that Kate mentioned, those are being 28:58 done millions of times every day. 29:01 The incremental cost difference between a 7 billion parameter model 29:05 and a 15 or 30 billion parameter model makes a difference to 29:09 the end ROI of that use case. 29:11 So we are now coming to a point where these small models deployed, either at 29:15 scale or on device, are delivering the ROI that's so critical for us. 29:19 Yeah, that's great. 29:19 Volkmar, I'm gonna give you an impossible question for this segment, 29:22 uh, to close out this segment. 29:24 You know, to what Shobhit just said, right, there's these very 29:26 interesting kind of pressures. 29:28 And I don't think there is a clear answer just yet as to like how much of AI 29:32 workloads will happen at the edge versus like, you know, in big data centers. 29:36 But it does feel like kind of like the prominence of smaller models, and the 29:39 fact that they're actually like perfectly performant for most industrial tasks, 29:43 means that we have a world where this is going to be more and more on the edge. 29:46 But I don't know if you think like the trend is really going to be sort of 29:49 50/50 when this all kind of settles out, it's going to be all mostly on 29:52 the edge, or just kind of curious about how you size up like where you 29:55 think the models will live ultimately. 29:59 So I think it will be a bit of everything, from a bandwidth perspective, right? 30:04 It's very cheap to transmit a couple of words. 30:07 It's a very, very low bandwidth channel. 30:10 The moment you go into vision, that's a high bandwidth channel. 30:13 So you have a bandwidth issue. 30:15 And so this is, traditionally, if you look at computation, it always 30:18 has been the trade-off between, you know, do I bring the computation 30:22 to where the data is, or do I bring the data where the computation is? 30:26 And so I think with, with text, 30:28 um, maybe ignoring latency, it was kind of in favor of bringing it to a data 30:33 center so it can do the consolidation. 30:35 It's like the argument of, should I have a nuclear power plant or should I 30:38 have a generator in my backyard, right? 30:40 And so I think we have the same, and, and, you know, you get the nuclear power plant in 30:44 the end because consolidation is more efficient, but now I need to 30:47 have a power distribution network. 30:49 And so I think we are in a similar situation here where, um, if I have a 30:54 high bandwidth stream, uh, and I can actually solve this with a relatively 30:59 small model at the edge, then, you know, the economics work in that favor. 31:04 And if you look at the trend, um, uh, like, you know, we, we are, we 31:09 are now making decisions based on the complexity of the question. 31:13 Now for videos, that's really hard. 31:15 Uh, but if you look at what, what the iPhone did, and, you know, and 31:18 we'll see this probably in all the phone manufacturers, like, 31:21 you know, you have a model router at the beginning, and the model 31:23 router decides, this is an easy question or a complex question. 31:26 If it's an easy question, I stay on device. 31:27 If it's complex, I offload it into the cloud. 31:30 And so I think from an, um, now flipping this, the cheapest input device, 31:37 um, is effectively a camera, right? 31:39 If you think about it, you capture, you know, 30 frames a second, uh, and you 31:44 have millions of data points, right?
31:46 And so now the, the, the entropy of millions of data points over time is 31:50 very low, but you can capture, you know, a lot of information in one shot. 31:55 Um, and, um, so the second one is audio. 31:57 Would you want to transmit audio? 31:59 That's probably more feasible. 32:01 I think video is pushing it really to, to a point where you, 32:04 you know, you will just process on the edge. 32:07 And then I think what will happen is that the models will specialize. 32:10 So, you know, what Shobhit said, you know, you have these industrial use cases. 32:14 Um, and if you're in a manufacturing plant, you may want to keep it as a 32:18 record, but there's no reason for you to move it, you know, into a different data 32:22 center on the cloud, because you already have industrial scale installations, you 32:25 already have data centers, et cetera. 32:27 And so it makes much more sense to just do it, you know, locally. 32:31 Now, locally does not necessarily mean inside of the camera; that may be required, 32:36 you know, if you have a battery or so, but it could be inside of a building 32:41 and, you know, you may run a cable, which is a couple of hundred feet long. 32:44 Yeah, that's right. 32:45 Yeah, it's a good reminder that, like, where the edge is is totally 32:47 dependent on where you are. 32:48 So... 32:50 We kind of have this natural, you know, tendency, because everybody 32:54 carries an iPhone around, 32:55 that it's the phone, right? 32:57 And so, you know, it needs to be a package of a battery, a camera, and a processor, 33:01 but that's not necessarily true. 33:02 Um, maybe just a final question. 33:03 I guess, Kate, if folks want to learn more about Granite's work, um, you know, where 33:07 should they go to get this new model? 33:08 I know you said that there's a big announcement and there'll be a 33:10 release next week, but, uh, anywhere, 33:12 you know, online people should be paying attention to, or anything like that? 33:14 Yeah, I mean, we always post everything on our Hugging Face page, uh, under 33:19 the IBM Granite org, and then, uh, encourage folks to check out 33:23 ibm.com/granite and you'll be able to find all the latest there.
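
The edge-versus-cloud routing Volkmar describes (a tiny on-device router scores each request; easy text stays local, and heavy video is also kept local to avoid streaming costs) can be sketched as follows, with all models as hypothetical placeholders:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    video_frames: int = 0   # attached video, the high-bandwidth case

def run_on_device(req: Request) -> str: ...   # placeholder: small edge model
def run_in_cloud(req: Request) -> str: ...    # placeholder: large hosted model

def complexity(req: Request) -> float:
    """Stand-in for the tiny on-device router model: long prompts look
    'hard', but any meaningful amount of video is scored 'easy' so it
    stays local rather than being streamed out."""
    if req.video_frames > 30:          # ~1s of 30fps video: too costly to ship
        return 0.0
    return min(len(req.prompt) / 500, 1.0)

def route(req: Request) -> str:
    # easy questions stay on device; hard text questions go to the cloud
    return run_on_device(req) if complexity(req) < 0.5 else run_in_cloud(req)
```
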
34:34Um, and so maybe Kate, I'll throw it to you first is like, 34:38are we living in that world? 34:39Are agents getting good enough, fast enough that, you know, 34:41we're going to start to see 34:43in 2025, 2026, some jobs really have listings for agents specifically. 34:50Well, look, I think what they did was clearly a bit 34:52of marketing tongue in cheek. 34:54Uh, but I think it's very realistic that to have a near future where we 34:59have catalogs of agents and people can also create, you know, specs 35:03for agents that types of behaviors that they want so others can build 35:06it and sell those agents, right? 35:08Um, I think that's very much where we're headed. 35:11I don't know that, uh, it's going to be a total job replacement, so to speak. 35:17I see a lot of opportunity for agents 35:20augmenting human roles and jobs. 35:22And I think that's much more realistic of, as we look at like, what will a 35:25job description look like next year? 35:28Having expertise and familiarity and, you know, part of the job description 35:31is helping manage agents and work with AI systems is I think, going 35:35to be increasingly a huge part of the, the new workforce, so to speak. 35:39Yeah, I think that'll be sort of an interesting bit of it. 35:41It reminds me a little bit of back in the day where it was like, you know, skills, 35:45you know, Microsoft Office Suite, Excel, Word, you know, whether or not kind of 35:48agents will be or experience with agents will be kind of like a relevant skill. 35:52Yeah, so I think this is not new. 35:54So if you look at what Dharmesh did from HubSpot CEO, he launched agent.ai last year. 35:59It's the largest network where you can create just like you would have 36:02gone to Fiverr to go hire people. 36:04You can go to agent.ai and find a catalog people are rating and you can hire that particular 36:09agent for a particular task and you pay by by different metrics, right? 36:12So I don't think this is new, having access to a variety of different agents 36:16who specialize in a particular domain. 36:19The way enterprises look at multi agent workflows, we spent the last 5, 10 years 36:25looking at structured, directional flows. 36:28We would go into an organization and say, let me find the workflows that are 36:32yucky and I'll reverse engineer them. 36:33We'll all create a new way of doing it and we'll codify it in a fixed flow. 36:38This was the RPA era. 36:39We got to only 10, 15 percent of the, of the flows that were 36:43deemed worthy of automation. 36:45The challenge there was when a human starts to trigger a flow, you go five, 36:49six steps in and you realize that, oh, there was a, there was something that 36:52went wrong and now you have to take over and now you start from scratch. 36:55So the human expert would just rather go into the whole step 10 steps 37:00by themselves so they have more control of what's going on so we could never go 37:03beyond 10-15 percent of the workflows that were automated using RPA robotic 37:08process automation. In this agent world, we have an opportunity for not having 37:14to define every deterministic step. 37:16So within very thin guidelines and guardrails. 37:19LLM can choose to figure out which LL- which API to call or which tool to, to 37:23choose, and how to pass in the parameters and create some sort of a plan, iterate 37:28through it, uh, and then make sure that we are heading towards the right direction. 37:30So very narrow task. 37:32Those will get automated with LLM agents fairly rapidly. 
37:36 You will see this from the native companies; like, Salesforce will have 37:39 its own Agentforce, small pieces that are automated, but then you'll 37:42 have external third party tools: Azure, uh, has its own copilots; 37:47 Watson Orchestrate and others that'll sit outside and they'll orchestrate work 37:51 across a gamut of these different agents. 37:54 The technology is maturing pretty rapidly. 37:58 The thing that is missing in the enterprises is ask to task. 38:02 I should probably trademark this, but humans are incredibly good at this. 38:06 We go from an overall ask to a series of tasks in our head really well. 38:11 As soon as I get a question about, hey, why is my bill higher than last month? 38:14 In my head, I'll trigger a few different tasks. 38:17 Today, as a human, I'm doing them in sequence. 38:19 Tomorrow, with LLM agents, I can just trigger these off on their own. 38:23 The companies who can create a golden thread of ask to tasks are 38:27 the ones who will win in this space. 38:29 The agents themselves that are automating a small step, those 38:32 will get commoditized really well. 38:33 Once you have this golden record of ask to tasks, you can then create 38:37 a planner agent that does that automatically for you, and that 38:40 unlocks the multi agent workflows then. 38:43 In order to get to multi agent, you have to solve for this ask to task. 38:46 And the smaller LLM agents themselves, they will become fairly 38:49 commoditized, and you'll be able to go to a network like agent.ai 38:51 and go find agents that are doing that small task really, really well. 38:57 Yeah, there's kind of a fun question here about sort of like, effectively, 39:00 like, what's the paradigm on which agents will get integrated into organizations? 39:05 Like, I think one of the reasons why, you know, a job interview or a job listing 39:08 for an agent is sort of silly is that, 39:10 you know, we have B2B SaaS. 39:11 If, like, we want to use an agent, we just, like, you know, open up an 39:14 account and click, click, click, and we've integrated it into our system. 39:17 I think the only kind of weird world that is opened up by some of the multi agent 39:22 stuff, um, there's this Google paper on AI scientists, uh, that came out yesterday, 39:27 where essentially the paradigm was almost like, we're going to have an agent 39:29 that's the scientist. 39:30 And then we're going to have some agents that run the experiment. 39:33 And like, basically the, the kind of model for integrating agents was to 39:36 basically create like a one to one analogy with like a laboratory and 39:41 then have the, have the agents put in. 39:44 Um, and that's the kind of world where you might want to hire, but 39:46 I guess, Kate, and I guess, Volkmar, you haven't spoken yet, but like, 39:49 uh, we're not headed to that world? 39:51 It just makes me cringe, like, this isn't preschool for agents, you know, 39:55 like... There's got to be a better way. 39:58 Yeah, I think the, the APIs for that form of orchestration, that is open, right? 40:04 I mean, we, we went through centralized software to, you know, SaaS services, 40:10 um, and you can just, you know, invoke an API. I think that is open, 40:15 how that would work. 40:16 I want to give it a different angle. 40:18 We have something similar right now, which is Mechanical Turk at AWS, right? 40:24 So I'm, I'm having micro tasks and I'm giving them out 40:28 and then someone processes it. 40:30 So there's also an economic model of, do I have compute capacity available?
40:34 And I'm not selling you a GPU, but I'm selling you, 40:38 you know, the work product, and I may go to a centralized place picking 40:42 up work items because I just have spare capacity, or I have a model 40:47 which is specialized in a particular way which produces better results. 40:50 So right now, you know, this is more like a work queue management thing at a meta 40:55 layer, like, you know, not, not saying, hey, you know, produce me a bunch of 41:00 tokens on Llama, but, you know, solve an actual problem for me and post a result. 41:05 And so I think this is where it could go. 41:07 And the other one is APIs, you know, like with the, with the baby agents, um, 41:11 you could orchestrate something, but for example, you may not have data access. 41:15 So someone may, you know, so let's say, you know, I'm going 41:17 to run a chemistry experiment. 41:19 I may not have all the data which is required to run the chemistry experiment. 41:24 So I could imagine that, you know, I go to a company which actually sits on the data 41:28 store, which it doesn't want to share, but it's happy to share the results of 41:31 research or the summarisation of stuff. 41:34 And then you may want to talk to an agent instead of talking to an API. 41:38 So it's just the fact we're moving it one, one level up. 41:40 So your interface to that data set may be, may be the large language model. 41:44 So Tim, one of the things I would like to highlight just from a really hands 41:47 on keyboards perspective, when we're deploying these large multi, multi agent 41:52 networks for our clients, and we've done quite a few of these in the last 41:54 six months. A large pharma company, we're doing some content creation, 41:59 authoring for compliance reporting, and there's another agent that will 42:02 come audit it, so on and so forth. 42:04 There's another healthcare client where we are working on a customer facing member 42:07 multi agent system where you can understand all the nuances and secondary intents 42:11 that get triggered and come back. 42:13 There's a telco where we're creating some software development, there's a 42:16 BPO process where we're doing some 42:18 three, four way matching. Quite a few of these examples where we have multi agent 42:22 frameworks in the last five, six months that we've put in production for clients. 42:26 One of the challenges we're running into, uh, is how do you describe 42:30 the guidelines to these agents? 42:33 We have as a society somehow figured out that English is the right 42:37 way to talk to these LLM agents, 42:38 which I don't think will scale in enterprises. When you get to agents, 42:42 you're trying to go look at a complex workflow and saying that, hey, if 42:46 you have a question about the status of the ticket, go use this tool, and 42:49 you're giving it few-shot learning. 42:51 So the actual context that we give to these LLM agents becomes 2, 3 pages for 42:55 a small task, because we have to add all these bandages and if-then statements 42:59 that essentially you're codifying in English, and that's just one small agent. 43:03 When you start to get to the planner, this completely breaks down. 43:06 You cannot possibly give a 30 page context to an LLM. 43:11 The latency is very high, there will be all kinds of overlapping 43:13 rules, things of that nature. 43:15 So I think as a community, we will need to make some progress, and I think IBM 43:19 Research is doing quite a bit in this space too, to get to a way in, like, 43:23 just the way Mechanical Turk works.
43:24 We have a very structured contract between how you will go invoke a particular 43:29 API or microservice or an agent. 43:31 We'll need to get to a point where we can do that. 43:33 We have solved this for software engineering; 43:35 we need to bring some of those principles. 43:36 It will no longer, I think, be natural language in the way 43:39 you go talk to these agents. 43:41 There would have to be a little bit better software design principles that will 43:44 need to be put in place for large scale 43:47 enterprise deployments, 43:48 so the hallucinations are lower, 43:49 there's better auditability, evaluations, and things of that nature. 43:53 Well, and I think there's two key points, right? 43:55 It's like, how do I, as a developer working with an agent, express 43:59 something in a very controllable, programmable fashion with very clear 44:03 inputs and guarantees on the types of outputs I'm going to receive? 44:07 And then, when we talk about agents potentially passing information back 44:11 and forth, and other ways to compress information and preserve it, there's no 44:14 reason that has to be natural language. 44:16 Or that that is even efficient in any sense of the word. And so what is the most 44:21 effective way to actually bridge some of those communication gaps? 44:25 I, again, I really hate the, like, nursery of agents all running around, each with 44:30 their own persona of, I'm a critic agent, and I'm a reflection agent, and I'm an email 44:35 writing agent, and they all work together. 44:37 Like, how do we set this up? 44:38 So it's much more of a program that gets operated, right? 44:42 They're not people. 44:43 They're not personas. 44:44 There's instructions with very clear requirements. 44:46 There's all sorts of agentic capabilities in this program, like reflection 44:51 loops and validation loops and other things that happen, planning loops. 44:55 But at the end of the day, it's a very clear program where information is 44:58 passed from one program to another, and eventually a task is executed. 45:01 Yeah, it's almost like we've gotten so carried away by, like, 45:04 the dream of the agent that we're like, oh, it's a little person. 45:06 But actually the optimal strategy is like, wait, is it just, is it just programming? 45:10 Like, we just have to specify very clearly what we want the software to do. 45:13 Computer science is, like, we're saying, pretty please. I was looking at 45:17 a prompt in an agent and part of the prompt said, be persistent. 45:20 Like, how is this, how has the state of computer science 45:25 evolved to, like, this is the, our programming instructions to a model? 45:28 There has to be a better way. 45:30 Yeah, for sure. 45:32 Well, great. 45:32 Well, that's all the time we have today. 45:33 Kate, Volkmar, Shobhit, thanks for joining us. 45:35 And thanks to all you listeners. 45:37 If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify 45:40 and podcast platforms everywhere. 45:41 And we will see you next week on Mixture of Experts.
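
A sketch of the structured, auditable contract Kate and Shobhit argue for in the closing segment: typed inputs and outputs that programs can validate, in place of pages of natural-language instructions. Field names are illustrative, not any product's schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResearchTask:            # what a caller must supply, no prose needed
    question: str
    max_sources: int = 10
    allowed_domains: List[str] = field(default_factory=list)

@dataclass
class ResearchResult:          # what the agent must return
    answer: str
    citations: List[str]
    confidence: float          # 0-1, so downstream programs can branch on it

def validate(result: ResearchResult) -> None:
    # Contract enforcement replaces "be persistent"-style prompt pleading.
    if not result.citations:
        raise ValueError("every answer must cite sources")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
```
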