Agentic AI vs Mixture of Experts
Key Points
- An agentic AI workflow uses a planner agent to assign tasks to specialized agents (A, B, C), whose results are collected by an aggregator to produce the final output.
- The “mixture of experts” architecture replaces the planner with a router that dispatches input to parallel expert models, then merges their token streams into a single result.
- Agentic workflows rely on LLM‑driven agents equipped with perception, memory (working and long‑term), and domain‑specific tools (e.g., data querying, analysis, visualization) to act autonomously toward a goal.
- Mixture‑of‑experts systems focus on parallel processing of the same input by multiple expert models, emphasizing speed and scale rather than explicit decision‑making or memory handling.
- Although architecturally distinct, both paradigms represent cutting‑edge AI designs and can be combined—for example, using a router to select agentic modules or integrating expert outputs into an agent’s reasoning loop.
Sections
- Agentic Workflow vs Mixture of Experts - The speaker contrasts the hierarchical planner‑agent‑aggregator structure of agentic AI workflows with the parallel router‑expert‑merge design of mixture‑of‑experts models, highlighting their similarities and differences.
- Specialized Agents and MoE Loop - The speaker explains how domain‑specific agents (data, analysis, visualization) operate in a perception‑memory‑reason‑action‑observation cycle and interact at the application level, while contrasting this with a Mixture‑of‑Experts neural architecture that routes inputs via a gating network to multiple specialized model experts.
- Multi‑Agent Incident Response with Experts - An enterprise security workflow uses a planner agent to dispatch alerts to specialized agents—including an LLM powered by a mixture‑of‑experts router that dynamically selects specific expert submodels for each token batch—to diagnose lateral movement and recommend actions.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=4-FH09AMsp0](https://www.youtube.com/watch?v=4-FH09AMsp0) **Duration:** 00:09:15
- [00:00:00](https://www.youtube.com/watch?v=4-FH09AMsp0&t=0s) Agentic Workflow vs Mixture of Experts
- [00:03:06](https://www.youtube.com/watch?v=4-FH09AMsp0&t=186s) Specialized Agents and MoE Loop
- [00:06:16](https://www.youtube.com/watch?v=4-FH09AMsp0&t=376s) Multi-Agent Incident Response with Experts
If you're familiar with AI multi-agent workflows,
you might have seen some form of this architecture before.
So at the top here, we provide
some input to an agentic workflow.
And that will ultimately kind of flow down here
to produce some output in the end.
Now if we look into the boxes,
typically you would have at the top here,
a planner agent that's responsible for distributing
work to the agents within this workflow.
And then, each of these agents.
So, let's say, we've got an agent here.
We'll just call this agent A
and agent B and C.
Each one of these
is a specialist that does dedicated work on a particular task.
And then once it's done its work,
then the results flow down here to the aggregator.
And that aggregator agent prepares a response.
And that's how we get our output.
So that is an agentic AI workflow.
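The planner-agents-aggregator flow just described can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; in a real system each function would wrap an LLM call with its own role, tools, and context, and the names here are invented.

```python
# Hypothetical sketch of the planner -> specialized agents -> aggregator pattern.

def planner(task: str) -> dict:
    # Break the incoming request into sub-tasks, one per specialized agent.
    return {
        "agent_a": f"research: {task}",
        "agent_b": f"analyze: {task}",
        "agent_c": f"summarize: {task}",
    }

def run_agent(name: str, subtask: str) -> str:
    # Placeholder for a specialized agent doing its domain-specific work.
    return f"{name} completed '{subtask}'"

def aggregator(results: list[str]) -> str:
    # Collect every agent's result and prepare the final response.
    return " | ".join(results)

def agentic_workflow(task: str) -> str:
    assignments = planner(task)
    results = [run_agent(name, sub) for name, sub in assignments.items()]
    return aggregator(results)

print(agentic_workflow("quarterly sales report"))
```

The point of the shape is the control flow: one component fans work out, independent specialists do it, and one component fans the results back in.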
But in AI, there's another architecture
that we've had for quite a while that's gaining popularity.
And that is called mixture of experts.
And at a high level,
it has a very similar-looking workflow.
So instead of a planner, we start with a router
which receives the input and that dispatches the work.
And then we have here a series of experts.
Let's call this expert A, expert B and expert C.
And these guys, they work in parallel.
And then at the bottom here, we have a merge component
to reassemble the processed tokens into a single stream.
So it kind of begs the question:
what's the difference between these two things, AI agents and mixture of experts?
And the answer is, well,
quite a lot of difference actually.
But something they both have in common is
they are very much part of frontier AI models today.
So let's discuss what they do
and how they can be used together.
So in AI multi-agent workflows, the agents perceive their environment,
they make decisions, and they execute actions toward achieving a goal.
And all of this happens with minimal human intervention.
The agents, they typically use LLMs that have been given
specific roles and tools and contexts.
Now agentic AI workflows, they're
usually composed of modular components,
like, for example, one module
that might be the perception module.
That's kind of how the agent senses or ingests
information from its environment or its user input.
Then there's also a component
typically for memory.
This is the knowledge store.
That memory can be working memory for remembering the current context. Or,
it could be long-term memory for knowledge accumulation
over time, like domain facts or remembering user preferences.
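The two memory kinds just mentioned can be sketched as a tiny class. This is a hedged illustration with invented names, not any framework's memory API: a list standing in for working memory, a dict standing in for a long-term store.

```python
# Hypothetical sketch: working memory (current context) vs. long-term
# memory (accumulated facts and user preferences).

class AgentMemory:
    def __init__(self):
        self.working = []      # current conversation/task context
        self.long_term = {}    # durable domain facts and preferences

    def remember(self, item: str):
        # Working memory: short-lived, per-task context.
        self.working.append(item)

    def learn(self, key: str, fact: str):
        # Long-term memory: knowledge that persists across tasks.
        self.long_term[key] = fact

    def recall(self, key: str):
        return self.long_term.get(key)

mem = AgentMemory()
mem.remember("user asked about Q3 revenue")
mem.learn("preferred_chart", "bar")
print(mem.recall("preferred_chart"))  # -> bar
```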
And then there's an assortment of specialized agents.
And these are agents that excel at specific domains. So,
for example, we might have one specialized agent
that is a data agent that knows how to query databases and clean data.
We might also have a specialized agent called an analysis agent
that's trained on business intelligence.
And then maybe we also have a visualization agent as well
that creates charts and graphs.
Now architecturally, these components,
they form a loop.
So there are really different stages to this.
So first of all, they perceive,
then they're going to consult some form of memory. Remember,
the memory component.
From there, they're going to reason,
and they're going to act based upon that reason.
And then finally, they're going to observe what happens
based on that action.
And then kind of round and round
we go in this loop.
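That perceive, consult memory, reason, act, observe cycle can be sketched as a plain loop. Every function body here is a hypothetical placeholder for what would be an LLM or tool call in practice; only the loop structure reflects the description above.

```python
# Hedged sketch of the agent loop: perceive -> memory -> reason -> act -> observe.

def perceive(environment: list[str]) -> str:
    return environment[-1]                 # latest observation or user input

def reason(observation: str, memory: list[str]) -> str:
    # Placeholder reasoning step, conditioned on memory.
    return f"plan for '{observation}' given {len(memory)} memories"

def act(plan: str) -> str:
    return f"executed {plan}"              # placeholder for a tool call

def agent_loop(environment: list[str], steps: int = 3) -> list[str]:
    memory: list[str] = []
    for _ in range(steps):
        obs = perceive(environment)        # 1. perceive
        plan = reason(obs, memory)         # 2-3. consult memory, reason
        result = act(plan)                 # 4. act
        environment.append(result)         # 5. observe the action's outcome
        memory.append(obs)                 # update memory, then loop again
    return environment

trace = agent_loop(["user: clean this dataset"])
print(len(trace))  # 1 original input + 3 action results = 4
```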
And the key here is that each of these agents
operates at the application level.
They're making decisions, they're using tools,
and they can communicate with each other.
The mixture of experts, on the other hand,
that operates at the architecture level, and MoE
is a neural network design
that splits a model into multiple experts.
Let me draw three experts here,
although in reality there'll be a lot more.
And each of these experts specializes
in a part of the input space.
And then there's also a gating network at the top
that routes the input
to the different experts in this mixture of experts architecture.
And the input goes through these experts
before it gets to the next layer, coming into the merge component here.
And that receives all of the responses
from the experts that were invoked
and then performs mathematical operations to basically combine the output tensors
from these different experts into a single representation
that continues through the rest of the model.
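The gating-and-merge math just described can be illustrated with toy numbers: a gating network scores each expert for the input, and the merge step combines the experts' output vectors as a weighted sum. Real MoE layers do this per token inside a transformer with learned weights; the experts and gate weights below are invented stand-ins.

```python
import math

def softmax(scores):
    # Normalize gate scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate(x):
    # Toy gating network: one score per expert (weights invented).
    gate_weights = [0.9, 0.1, -0.5]
    return softmax([w * sum(x) for w in gate_weights])

experts = [
    lambda x: [2 * v for v in x],      # expert A
    lambda x: [v + 1 for v in x],      # expert B
    lambda x: [-v for v in x],         # expert C
]

def moe_layer(x):
    weights = gate(x)
    outputs = [f(x) for f in experts]  # experts process the input in parallel
    # Merge: weighted sum of expert outputs into a single representation.
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

print(moe_layer([1.0, 2.0]))
```

In a sparse MoE, the gate would additionally zero out all but the top-scoring experts so the skipped ones never run at all.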
And one of the big advantages of MoE
is sparsity, because only the active expert parameters
contribute to that input's computation.
So if we take an LLM example,
let's say the IBM Granite 4.0 Tiny Preview model. Well,
that uses 64 different experts
in its architecture,
and it has around 7 billion
total parameters in the model.
But of those, only about 1 billion
are active at inference time.
So that makes it a pretty memory-efficient language model
capable of running on a single, pretty modest GPU.
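The back-of-the-envelope arithmetic behind that claim, using the rough numbers quoted (about 7 billion total parameters, about 1 billion active):

```python
# Sparsity arithmetic for the quoted Granite 4.0 Tiny Preview figures.
total_params = 7e9    # all experts must fit in memory
active_params = 1e9   # but only these run per forward pass

active_fraction = active_params / total_params
print(f"{active_fraction:.0%} of parameters active per token")   # ~14%

# At 16-bit precision, the weights still need this much memory,
# but per-token compute scales with the active parameters only:
bytes_per_param = 2
print(f"{total_params * bytes_per_param / 1e9:.0f} GB to hold the weights")
```

So compute per token behaves like a ~1B model even though the full ~7B set of weights must be loaded, which is what makes a single modest GPU plausible.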
So in an MoE model,
these experts, they're not separate AI agents.
They're specialized neural network components
within the same model.
So let's consider a use case where a multi-agent workflow
and a mixture-of-experts model show up in the same system.
So let's just imagine an enterprise incident response workflow.
So we've got a security analyst who's going to start this off.
And they're going to kick things off by providing
an alert bundle as input to our workflow and
maybe a short natural language question, like
is this lateral movement?
And if it is, what should I do about it?
Now that goes into our agentic workflow, and we have a number of components there.
So we have, for example, a planner
agent that's going to be the first component that breaks up the request.
And then it kind of spins up the agentic workflow
and then passes it along to these specialized agents.
So we might have a log triage
agent here that parses the raw telemetry,
and we might have a threat intel agent here
that processes indicators and so forth and so forth...
... as we go down this workflow.
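The dispatch just described can be sketched end to end. The agent internals here are hypothetical placeholders (simple string matching standing in for real telemetry parsing and threat-intel lookups); only the planner-to-specialists shape follows the description.

```python
# Hedged sketch of the incident-response dispatch: a planner agent splits
# the analyst's request across a log-triage agent and a threat-intel agent.

def log_triage_agent(alert_bundle):
    # Would parse raw telemetry; here we just count suspicious logins.
    return {"suspicious_logins": sum(1 for a in alert_bundle if "login" in a)}

def threat_intel_agent(alert_bundle):
    # Would check indicators of compromise against intel feeds.
    return {"known_iocs": [a for a in alert_bundle if "malware" in a]}

def planner(question, alert_bundle):
    findings = {}
    findings.update(log_triage_agent(alert_bundle))   # dispatch to agent 1
    findings.update(threat_intel_agent(alert_bundle)) # dispatch to agent 2
    verdict = ("possible lateral movement"
               if findings["suspicious_logins"] > 1
               else "no lateral movement")
    return {"question": question, "findings": findings, "verdict": verdict}

report = planner(
    "is this lateral movement?",
    ["failed login host-a", "failed login host-b", "malware hash seen"],
)
print(report["verdict"])  # -> possible lateral movement
```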
And this log triage agent
could actually be implemented as an LLM that uses
mixture of experts as its architecture.
So as the tokens stream in
to the mixture of experts gating network
or the router, if you like, it looks at each micro
batch of text and decides on the fly
which handful of experts inside the model should handle it.
Now, only those selected experts run.
Perhaps the tokens go through to expert
one and maybe expert two,
but none of the other experts.
We just kind of leave those alone.
So perhaps it would just use two experts
out of a total of 64,
and it would just activate those for that particular micro batch.
So the forward pass, it's just going to touch a fraction
of the overall parameters. And the selected experts,
they process their slice of the representation in parallel,
and then they come back down
to the merge function
down here that mathematically stitches
those outputs back together before the next transformer layer.
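That top-2-of-64 routing decision can be sketched as follows. The scores here come from a seeded toy random generator rather than a learned router, and the function names are invented; what matters is that only `TOP_K` experts are selected and their merge weights are renormalized to sum to 1.

```python
import random

# Hedged sketch of top-k routing: score all 64 experts for a micro batch,
# activate only the top 2, skip the rest entirely.
NUM_EXPERTS, TOP_K = 64, 2

def route(token_batch: str, seed: int = 0):
    rng = random.Random(seed)  # toy deterministic scores, not a learned gate
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    # Keep only the TOP_K highest-scoring experts for this micro batch.
    top = sorted(range(NUM_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
    # Renormalize the selected scores so the merge weights sum to 1.
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]

selected = route("suspicious login burst from host-a")
print(selected)  # e.g. two (expert_index, merge_weight) pairs
```

Everything outside those two experts contributes zero compute for this batch, which is exactly the sparsity discussed earlier.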
So perhaps this log triage agent
is 7 billion parameters as an LLM,
but only about 1 billion active
parameters are used during inference.
So agents,
they route tasks across the workflow.
They decide the next step, maybe call a tool,
update shared memory and stuff like that.
A mixture of experts routes tokens inside
a single model, deciding which internal parameter slices
light up for the next few milliseconds of compute. Stack them together well,
and you get workflows that reason broadly
and specialize deeply.