# Video lMrOvPloJ0o

**Source:** [https://www.youtube.com/watch?v=lMrOvPloJ0o](https://www.youtube.com/watch?v=lMrOvPloJ0o)
**Duration:** 00:43:43

## Summary

- Machine learning’s inherent probabilistic nature guarantees a persistent error rate, highlighting the need for breakthroughs beyond current technologies to achieve truly human‑like conscious decision‑making.
- The “Mixture of Experts” podcast episode brings together experts Olivia Bjek, Chris Hay, and Mihai Criveti to discuss the week’s AI headlines, including radiology advances, manifold research, and a major IBM‑Anthropic partnership.
- Recent AI news features AMD’s multi‑billion‑dollar chip supply deal with OpenAI (including a potential 10% equity stake), the integration of synthetic diamond for superior chip heat dissipation, IBM’s Project Bob boosting developer productivity by 45%, and Peloton’s AI‑powered “IQ” trainer offering real‑time workout guidance.
- OpenAI’s release of Agent Kit introduces a new user‑friendly agent builder and updates to its evaluation platform, marking a significant step forward in the rapidly evolving “agent” ecosystem.
- The episode emphasizes that while AI tools are accelerating productivity and expanding capabilities across industries, the field still faces fundamental challenges in achieving reliable, human‑level decision making.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=0s) **Limits of Machine Learning & AI News** - The excerpt begins with a commentary on the probabilistic nature of machine learning versus human decision‑making, then introduces the “Mixture of Experts” podcast and its expert panel before previewing upcoming discussions on radiology AI, IBM‑Anthropic partnership, OpenAI’s agent kit, and AMD’s chip deal with OpenAI.
- [00:03:04](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=184s) **From Codegen to Low‑Code Evolution** - The speaker explains how IBM’s acquisition of DataStax (bringing Langflow) and tools like CrewAI and LangGraph illustrate a shift from pure generative code‑generation toward integrating low‑code, visual builders that make AI agent development accessible even to those without deep programming expertise.
- [00:06:45](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=405s) **Limits of Visual Programming Paradigms** - The speaker argues that while visual tools such as UML, Scratch, and Node‑RED are attractive, they fail to scale across large, heterogeneous legacy systems, and the advent of AI does not alter this fundamental limitation.
- [00:10:00](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=600s) **Simplifying AI Agent Development** - OpenAI’s agent builder enables non‑technical users to create agents through visual workflows while exposing the underlying TypeScript/Python SDK for programmers, mitigating complexity even with features like the Common Expression Language.
- [00:13:10](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=790s) **Enterprise AI Agent Lifecycle Discussion** - The speaker highlights OpenAI's lucrative subscription base, then discusses IBM's partnership with Anthropic and introduces a structured agent development life cycle for securely deploying enterprise AI agents.
- [00:16:28](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=988s) **Building an Evolving Agent Ops Framework** - The speaker outlines how IBM and partners co‑created a guide, stressing continuous development, cross‑industry adoption, evaluation challenges, and integration of new tools into a unified Agent Development Life Cycle (ADLC) platform.
- [00:19:49](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=1189s) **Personal vs Enterprise Agent Automation** - The speaker argues that while AI agents today mainly automate individual tasks, real industry impact will come from enterprise‑wide workflows where smaller, purpose‑built models often outshine merely larger ones, as illustrated by a video game generating NPC backstories locally rather than querying a massive central model.
- [00:23:45](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=1425s) **Explaining Gradient Explosions with Manifolds** - The speaker outlines how deep neural networks can suffer gradient explosions during training and why visualizing the loss landscape as a curved manifold (rather than a flat plane) helps keep the model’s weights stable.
- [00:27:34](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=1654s) **Manifolds as a Path to Stable AI** - The speakers reflect on an early 2016 study, discuss how better use of manifolds could improve model training, fine‑tuning, and overall AI stability, and speculate on the practical implications of such advances.
- [00:31:31](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=1891s) **AI in Radiology: Myth vs Reality** - The speaker critiques the hype that computer‑vision will replace radiologists, using a recent investigative article to show how the expected disruption has not materialized as predicted.
- [00:35:24](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=2124s) **AI Threat to Radiology** - Olivia debates whether advancing AI agents will replace radiologists, emphasizing unresolved trust concerns, data bias pitfalls, and the current limitations of machine learning despite improving diagnostic accuracy.
- [00:38:47](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=2327s) **Debating AI Reliability vs Human Judgment** - Two participants argue over whether probabilistic machine learning can ever replace human decision‑making, especially in high‑stakes situations.
- [00:42:33](https://www.youtube.com/watch?v=lMrOvPloJ0o&t=2553s) **Human Oversight in AI Radiology** - The speaker warns that AI diagnostic tools without radiologist supervision are vulnerable to cyber‑threats and asserts that ultimate accountability, context, and communication must remain with human clinicians.

## Full Transcript
Machine
learning is fundamentally probabilistic
and humans are not. So I think there is
always going to be an error rate with
machine learning techniques as they have
currently been developed. Would have to
be some kind of other advance, some kind
of other technology to act like a human
and have actual like conscious decision
making. It's actually a huge engineering
challenge.
>> All that and more on today's Mixture of Experts.
[Music]
I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a panel of brilliant, funny, and in the case of Chris Hay, somewhat unhinged technical experts to discuss and debate the week's news in artificial intelligence. This week, we've got an incredible panel. We've got Olivia Bjek, senior staff dev advocate, Chris Hay, distinguished engineer, and Mihai Criveti, distinguished engineer, Agentic AI. All right, we've got another packed episode again this week. We're going to talk about radiology, we're going to talk about manifolds, we're going to talk about a huge partnership between IBM and Anthropic, as well as OpenAI's Agent Kit release. But first, we've got Aili with the news.
[Music]
Hey everyone, I'm Aili McConnon, a tech
news editor for IBM Think. I'm here with
a few AI headlines you might have missed
this week. Chipmaker AMD signed a deal
to supply OpenAI with billions of
dollars worth of chips. In exchange,
OpenAI could get up to a 10% stake in
AMD. Are diamonds a computer chip's new
best friend? Companies are starting to
embed tiny pieces of synthetic diamond
into chips because diamonds are
exceptionally good at moving heat. These
chips could ultimately help data centers
dissipate heat more efficiently; excess heat is currently a big
waste of electricity. This week, IBM
introduced Project Bob, a new set of
tools for developers to automate complex
processes like code development. When
6,000 IBM developers tested Project Bob,
their productivity increased on average
by 45%.
Are you trying to get in shape before
the holidays? Bike maker Peloton has introduced Peloton IQ, an AI-assisted
feature that acts like your personal
trainer. It gives you feedback on your
form and suggestions for weights and
workout plans. Want to dive deeper into
some of these topics? Subscribe to the
Think newsletter linked in the show
notes. And now back to the episode.
All right, so let's just dive into it.
So, the big news of the week is OpenAI's
release of Agent Kit, which is really
kind of a two-part uh announcement. They
talked a little bit about their new
agent builder that they've created,
which is sort of a clean sort of user
experience for designing agents. Um, as
well as a number of updates on its
evaluation platform. Um, and Olivia,
maybe I'll I'll turn it to you. How big
of a deal is this? I mean, I feel like
we've really been in a year of agents.
It's been a running joke that we say "agents" repeatedly every single episode,
but this builder seems to be maybe the
first attempt in my mind to kind of like
really make some of this technology like
broadly accessible if you're not sort of
technical. Is that right?
>> I don't know if it's the first attempt,
but it's certainly a very strong one. Um
We — IBM — recently acquired DataStax, which brings Langflow. And Langflow, I think, brings a lot of that low-code builder experience that you see in Agent Kit into open source, and that's something we're starting to integrate into a lot more IBM products as well. And before that, we saw things like CrewAI and LangGraph. LangGraph, I would agree with you, requires much more of a technical edge. But with CrewAI, as long as you can write some code — and almost anyone can write some basic agent code now with generative AI — you can see people getting agents working out of the box pretty easily.
>> And there's, I think, a fun kind of evolution here, and you touched on something I would love to get your thoughts on. I saw this and thought, oh, it's really funny, because with all the stuff that's been happening in codegen, I think we originally had a vision that said: you'll just tell the machine what you want, it'll generate the code, and everybody will be able to code — that's the direction we're going in. This almost seems like a step back — not maybe a step back, but a step in a different direction — to say, okay, even in a world of codegen, we will still need a graphical, no-code interface for managing all this stuff. And I guess I'm
curious, do you buy one approach or
the other or do you feel it's like still
unclear which one's going to kind of win
out over time?
>> Yeah. So, I have actually
some pretty strong feelings on this
because I've been involved in AI in one
form or another for about the last 15
years. And pretty much every time we
have a big evolution, people are like
the machine learning code, it's going to
replace everything. And you know, it's
been 20, 30 years. Realistically, we've
had AI in some form since the 1960s. And
you, you know, it still hasn't happened.
It's still there. There was a paper
about 10 years ago from Google that put
out this very pretty little diagram
where it showed that um only about 10%
of the code of any machine learning
system was actually machine learning.
And I think that still holds true um in
reality. So if you're asking the LLM to
do all of your compute, you're
essentially saying, "Okay, I would like
the most expensive part of my system to
do everything," even if you have a plan
in mind. So, you know, let's say that
you wanted it to check your email every
morning and uh look through and find the
highest priority items that are in
there. Um you probably don't really want
the LLM choosing its own adventure every
morning. You probably want it to do
essentially the same thing every day. So
agent frameworks are really the only way
that we can mix that kind of classic
deterministic programming with
something that's a little bit more
probabilistic like this.
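The pattern Olivia describes — deterministic workflow code around a single probabilistic LLM step — might look roughly like this in Python. This is a minimal illustrative sketch: `call_llm` and `fetch_inbox` are hypothetical stand-ins for a real model client and mail API, not part of any actual framework.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; returns a priority label."""
    return "high" if "urgent" in prompt.lower() else "low"

def fetch_inbox() -> list[dict]:
    """Placeholder for a deterministic mail-API call."""
    return [
        {"subject": "URGENT: server down", "sender": "ops@example.com"},
        {"subject": "Lunch menu", "sender": "cafeteria@example.com"},
    ]

def morning_triage() -> list[dict]:
    # Deterministic steps: fetch, loop, filter, sort — the same thing every day.
    emails = fetch_inbox()
    for email in emails:
        # The single probabilistic step: ask the model to judge priority.
        prompt = f"Rate the priority (high/low) of: {email['subject']}"
        email["priority"] = call_llm(prompt)
    # Deterministic again: the LLM never chooses its own adventure.
    return [e for e in emails if e["priority"] == "high"]

print([e["subject"] for e in morning_triage()])  # → ['URGENT: server down']
```

The control flow stays in ordinary code; only the judgment call ("is this email high priority?") is delegated to the model, which is the mix of classic deterministic programming and probabilistic components that agent frameworks provide.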
>> Mihai, maybe I'll kick it over to you
next. Um, I saw the kind of OpenAI sort
of agent kit sort of experience getting
some critique online where people were
saying this kind of way of doing this
where it's like a bunch of blocks that
are all kind of wired together like
maybe this is like not the most
efficient or easiest way of managing
this stuff. And so I think like to maybe
build off of what Olivia is saying like
I think if we even if we say okay we're
going to accept this kind of graphics
paradigm. Uh, I guess I'm kind of
curious about your view of like is this
kind of like blocks and wires sort of in
your opinion kind of the way we're going
to go about visualizing this or are
there other ways that you think are
going to be like ultimately kind of the
dominant way we do this?
>> I think code is there for a reason and
the fact we now have AI um doesn't
change that in any manner. We've tried
to do this before with UML. If you
remember back in the day when we used to
think that everything could be designed
and developed just using diagrams and
there were a lot of products including
from IBM — you know, Rational Rose and
all these kind of things with the
promise of writing UML as code. You
translate from UML to code, from code to
UML, and you design everything visually.
And it requires specific paradigms like
object-oriented languages for this to
work. And it sort of worked for a bit. But once you went beyond single-developer, small projects — once you extended into legacy projects, wanted to integrate with different systems and different programming languages — a lot of the paradigms broke.
We see this with agents as well where
you have platforms like n8n and Langflow and Flowise, which take the same paradigm as now the Agent Kit from OpenAI, where you have this visual programming language. Now it's very
attractive because even as kids for
example
you typically learn programming
visually. So if you've done you know
turtle programming or scratch which is
like a visual ID for kids where you kind
of learn how to program using these
exact same blocks or even node red an
open source project from maybe 15 years
ago which taught you how to compose
things for the internet of things using
visual programming languages. It's very
attractive. You can get started very
quickly. The problem is scaling. Once
you go beyond a single contributor, once
your project goes beyond a single
workflow, once you need to integrate
with existing systems, once you need to
do things like version control and
management, it becomes very very
unwieldy. So I would say it has its
place, especially for things like
one-off automations where you, you know,
it's one and done. But if you want to
build a real application, I still believe there's a strong case to be made that you should be writing code to begin with.
>> Oh, Chris, you want to jump in?
>> Of course I want to jump in. I'm going to rebut that a little bit, Mihai. So I mean, to your
point I'm going to cover two things. I
think the first thing is that um I think
this is really about making agents
available to everybody. And to your
point about scratch actually the average
human being can go and create agents now
uh in a very very simple way. It really
doesn't take more than 5 to 10 minutes
to learn how to use the tool and be able
to create an agent very quickly. And actually, if we think about the sort of workflows and the things that people want to be able to do, it is clicky-clicky, right? It's like, I want to go
and connect to my Gmail, and there's a connector available for that. I want to go and write something to Google Sheets, and there's a connector available for that. And then if
we think of even enterprise systems for
a second, the reality is we want to be
able to uh protect our inputs. So be
able to put the kind of uh the guard
rails, you know, on either side of
your inputs as well. So that's two or
three blocks and and we have to do that
today in enterprise compute anyway. So
for somebody to be able to quickly stick a guardrail in there, maybe do a little bit of routing, and even do multi-agent — right, because it's just a matter of sticking another agent in there and connecting them up — to your point, Mihai, if you are not a coder, without knowing about tools like Langflow, etc., you don't really have good ways of doing that as a mass consumer. So I think you're going to get better agents from consumers there.
To your point about programmers, I completely agree — things get sufficiently complex and then you're like, I really need a code representation. But at the same time, what's quite nice about the way OpenAI has done the agent builder is that it's actually built on top of the Agents SDK. So if we look underneath the hood of the workflow, you have your Agents SDK code there, which can be in TypeScript or Python, and you can just take that representation. So you can start with the workflow, but then you can just source-control it as code. There's nothing stopping you from doing that. So I think what they've done is quite clever, and actually some of the things they've really done well is take away the complexity. So one of the
things I sort of panicked at was, oh my goodness, they've introduced Common Expression Language — right, so if-then-elses — and I was like, there is no way the average human being is going to be able to understand that. But actually, what they've done there is really good, especially for structured outputs: they just put an LLM in the way, and you just describe what the schema you need is, and then it will go and generate it for you. And you're like, actually, that is a pretty cool technique. So there is complexity in there, but I think what they've done to make it easier is quite useful. So I think I am excited. But to your point, it's just nowhere near as powerful as things like Langflow, for example.
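The "LLM in the way" trick Chris describes — turning a plain-English description into a formal schema that structured outputs are then checked against — can be sketched roughly like this. Everything here is an illustrative stand-in, not OpenAI's actual API: `schema_from_description` fakes the model call with a hard-coded result so the example runs.

```python
def schema_from_description(description: str) -> dict:
    """Placeholder for the model call that turns prose into a JSON Schema.
    A real system would send `description` to an LLM; here it is hard-coded."""
    return {
        "type": "object",
        "properties": {
            "subject": {"type": "string"},
            "priority": {"type": "string", "enum": ["high", "low"]},
        },
        "required": ["subject", "priority"],
    }

def conforms(obj: dict, schema: dict) -> bool:
    """Minimal structural check (a real system would use a schema validator)."""
    for key in schema["required"]:
        if key not in obj:
            return False
    enum = schema["properties"]["priority"].get("enum", [])
    return obj.get("priority") in enum

schema = schema_from_description("an email with a subject and a high/low priority")
output = {"subject": "Server down", "priority": "high"}
print(conforms(output, schema))  # → True
```

The point of the technique is that the user never writes the schema or an expression-language condition by hand: they describe the shape they want, and the model emits the formal artifact that the rest of the pipeline enforces deterministically.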
>> Mihai, maybe a final question to you on
this. So I think Olivia, you kind of
called out uh my original prompt on this
being like, well, hold up. This is not
the first attempt in the space. There's
a bunch of other players in the space.
Um but obviously whenever OpenAI does
something, it's just an 800-pound gorilla kind of moving around. I guess, Mihai, do you have a thought on like how
the business strategy of this evolves,
right? Like if you're not OpenAI and
you're seeing them do this, what's like
the next game if you're really trying to
compete in this sort of like no code
sort of agent builder um kind of
ecosystem?
>> I think what OpenAI has going for them
is volume. They have the market. They
have hundreds of millions of users and
consumers and they can afford to be
slow — introduce features that don't necessarily try to solve everything, but what they do solve has a good user experience and is easy enough for these users to understand. And eventually this will make its way toward enterprise systems, where the same consumers want to do, for example, citizen automation. We've seen this before with things like Power Automate, or Excel macros, or even Lotus Notes applications, where the premise was the same: any citizen within the organization can go in, drag and drop a couple of things, write a bit of automation, and do something that's a one-off task, or even turn it into an enterprise application.
So I do see how some of these systems
could make their way towards enterprise
as well. And I do see a business play
there. But I think for OpenAI it's sufficient just to have their current monetization — hundreds of millions of users all paying $20 a month is going to be quite substantial.
[Music]
Well, this is great and a good way to
kind of flow into the next topic that I
want to cover today. Um, so some other
really big news. And Mihai, we'll stay with you, because I think you were directly involved in this. IBM has announced this strategic partnership with Anthropic, to integrate Anthropic into a bunch of its tools and methodology. And I think one of the things — I know you were directly involved in this project — that
really stuck out to me and we've talked
about it on the show before is kind of
this idea of creating a guide for how
you should securely deploy enterprise AI
agents. Um, and specifically this idea
that we're going to have this thing
called the agent development life cycle,
which is going to be sort of a
structured way that people should go
about doing this. And I find this so
interesting because it feels like, you
know, we're always talking about agents,
but we always focus on the technology. It feels like this is maybe one of the
first things I've seen although Olivia
keep me honest again if it's not
actually the first thing um where it
does feel like there's now starting to
be a lot more thinking in terms of like
okay well what's the whole set of
business processes that kind of need to
fit around this technology and so I
don't know if you were directly involved
in the ADLC kind of guide work but um uh
if you were or if you weren't would be
curious to get your thoughts
>> Yeah, I was working on the ADLC guide as well. One of the things that came out of that exercise was that agents do need to have their own process, similar to the software development life cycle, but it needs to deal with the probabilistic nature of large language models and take into account things like testing — testing of AI agents needs to be done in a different way, for example through evals. In fact, I believe Agent Kit, which was released by OpenAI, touches on that as well: one of the components it starts to add to the mixture is evals. How are you going to ensure that the outcomes of these agents are correct? Either inline — so as the agent executes, it can go, oops, that's the wrong result, I'm going to go back and retry — or after the agent has executed, I can take a hundred agent executions and look at the accuracy and some of the numbers for these agents. And having a structured, governed process around this — for planning, coding, building, testing, releasing, deploying, operating, and monitoring the life cycle of agents — is important for enterprises. Many of the non-functional requirements, things like encryption and security and governance and all the things that come with traditional enterprise software, need to be woven into this approach. We've seen that today AI agents and AI development are quite immature, and one of the projects we've started in this space is Context Forge, an MCP gateway, which supports A2A, supports MCP, and provides support for the agent development life cycle and agent ops. So we see a similar need. We see that these are things enterprises are saying must happen before any of these agents are allowed to touch production systems.
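The two evaluation modes described above — an inline check that retries as the agent executes, and an offline pass that scores accuracy over many recorded executions — can be sketched as follows. This is a hedged, illustrative sketch: `run_agent` and `is_correct` are hypothetical stand-ins, with a random failure rate standing in for a real model's errors.

```python
import random

random.seed(0)  # make the sketch reproducible

def run_agent(task: str) -> str:
    """Placeholder agent step that is right most of the time."""
    return "expected" if random.random() < 0.8 else "wrong"

def is_correct(result: str) -> bool:
    """Placeholder eval check (a real one might be an LLM-as-judge or a rule)."""
    return result == "expected"

def run_with_inline_eval(task: str, max_retries: int = 3) -> str:
    # Inline eval: catch the wrong result as the agent executes, and retry.
    for _ in range(max_retries):
        result = run_agent(task)
        if is_correct(result):
            return result
    return result  # give up after max_retries

def batch_accuracy(n: int = 100) -> float:
    # Offline eval: look at many executions and report an accuracy number.
    runs = [run_agent(f"task-{i}") for i in range(n)]
    return sum(is_correct(r) for r in runs) / n

print(run_with_inline_eval("triage inbox"))
print(f"accuracy over 100 runs: {batch_accuracy():.2f}")
```

The inline path guards a single execution; the batch path is what feeds governance decisions like whether an agent is accurate enough to touch production systems.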
>> That's really interesting. So I guess, Mihai, where does it go next? Now that the guide is out, you know, is part of this now kind of testing it in the field, or how do we develop this out? I'm really interested in basically how these processes now become industry-wide, right? We're talking about adoption now.
>> To build this guide, a lot of folks from IBM had to come together — folks from consulting, from technology, from research. We've collaborated with Anthropic as well, and we've leveraged our experience in customer engagements: we've looked at healthcare clients where we've implemented agents, we've looked at telco clients, we've looked at banking clients. But it's just a start. This
needs to be an ongoing evolving process
and document. It needs to reflect all
the latest and greatest changes in
fields like eval for example. Do you
trust another agent to evaluate your
agent? Does an LLM that evaluates itself
have any bias in that evaluation? So all
these things need to flow back into the
ADLC, and we need to see tools and technologies develop around this: better evals, better components such as the MCP gateway, components that support the development life cycle itself — like Project Bob, for example, which we've also announced as part of this. So all of this needs to come together into one more cohesive agent ops platform and ADLC process.
>> Olivia, I'm kind of curious
about kind of the market evolution that a strategic partnership like this signals.
Um, I remember it was only like I feel
like 24 months ago where there's kind of
a vision that it was like going to be
one model to rule them all, right? Like
eventually you have an AI company that
creates like such a powerful capable
model that everything else follows suit,
right? But it does seem like not just
Anthropic but all the big kind of AI
players uh uh are kind of thinking about
how partnerships work in the space which
sort of suggests to me that a company like Anthropic is saying, well, we want to get into enterprise, we don't really know a whole lot about enterprise, IBM knows enterprise, so we have to partner. And so does this mean that the industry is ultimately going to be characterized by a lot more partnerships over time than that kind of
original model, which is, okay, it's one AI company and they kind of end up controlling everything. It feels like this is
going to be more multipolar than we
thought. Um is is that the right way of
thinking about what's happening here?
>> Yeah, I think it's an interesting point
especially in context of the agent
development questions that you were
having before where basically um I think
what we're seeing is that LLM
engineering is a lot more complicated
than we initially thought. As you said, originally we kind of thought, hey, we'll just hand all our problems off to an LLM — the LLM is clearly just as good as a person, we can just have it do anything and it's going to all work out great. And I think we're finding that that's not 100% true.
And there's also some theoretical limits
that I think we're starting to bump up
against a bit when it comes to sheer LLM
training. Um, so what that means is
either people need to partner or they
need to uh get really good each in their
own realm at uh at solving all of the
same problems that Mihai is discussing.
So, you know, I've recently been looking at this space as it pertains to the use cases for agents, basically, and I think it really divvies up into two things. One is you're automating a piece of your own work, or you are automating a process for the business.
And right now a lot of the uh agent
world has focused on how do you automate
things for yourself and that's cool.
That's you know that's a little bit
labor saving. It's definitely going to
uh be helpful for people and yet at the
same time it's not actually going to
transform any industries until we start
building workflows that help entire
enterprises and entire businesses really
do something better than they were
before and more reliably than they were
doing before. When it comes to things
like that, it means that just throwing a
bigger and bigger model at it isn't
always the answer. And sometimes we also
see cases where a smaller model is the
answer. So,
for example, I'll give something that's
totally off-the-wall relative to, like,
super enterprise use
cases, but there's a video game I've
been playing where they're starting to
use for some of the non-player
characters, they're starting to generate
background stories for them um using
generative AI. Do they really want to be
sending requests for all of those
randomly generated characters in the
world back to a core hosted model? No,
probably not. Like, the compute level
for that makes no sense, especially in
a video game, which is already really
taxing on the GPU. So what you want
there is a usable LLM that's really
small. So that means that each company
that has been playing in this model
space has different strengths. Not all
models are going to fit all use cases.
And so that means, yes, partnerships
have to be the future. Companies
talking to each
other has to be the future. Um honestly
when have we seen a point in technology
when that hasn't been true? You know, I
don't see any reason to believe that
this this bit of technology is so
different from history.
>> Definitely. All right. Well, let's end
the segment with two quick questions.
Uh, Mihi, if people want to learn more
about this, where should they go?
>> So, we have the white paper out, which
you can find on IBM's website. Just
search for "Architecting Secure
Enterprise AI Agents with MCP." I think
it's a great read. And
>> not that you're biased.
>> Honestly, I'm not being biased. Many
of these problems don't have mature
solutions. So, if you want to innovate
in this space, this is perfect.
>> Yeah. Great. And Olivia, final question
quick for you is, what's the video game
you've been playing?
>> Uh, inZOI.
>> Okay, great. You should check it out.
I'm going to check it out. That sounds
awesome.
>> All right, I'm gonna move us on to our
third topic of the day. Um, we have not
talked very much about thinking
machines. Um, so we've talked about SSI
and we've talked about a number of
companies that have kind of spawned out
of former OpenAI leadership. Um, and
thinking machines, if you haven't been
tracking it, is Mira Murati's startup,
she's the former OpenAI CTO, and it's a
fundamental kind of research
lab. And I want to cover it just
because they they put out a piece fairly
recently that I thought was very
interesting and I think is worth diving
into and kind of like explaining and
parsing through a little bit more. The
blog post is called "Modular Manifolds"
and you can find it on
the Thinking Machines website. And
I guess maybe to start uh Chris, we
haven't gone to you because I wanted to
save you to be the leading person on
this segment. What is a manifold
exactly?
>> What what are you doing to me Tim? What
are you No pressure. No pressure.
>> I am not smart enough to answer this
question. So I think the quick version
of this is when we are training models,
right? These models are really really
kind of deep and we're throwing the
entire internet worth of data at them
and then there's lots and lots of
layers, right? So it's called deep
learning and the good news is the reason
it's called deep learning is because the
layers go very very deep. But as it's
training, you know, and the layers are
firing away, basically what happens is
the weights of the model change. Now
when we are training the model, small
shifts in those weight updates can
potentially send the model off and
basically cause a gradient explosion,
and therefore it basically trashes the
model. And the reason that's
happening and and please don't ask me
questions beyond this, but basically
what's happening here is the model is
effectively moving along the gradient on
a flat surface. So the bigger these
shifts, the more likely you're going to
get this explosion; it's going to go
off. What a manifold is doing is it's
basically running on a curvature rather
than a flat plane. Kind of like the
Earth in that sense. So
rather than when you're making those
shifts, rather than the uh you know
effectively
exploding off and going off into into
space, you're staying within that plane
and therefore it keeps the model on
track and and that's effectively what's
going on. So the the best analogy I can
come up with is probably gravity, right?
which is if you think of an astronaut
and the astronaut is floating around in
space and then you push them in one
direction and they could just go off
into deep space and you'll never ever
see them again. And what we do is we
go, "Haha, we've got a little cord tied
to them and we'll pull them back in."
And that's effectively what you're doing
when you're training, right? The
astronaut's floating off and then you
pull them back. You're making
these quick adjustments. But but
actually in in my analogy of of deep
learning here is we don't need to sort
of pull the astronaut back in because
gravity is keeping the astronaut within
uh the planetary space and they're not
going to float off into deep space. So I
think I've done a great job of confusing
the listeners even more. Um and and if
you want to read my book on deep
learning for idiots that don't know
anything, you feel free to check it out
on Amazon.
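To make Chris's picture slightly more concrete, here's a minimal toy sketch (my own illustration, not the actual Thinking Machines method) of the core idea: take an ordinary gradient step, then project the weights back onto a simple manifold, the unit sphere, so their norm can never drift off.

```python
import numpy as np

def projected_sgd_step(w, grad, lr=0.1):
    """One gradient step followed by projection back onto the unit sphere.

    Without the projection, repeated updates could push the weight
    vector's norm anywhere; the projection keeps w on the manifold."""
    w = w - lr * grad               # ordinary gradient step (leaves the sphere)
    return w / np.linalg.norm(w)    # project back onto the unit sphere

# Toy problem: minimize f(w) = w . c over the unit sphere.
# The minimizer is w = -c / ||c||.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
w /= np.linalg.norm(w)              # start on the sphere
c = rng.normal(size=8)
for _ in range(100):
    w = projected_sgd_step(w, c)    # gradient of w . c is just c
print(round(float(np.linalg.norm(w)), 6))  # → 1.0 (never left the manifold)
```

This is the astronaut-and-gravity analogy in code: however hard each update "pushes," the projection pulls the weights back to the surface.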
>> That was great. I mean that's that's
that's that was brilliant. Uh again, I
was telling all the guests before uh we
started recording that like this is a
hard topic. Uh and I'm really interested
in like how do we explain this because I
think so often it's easy to get
caught up in like what the apps are
doing or what the latest features are.
There's some real kind of fundamental
research still kind of ticking along in
the background that we don't talk about
enough. Um, and so,
>> and Tim, that's actually just just
before we sort of jump on there. I
actually think that's one of the
interesting things that thinking
machines is doing. They're taking a
really different approach to all the
other labs. You know, if you saw their
paper on basically how to make LLM
inference deterministic as opposed to
non-deterministic, one of their blog
posts, it feels as if they are
fundamentally going back to each part of
the training process, challenging the
assumptions that we have, and releasing
these kind of mini papers that just
explain how they're trying to improve
things at the micro level. And I think
that's
interesting and great. Um, and I'm sort
of excited to go, well, if if if all of
these things add up well, what is what
is their model going to look like at at
some point, assuming they're going to
get there. So, I think I think they're
taking an interesting approach which is
a little bit more scientific and
engineering focused.
>> Yeah, absolutely. Yeah, I totally agree.
I mean, I was reading it and actually I
have my NeurIPS 2016 mug here, and it
was kind of like a weird throwback. I
was like, "Oh, wow, this is much more
like the early days of the research,
where it's, let's just talk about the
mathematics of this representation for,
you know, 90 minutes." And I think that
throwback is, yeah, I agree with you,
Chris, very interesting, something quite
distinct as an approach from a lot of
the players in the space. Olivia, so,
you know, Chris has done his valiant
best, I think, to explain what manifolds
are in the training process, and it
feels like, if we can say anything about
this blog post, it's that it's
attempting to find a way to do this
better. Um, and I guess do you want to
speculate a little bit? I mean, like,
you know, let's abstract away all the
kind of technical complexity for a
moment. If we're able to kind of like
stabilize and use manifolds more
effectively, what's that mean for AI?
What's that mean for training? What's
that mean for fine-tuning? What are the
implications practically? If you're just
someone listening to this being like,
astronauts being roped back in, what
does that mean for, you know,
day-to-day AI?
>> Goodness. I'm not going to say that I
know the answer, but just speculating,
using some very old knowledge, I'm going
to say that that basically looks like
models staying on target better, models
not drifting off during training quite
as much, maybe not being as influenced
by the latest data that they're seeing.
But honestly, I'm not quite sure.
>> If you look up close, even the Earth
looks flat. And I think the idea here is
that you're going to constrain things
along that flat plane. And it kind of
reminds me of a book I really like,
which is Foundation, where they
don't predict every single event
to set humanity on its course. They
actually design the pathways in which
civilization is likely to move and then
chaos itself can be somewhat
predictable. And I think the same is
happening here, where instead of just
throwing random weights around a very
high-dimensional space, you let them
wander around a very well-defined
surface, take a sphere, and locally that
manifold looks like a flat plane. So it
kind of leads to, I would say, a more
predictable path for chaos, if that
makes sense.
>> I think training is really hard. It is
it is as much as we want it to be a
science and it is a science there's a
lot of variation when you're doing that
and anybody who's ever fine-tuned a
model will know this right and I'm I'm a
you know, part-time fine-tuner hack in
the evenings, and the sort of things you
need to think about are: what is the
learning rate for this, how am I going
to put the data in there to get the best
output from my model, etc. And it is
hard to get that right. And actually, to
the point, as you're trying to mix all
this up, if you put the learning rate
too high and it's aggressive, then
that's the sort of thing that will cause
the gradients to explode.
we have seen in the past is especially
with these big labs and you'll have
heard it before, is that you did a big
training run and then about three months
in it goes poof, and, you know, something
was wrong, and the training run blew up,
and you just lost a few million dollars.
And that sort of thing still happens
today as well. It costs a lot of money
to do these training runs. It costs a
lot if things go off. So actually by
being able to stabilize things in that
way and have things become more
predictable, it means the cost comes
down and it means that we are going to
get better AI in the future, to the
point they're trying to solve: have
things be a little bit more predictable
and deterministic. So I think
that's a big save: we're going to get
more AI quicker, better, more reliably.
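The learning-rate point Chris makes can be shown with a deliberately tiny toy sketch (a one-parameter quadratic, nothing like a real LLM run): gradient descent on f(w) = w² multiplies w by (1 − 2·lr) each step, so an aggressive learning rate makes that factor exceed 1 in magnitude and the updates grow instead of shrink.

```python
# Toy illustration of learning-rate sensitivity, not a real training
# run: minimizing f(w) = w**2 with gradient descent. The gradient is
# 2*w, so each step computes w <- (1 - 2*lr) * w. If |1 - 2*lr| > 1,
# the iterates explode instead of converging.

def run(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # plain gradient descent step on f(w) = w**2
    return w

print(abs(run(lr=0.1)))  # shrinks toward 0: a stable run
print(abs(run(lr=1.1)))  # grows without bound: a "blown up" run
```

Real training failures involve far more than one parameter, but the same mechanism, step sizes that amplify rather than damp, is what "the gradients explode" refers to.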
>> Yeah, absolutely. Yeah, that's kind of
how I thought about it was like, you
know, if you think about these training
runs as kind of these rocket launches
and it's just like it has to be a really
precise thing or else it kind of breaks
at huge cost. It's kind of like
launching this thing itself into space,
like you were saying.
>> I think you brought it back to my
analogy, Tim.
>> Yes, exactly. You're welcome. That's
where I was headed. So,
>> all right, last topic of the day I want
to talk about. So there's a great
publication uh called works in progress.
They do a bunch of interesting
investigative reporting research
effectively on technology and how it
happens. And this story, entitled "The
Algorithm Will See You Now," which was
published a few weeks ago by an author
by the name of Deena Mousa, really
caught my eye. And this article does a
pretty simple thing. It basically says,
look, a few years ago, there was a lot
of hype that computer vision
technologies were going to replace all
radiologists. The idea was, well, look,
what does a radiologist do? They look at
a scan and they try to find anomalies in
that scan and they label that scan and
uh and that's what they do. And so,
surely with computer vision,
radiologists are going to be the first
job to get replaced in the AI
revolution. And the article just says,
okay, let's take a look at what
happened. And what's interesting, and
I'll just quote directly here, is that
things have actually moved not just in a
flatline direction but like in the other
direction for radiologists. So quote
demand for human labor is higher than
ever. In 2025, American diagnostic
radiology residency programs offered a
record 1,208 positions across all
radiology specialties, a 4% increase
from 2024,
and the field's vacancy rates are at an
all-time high. And in 2025, radiology
was the second highest paid medical
specialty in the country with an average
income of $520,000,
over 48% higher than the average salary
in 2015.
So, this is kind of a really interesting
anomaly, and I think it violates a lot
of our sort of anticipations and
assumptions about what AI is going to do
to the job market. Mihi, I see you
already going off of mute, so I'll let
you just
get the hot take in first here.
>> No, I was just saying wow I'm in the
wrong profession. What was that salary?
>> Yes. 520,000. You should have become a
radiologist. Okay.
>> But like, why is this? So not only is
the comp going up, but demand is going
up, even at the same time that, you
know, computer vision models are
incredible now, right? And so what do
you think is happening here?
>> I think part of it is just the human
interfaces, which is, no matter how good
an AI model is going to be at doing the
job of a radiologist, they can only work
through
the interfaces which are provided to
them. They don't have senses. They can't
speak. They can't interact with the
patient. They can't interact with other
doctors. They can't leverage their
previous expertise with that particular
patient. And unless something has been
either written down or has been built
into an agent where every single input
and output is defined, they're not going
to have the same data as a real human
doctor. And who's going to go write that
data? Are you going to have nurses
running around and saying, "Oh, can you
describe everything? Does this hurt when
I touch you here?" Or the AI agent or
the model saying, "Can
you please give me a bit more about your
background? Has your mother had similar
issues?" So I think part of it is
establishing the right interfaces and
establishing trust and no matter how you
look at it an AI model is never going to
be able to fill that role. It can
maybe enhance or give a second opinion,
or work together with a radiologist to
help, I would say, provide a second
verification layer to their
assessment. But I don't think we're at a
point where we can say no matter how
good these models can be individually
that they're going to be able to replace
all the human interface things that
these folks are doing.
>> Olivia, do you have any reflections uh
on this piece? I mean, I guess one of
the questions just building off of what
Mihi said was like, okay, well, maybe
the history of this is, you know,
computer vision didn't have this effect,
but even Mihi said it himself is like,
well, we didn't have an agent that would
go ahead and collect all this context
and be this interface. Is this kind of
maybe a temporary phenomenon? Like could
I say, look, as agents get better,
radiologists really are going to be in
trouble.
>> I think that's hard to say. Um
especially because
you're not just talking about diagnostic
accuracy, but you're also talking about
trust and uh whether or not we can
actually put critical decision-making in
the hands of machines. And I think you
know as a society we absolutely have not
settled on an answer for that question.
Classically, I think, yeah, the
limitation of machine learning
to solve these problems has been the
bigger issue. I think the first time
that I saw, you know, can we replace
radiologists, a lot of the points that
they made were basically that the
scans they were using all tended to come
from the same hospital. So they had
certain things on them that were causing
the model to use absolutely the wrong
features to differentiate. And I do
think, more recently, when you
look at the way LLMs are doing
things, yes, they're able to get this
kind of diagnostic accuracy a little bit
better. Um, but I think in general,
you're always going to end up doing the
low-hanging fruit with machine
learning before you're able to replace
experts. So if we want to build systems
that we have trust in, then those
systems should be doing things that only
they have extremely high confidence in.
And the difficult cases should be left
up to, really, you know, humans,
who have the ability to bring in more
data than you could ever build
sufficient integrations for an
agent. Um, I think if we were talking
about a world where yes, we truly have
AGI, then maybe it's a different story.
But I don't think LLMs today, and
granted, this is a personal opinion that
is widely debated in the field. I don't
think LLMs today represent that AGI.
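Olivia's point, automate only the cases the system is extremely confident about and defer the rest, can be sketched as a simple triage policy. A hedged toy sketch follows: the function name, the `p_abnormal` input, and the 0.99 threshold are all illustrative assumptions, not a real clinical system.

```python
def triage(p_abnormal, threshold=0.99):
    """Route a case based on model confidence.

    p_abnormal is a hypothetical model's probability that a scan is
    abnormal. Only near-certain calls (in either direction) are
    auto-reported; anything ambiguous is deferred to a human
    radiologist."""
    if p_abnormal >= threshold or p_abnormal <= 1 - threshold:
        return "auto-report"   # model is very sure either way
    return "human-review"      # uncertain case: the expert decides

print(triage(0.999))  # confident positive: auto-report
print(triage(0.004))  # confident negative: auto-report
print(triage(0.80))   # uncertain: human-review
```

The design choice here is that the threshold, not the model, encodes how much of the hard tail stays with humans; tightening it shifts more of the workload back to the expert.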
>> I think you're spot on, Olivia, about
today, but I think there comes a point
when we know the AI is better than the
humans, and then we should be handing
over some of that. And I know that sounds
really harsh and it's definitely not an
IBM view. It's a Chris view. We're going
to clarify that right now. But but but
let's let's imagine this for a second,
right? Let's say you want to play a
game of chess, right, Tim? Who would you
rather play at chess? Would you rather
play Magnus Carlsen, or
would you rather play Stockfish? Who
would you rather play?
>> I mean, if I had a choice, I guess
Magnus Carlsen, right?
>> Great. Future of humanity. We're going
to have a chess game. It's going to be
you versus the alien, some alien has
come from a different planet. And it's
either Magnus Carlsen or Stockfish
that's going to take take them on. And
by the way, whoever loses that game, the
the planet is gone. Are you choosing
Magnus or are you choosing Stockfish?
>> Uh, I guess Stockfish then, right?
>> Exactly. And therefore, I think
there comes a point where you're using
AI for entertainment, and that's fine,
or you're using it for productivity.
But there comes a point, if something is
better and life is hanging in the
balance, that you should be using the
absolute best tools that you've got at
your disposal at that point to be
able to solve that problem. And
I agree with you Olivia completely 100%.
We are not there just now. But there
will come a point where we are there.
And therefore the moral question will
come around to: should we actually not
be putting that in the hands of the AI
versus the human, because the AI is
going to get it right more times than
the human is? And we're not there
yet, but that question is going to be
coming.
>> I I think I fundamentally disagree and I
think it's because machine learning is
fundamentally probabilistic and humans
are not. And so I think there is always
going to be an error rate with machine
learning techniques as they have
currently been developed, and I think it
is highly unlikely that we are ever able
to fully account for that. And I think
there would have to be some kind of
other advance, some kind of other
technology, that would allow an LLM-type
technology or a machine learning-type
technology to act like a human and have
actual conscious decision-making. So I
don't think it's a matter of what's the
best. Um,
radiology fundamentally is subject to
interpretation to a certain degree. So,
I've recently had to get scans on
various things. And, uh, the thing I've
noticed is that one radiologist will
look at the scan and say, "Hey, I'm
noticing this one thing." The other
radiologist will look at the scan and
say, "Hey, I I see something different."
Um, and one thing that one of my
doctors pointed out recently, you know,
we were looking at a spinal image, and
they said, you know, I'm actually not
sure personally whether this is T7 or
T8, because I don't know which one we're
looking at. And the only way you could
know that is by talking to the original
imagers and say, okay, where was this
person positioned? How can you tell the
difference between these vertebrae? So
maybe you can imagine an agent
being able to do that, but the amount of
data that you have to pull in in order
to make that decision as well as
possible is actually not as trivial as
we think. So it's actually a huge
engineering challenge to bring in
sufficient data to replace radiologists.
>> It's context, communication, and
accountability.
But we also have to look at
bugs, vendors, and hackers. So for
example,
one of the healthcare systems here got
hacked, and it took down the whole
system for more than a year, where
every system had to be
disconnected, because most of the systems
were still running on Windows XP,
and the moment they got networked they
got crypto-locked. All the radiology
equipment would be disconnected from the
network, and everything had to go
through human doctors. You would go in,
you would do something, you'd write that
down, you'd take the note, you'd rip it
out, you'd give it to the next doctor,
and so on and so forth. So even if we
want to do this today, I think it will
take more than 50 years to roll it out
in a
well-governed way to all the hospitals
in major well-developed countries to the
point where it can be reliable enough.
And if you don't have your radiologist
to fall back to, those systems are going
to continue to remain vulnerable to
hackers, uh, to vendors who are going to
say, "Oops, you know, we're going to
turn off your radiologist if you don't
pay your token bills." So I
kind of see a system where it's
going to be hybrid, where the AI is
going to provide a review or help with
things like triage or measure objects or
give a second opinion, but the
accountability and the context and
communication will fall into the hands
of the human.
>> Well, we'll have to see. I think we're
going to check in, basically, if we're
still running at that point, in a
few years. I do want to revisit this
again and see sort of where we are cuz I
think everybody here has kind of put out
quite different visions of what might
happen in the future and I think it kind
of points to just ultimately how
uncertain some of this all is. Um well
that's all the time that we have for
today. Mihi, Chris, good to have you
on the show as always. Olivia, hope to
have you back at some point. Thanks for
joining all you listeners. Uh if you
enjoyed what you heard, you can get us
on Apple Podcast, Spotify, and podcast
platforms everywhere. And we'll see you
next week on Mixture of Experts.
[Music]