ChatGPT 5.1: Conversational Style Focus
Key Points
- The community sees the recent GPT‑5 updates as a mixed “fix” that may prioritize cost optimization over genuine improvements in model warmth and performance, especially compared to earlier models like GPT‑4o.
- “Mixture of Experts” introduces a weekly panel of AI thought leaders—including Kaoutar El Maghraoui, Aaron Baughman, and Mihai Criveti—to dissect key developments in artificial intelligence.
- This week’s AI news roundup highlights Anthropic’s $50 billion U.S. data‑center investment, Elon Musk’s AI‑chip fab for autonomous vehicles, Baidu’s new AI chips, and a Tucson restaurant deploying AI‑driven robotic cats for food delivery.
- OpenAI launched two new ChatGPT 5.1 variants—Instant (fast) and Thinking (advanced)—emphasizing a shift from benchmark bragging to improving conversational style and user enjoyment.
- Panelists debate why conversational style now tops performance metrics, suggesting users value engaging, “human‑like” interactions as much as raw intelligence in modern AI systems.
Sections
- Mixture of Experts: AI Model Updates - A panel dives into mixed community reactions to GPT‑5’s recent fixes, previews new model releases like ChatGPT 5.1 and Kimi K2, and highlights industry headlines such as Anthropic’s $50 billion U.S. data‑center push and Elon Musk’s AI chip fab plans.
- New Model Release or Just Tweaks - The speaker questions whether the latest GPT‑5‑based update is a genuinely new model or merely fine‑tuned guardrails, prompts, and UI changes, noting cost‑vs‑speed trade‑offs and mixed community attitudes toward paying premium for maximum performance.
- IQ vs EQ AI Market Segmentation - The speaker argues that AI model differentiation will revolve around customization, cost‑performance, trust, and user preferences for raw intelligence versus emotional intelligence, creating distinct market segments and specialized offerings.
- Open‑Source AI Milestone Shifts Landscape - The speaker explains that a new open‑source trillion‑parameter MoE model achieves competitive performance and efficiency, challenging proprietary AI dominance and marking a “Linux‑like” shift toward shared, permissively‑licensed ecosystems, particularly driven by China.
- Skepticism Over Open‑Source Model Trust - The speaker questions Kimi K2’s performance and transparency, calls for third‑party benchmarking, and advocates using managed AI services over self‑hosting due to trust, tool integration, and reliability concerns.
- Enterprise AI Agents and Open‑Source Competition - The discussion highlights open‑source models now rivaling frontier AI, raises safety concerns about claims of invoking hundreds of tools, and details Microsoft’s announced plan to roll out autonomous enterprise AI agents with unique identities that can access corporate systems, attend meetings, edit documents, and collaborate with humans and other agents.
- AI Agent Identity and Compliance Risks - A discussion about Microsoft's AI agents becoming user-like entities, raising data integrity, governance, and accountability challenges for CIOs and CISOs.
- Proliferating Autonomous Agents in Workplaces - The speakers predict a future where countless AI agents, governed by zero‑trust frameworks, blur the line between human and machine in the office, creating hybrid collaborations.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=5sFJVAoafFI](https://www.youtube.com/watch?v=5sFJVAoafFI) **Duration:** 00:31:45

Timestamps:
- [00:00:00](https://www.youtube.com/watch?v=5sFJVAoafFI&t=0s) Mixture of Experts: AI Model Updates
- [00:05:09](https://www.youtube.com/watch?v=5sFJVAoafFI&t=309s) New Model Release or Just Tweaks
- [00:08:34](https://www.youtube.com/watch?v=5sFJVAoafFI&t=514s) IQ vs EQ AI Market Segmentation
- [00:13:55](https://www.youtube.com/watch?v=5sFJVAoafFI&t=835s) Open‑Source AI Milestone Shifts Landscape
- [00:17:36](https://www.youtube.com/watch?v=5sFJVAoafFI&t=1056s) Skepticism Over Open‑Source Model Trust
- [00:21:05](https://www.youtube.com/watch?v=5sFJVAoafFI&t=1265s) Enterprise AI Agents and Open‑Source Competition
- [00:24:36](https://www.youtube.com/watch?v=5sFJVAoafFI&t=1476s) AI Agent Identity and Compliance Risks
- [00:27:48](https://www.youtube.com/watch?v=5sFJVAoafFI&t=1668s) Proliferating Autonomous Agents in Workplaces
I think it's more like a fix. They've been trying
to fix the issues since they launched GPT-5. I think
the overall community has mixed feelings about it. They're still
attached to the performance they were getting out of models
like GPT-4o. Some of the community feels it's more of
a cost optimization as opposed to really an issue with
how warm the model is responding. All that and more
on today's Mixture of Experts. I'm Tim Hwang and
welcome to Mixture of Experts. Each week Moe brings together
a panel of the finest minds in technology to distill
down what's important in artificial intelligence. Joining us today are
three incredible panelists. We've got Kaoutar El Maghraoui, Principal Research
Scientist and Manager, Hybrid Cloud Platform; Aaron Baughman, IBM Fellow
and Master Inventor; and Mihai Criveti, Distinguished Engineer, Agentic
AI. Alright, this episode we're going to be covering a
lot of interesting developments in the model space. We'll be
talking about OpenAI's release of ChatGPT 5.1, the incredible performance
we're seeing out of Kimi K2 Thinking. And we're going
to end with a sort of interesting story about Microsoft
and its release of agentic users. But first we've got
Aili with the news. Hey everyone, I'm Aili McConnon, a
tech news writer for IBM Think. Here are this week's
AI headlines. AI startup Anthropic said on Wednesday it would
invest $50 billion in building data centers in the U.S.
Elon Musk plans to build a massive AI chip fabrication
plant to create chips for self-driving cars and robots.
Baidu unveiled two artificial intelligence chips as Chinese tech giants
ramp up their chip making efforts. A new Tucson restaurant
is using AI robotic cats to deliver food to customers'
tables. For more, subscribe to the Think newsletter linked in
the show notes and now let's see what our Experts
think of ChatGPT 5.1. Let's start with ChatGPT 5.1, which
by default is the biggest story of the week. The
big news here is that OpenAI has announced two of
its kind of like latest editions of its model. So
this will be ChatGPT 5.1 Instant, which is their sort
of fast model and ChatGPT 5.1 Thinking, which is their
sort of like advanced technology deluxe model. And actually I
think Aaron, I want to start with you. I think
one of the most interesting things about this is typically
when companies have touted new models in the past, they
have tended to tout the fact that, look, they're so
good at reasoning and they're so good against all these
benchmarks. But the thing that OpenAI leads with, and I'll
quote the blog post, is actually conversational style. So OpenAI
says, quote, we heard clearly from users that great AI
should not only be smart but also be enjoyable
to talk to. GPT 5.1 improves meaningfully on both intelligence
and communication style. And I guess, Aaron, I'm curious about
what you think is leading to that, right? Are people
just not very impressed by performance on benchmarks anymore? Why
is style such an important part of this launch? Yeah,
style is critical. I think it develops a sense of
empathy with the user and trust so that if the
model can have a more warm type personality and, you
know, respond in a way, then it develops that relationship
further. Which I think we'll talk more a bit about
that later in the podcast. But I want to mention
what I really like about GPT 5.1 is this router
mechanism that whenever you're speaking with it or having a
conversation with it, with the style of which they have
infused into this model, it goes into one of the
variants that it has. It can go into an instant
or thinking type variant, which is great, right? Because if
I want to have a very quick instantaneous response with
low response time, if that's the use case, then I
more than likely can also coerce that router to go
into that particular variant. If I need it to go
into a deeper chain of thought, then it can go
that way too. But then it joins back up in
the middle and that's where that stylistic choice comes in
to help develop, you know, that said relationship with the
user. So it becomes more fluent. Yeah, that's great. And
I did want to talk a little bit more about
that. I mean, Mihai, the question, kind of cheeky question
I was going to ask is like, should we be
covering this at all? You know, the movement from 5
to 5.1 is like maybe a little bit incremental. How
big of a deal is this launch, you think? I
think it's more like a fix. They've been trying to
fix the issues since they launched GPT-5. I think the
overall community has mixed feelings about it. They're still
attached to the performance they were getting out of models
like GPT-4o, and some of the community feels it's more
of a cost optimization as opposed to really an issue
with how warm the model is responding. Like even I
feel like, am I getting gaslit here? Is it like,
you know, it's not that the model is bad, you
see, these results are great. You don't like the results
because it's so direct. Can you help me with this?
No. So I was always wondering, is this actually a
new model release or have they just fine tuned the
model or did some slight changes to the guardrails, or
did some slight changes to the prompts, or the way
they're, you know, exposing the UI and APIs and all
these other kind of things. It does look like they've
actually trained a new model, or at least iterated on
the same GPT-5 family. So there are definitely some changes
there. But to me, this still feels like they're trying
to address the issues with the GPT-4o transition. There
are some cost optimization challenges where instant obviously provides much
faster responses and this router provides cost efficiency. But there
are mixed feelings within the communities. Like, for example, when
I use one of these models, just turn it up
to 11. Just give me the good results. I don't
care. Just think as much as possible and give me
something
that works. So that's going to get expensive fast. Yeah,
that's right. I have a friend who is just like,
I just love the idea that when I push the
button, it's working as hard as possible for me, even
if the task is like, very, very simple. So, like,
that psychology, I think, is very fun. I'm paying, what,
240 Euro, whatever with tax for the GPT 4 Pro?
I'm sorry, GPT? Take your money's worth. I'm going to
get my money's worth. Kautzer, what did you think? Have
you played with this model yet? What's your vibe? Check
on 5.1. Yeah, I played with it a
little bit. I think I agree with what my colleagues here
are saying, but I feel they're trying to find a
differentiation through the user experience and the empathy and the
customization, especially in a world where raw intelligence is becoming
a commodity, thanks to models like Kimi K2. So they're
trying to focus on maybe the fluid conversation, the daily
use. One of the features that I liked is the
adaptive reasoning. So deciding when to think before reasoning, before
responding basically to these complex questions, which seems to lead
to better accuracy than the previous fast models, while trying
to remain quick on these simple tasks. But also the
tone that they're saying, designed to be warmer or even
playful, reflecting basically this strategic choice to improve the conversational
feel. So I don't know, are we seeing here segmentation
of the markets, models that are focused on efficiency, like
what we're seeing in the open source with Kimi K2,
or models that are trying to win the user experience,
the personality. So it's interesting to see here. Yeah, that's
right. And I think I did want to pick up
a little bit before we move on to talking about
Kimi K2 Thinking on this point about customization. I think it's
very, very interesting that they kind of really sell the
point, like, oh, we are trying to make these models
more customizable for you as a user, which is a
little bit different, particularly in an enterprise business case. It's
not like Microsoft Word is like, oh, well, Microsoft Word
is going to be customized for you specifically. But in
AI, it does definitely feel like we're headed towards a
world where everybody's experience of, say, ChatGPT is going to
feel pretty different over time as they allow for more
and more customization and I guess, Kalto, how do you
feel about that? Do you think that's going to be
just where the market's going on, some of this stuff?
Yeah, I think so. Definitely customization is going to be
an important piece of it. And whether there's also the
cost per performance, the user experience, all of these things
kind of will segment the market here. Which models are
we going to go to? But I think also trust
and governance and compliance, those also will be very important.
So it's going to be interesting to see how these
things evolve. But definitely there is a war here. Is
it the IQ war or the EQ war? So is
it the, you know, intelligence quotient or the emotional quotient
here? So are we going to segment along those two
dimensions? Yeah, we'll see. I don't know. I think it'd
be very funny if it just turns out that there's
going to be kind of a battle for right brain
users who just want the model to be as smart
as possible. And then a battle for left brain. I
got it flipped. Right. And there'll be a battle for,
I think some users who just want like much more
natural conversation. And that's actually how the market will sort
of divide over time. Ideally you'd want a model that
does both really, really well. But it seems like the
companies are trying to specialize a little bit over time.
Yeah, I mean, I could certainly see a reality where
you could bring your own style or bring your own
behavior through, like, a LoRA weight or something like that.
Right. So you bring your own adapter and then you
can upload it, plug it in, or even mix together
which would be pretty interesting. And this might be somewhat
aware of this warmer tone, more conversational piece is going
and I wonder if GPT 5.2 or GPT 6 might
push the ecosystem a bit further down towards that way.
I do want to say this is raising all sorts
of red flags with me. I'm definitely in the camp
of technical person that the only smart device in my
home is a printer and I keep a gun next
to it in case it makes a funny noise. I
have an inherent distrust of any system that learns about
me, learns about my behavior, adapts over time because the
only thing I'm seeing in my mind is, like, advertising
influencing my decisions, learning and optimizing its responses to drive
my behavior. So I actually like simple systems. I like
my AI like I like my headphones with a wire
with switches I can toggle on or off. I want
to be the one in control. I don't want the
router, I don't want the memory, I don't want it
to learn about me. I want to tell it what
it needs to know every time. So I think there's
a balance there to strike. Yeah, yeah. You're actually even
against the router. So I guess like the thing that
Aaron finds so interesting, you're kind of like, I don't
want it to decide how much to
think. Yeah, this is something happening whether we want it
or not, unfortunately, these customizations and adaptation and so on.
The more we use these systems, the more
they're learning about our behaviors and so on. And I
think whether we have control or not, I don't know
these systems, they're not giving us control. And that's something
maybe another design point. Can these AI systems give the
users control whether they want to learn about our behavior,
we want them to learn about us and things like
that and adapt, or we want maybe just simpler interactions
and kind of robotic ones without any kind of implied
intelligence. Yeah, exactly. We don't want it to be too
smart in a certain way. I'm going to move us
on to our next topic. Another model I think to
cover, which I think is a really interesting counterpoint to
the ChatGPT 5.1 story, is the hype around Kimi K2
Thinking. So just to kind of quickly review, Kimi K2
is a model produced by a Chinese AI startup called
Moonshot AI and they dropped this model, which is an
open source model. Which incredibly, has been able to claim
superior performance against even proprietary models on a set of
pretty big benchmarks. So on Humanity's Last Exam, they're doing
great. On BrowseComp, they're doing great. On SWE-bench,
they're doing great. And this is a pretty, I think,
interesting story, right? Which is, I think for a long
time on MoE, you've talked about when will open source
triumph over the proprietary models? And this seems to be
a case where the open source model is doing at
or better than all the proprietary models. And I guess,
Mihai, I'll give you a chance to kind of lay
out your conspiracy theory, because before the episode you were
saying, you know, maybe the timing of ChatGPT 5.1 is
a little. Suspicious with K2 thinking. Do you want to,
do you want to just quickly lay that out? Yeah,
let's quickly increment that one. Because something big is coming
in the open source space, and that's my theory there,
that there's definitely a response in the market to this
very, very powerful open source model. Yeah, and I think
that's like, I mean, that's maybe the cynical view on
sort of like, oh, you're going to tell all these
style and emotional communication things because now you're getting beat
on all the benchmarks, I guess. Kaoutar, maybe let me
take a step back though, is like, are benchmarks still
a useful way of even looking at this? Right. So
obviously the companies care a lot about it, but I
think on the show we have talked about like, well,
are we kind of reaching the end of the usefulness
of some of these benchmarks in terms of showing performance?
Obviously this is a milestone for open source, but does
it really say that open source is now better than
proprietary models? How do you read these results? I think
if we look at the results and the benchmarks, I
mean, it's saying something. So there must be a way
to evaluate these models. And right now the only way
that's kind of viable is benchmarking, trying these things and
also on these standard benchmarks. So what this is saying
is this is actually a big open source milestone. It
is a challenge to the entire closed AI economy. So
if the best model in the world is open weights,
the center of gravity in AI shifts from secret models
to shared ecosystems. And so this is also, this move
is positioned in China also as a serious contender in
this global open model race, which is paralleling kind of
the Linux moment in the AI era. So there, I
think the results are showing really superior performance using the
MoE architecture, with 1 trillion parameters and only 32
billion activated per token at inference. So there is a big
focus here on compute efficiency as much as also the
capability one. And another thing is the license that they
have is very permissive but also strategic. So they're saying
if you're doing this at a massive scale, you have
to mention Kimi K2. I think the rule is, I
have written down here is 100 million monthly active users
or 20 million USD per month in revenue. Yes, yes.
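The sparse Mixture-of-Experts activation pattern described a moment ago (a trillion total parameters with only about 32 billion active per token) can be sketched as top-k expert routing. Everything below uses toy sizes and random weights purely for illustration; it is not Kimi K2's actual architecture:

```python
# Illustrative sketch of sparse MoE routing: a gate scores all experts,
# but only the top-k of them actually run for a given token. The
# compute saving comes from the idle experts.
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector through only the top-k of many experts."""
    logits = x @ gate_w                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected k
    # Only the chosen experts execute; the rest are never called.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16                          # toy sizes, not K2's config
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" here is just a toy linear layer.
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, m=m: x @ m for m in expert_mats]

y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
```

With k=2 of 16 experts active, only one-eighth of the expert parameters participate per token, which is the same ratio-style argument behind the 32B-of-1T figure.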
So it's like anything that is related to massive scale.
Give us the attribution. And so for enterprises, this open
way dominance really means that you can finally bring top
tier reasoning in house with much lower costs. And this
is kind of could be the start of this open-weights
era, where the proprietary advantage is decaying faster than ever.
So I think for big tech, open models outperforming, like
K2 Thinking, are forcing the question: how do you
justify, like, a 20 million monthly burn rate when community
models can really close the gap? And I think here
the next frontier may not be the model
quality, but also the integration: who builds the most trusted,
compliant and secure deployment pipelines. So I think that's going
to become super important here. So it's not just about
building the most intelligent and most powerful model, but also
about building a model that you can integrate and trust
and it is compliant and you have these secure deployment
pipelines. Aaron, do you want to make a forecast for
where we're going from here? I mean, I think the
old rule used to be, well, open source is going
to lag behind, quote, state of the art by X
months. Now we're in a world where if you buy
Mihai's theory, it's basically like now open source is ahead
in some ways of the proprietary models. Are we in
for a long period of kind of rough parity, I
think between open source and proprietary or do you feel
like actually over time open source is now going to
even accelerate further? We'll start talking about like, oh well,
how long is it going to take for OpenAI to
catch up relative to open source? Curious about how you
think about that. Yeah. So real quick, before I answer
that, I wanted to just pull the thread a bit
on what about these standards and benchmarks, you know,
around this? I want to make the point
that I think a third party independent assessment needs to
be made right around this model because I was looking
around and we can make stats tell us anything. Right.
You know, I can say, hey,
if I walk outside and it's zero degrees, you know,
Celsius, for example, that it's healthy. Right. So I mean
I'm not sure how much I would really buy, you
know, lots of the performance that Kimi did on, like,
BrowseComp, SWE-bench, LiveCodeBench,
you know, those pieces and elements, until a third party
tests this model. Right. So that's one point. Right.
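The call for third-party validation has a simple statistical angle: a single headline pass rate comes with sampling noise. A minimal sketch, assuming a made-up pass/fail record of 80 tasks (not Kimi K2's real results), shows how wide a 95% bootstrap confidence interval can be:

```python
# Why a single benchmark number deserves skepticism: resampling a toy
# pass/fail record gives a confidence interval, not just a point score.
# The data below is invented for illustration.
import random

def bootstrap_ci(passes, n_boot=10_000, alpha=0.05, seed=42):
    """95% bootstrap CI for a pass rate over per-task 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(passes)
    rates = sorted(
        sum(rng.choice(passes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = rates[int(alpha / 2 * n_boot)]
    hi = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# 60 passes out of 80 tasks: a 75% headline score...
results = [1] * 60 + [0] * 20
low, high = bootstrap_ci(results)
# ...but the interval spans several points either side, so a small
# lead over another model may not be meaningful.
```

On 80 tasks the interval is roughly ten points wide, which is exactly why independently rerun benchmarks matter more than a vendor's single reported score.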
The other is that I don't think this is an
"open source wins, proprietary is finished" argument. I do think
we're in a clear inflection point where open source models
can compete and complete some of the highest levels of
reasoning tasks. But I always like to think this, you
know, Kimi K2 isn't just thinking, it's overthinking for all
of us. Right. So that being said, you know, there's
no, no free lunch. Right. We need to pick the
right tool for the problem that we're going to use
and put them together. Right. And I think one of
the biggest areas that Kimi has an issue with
is trust and transparency, even though it is open source.
Right. Again, I want to see a third party show
us what the standards are and what the benchmarks are
like within the real world here and then also the
ecosystem. I used to host my own models. Very difficult
to do. I like to go and use, like, a
watsonx, for example, or a Bedrock that hosts models
for me so I can use the tooling that's available.
And I think, for example, we just talked about GPT
5.1. I think the tooling and that part is very
much a hedge. So I think there's pros and cons
for all of these and we're moving into a world
where we're going to ensemble these types of models together
within these very large graphs that are conversational, and then
they use these A2A protocols to communicate together. I guess
maybe a final question to Mihai before we move on
to the last topic I want to cover was I
know the reaction you just had to 5.1 was it's
learning about me, it's deciding how much to think. I
just want it to be simple. Right. I just don't
want that much. And is it ultimately, I guess, would
a proprietary model. I guess, Mihai, the question, just
to kind of get it to a question, is like,
how common do you think you are? Right. Do you
feel like the average consumer wants this Level of control?
No. Okay. I don't think they're necessarily aware of it,
but as a developer or somebody who builds AI agents,
who builds AI tools, I'm looking at Kimi K2 and I
think, look, 300 sequential tool calls, 256K of context, 10
times cheaper than GPT-5. And I can run it there,
okay, at one token per second with a distilled model.
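Running an open-weights model locally, as described here, typically means pointing an OpenAI-compatible client at your own machine. This is a hedged sketch: the endpoint URL, port, and model name are placeholders for whatever a local server such as llama.cpp or Ollama exposes, not values from the episode:

```python
# Sketch of calling a locally hosted model through an OpenAI-compatible
# chat-completions endpoint. Nothing leaves the machine: no cloud API,
# no server-side router deciding how much to think.
import json
import urllib.request

LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed port

def build_request(prompt, model="local-model", temperature=0.2):
    """Assemble a chat-completions payload for a local server."""
    return {
        "model": model,          # placeholder name; your server defines it
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_local(prompt):
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask_local("Summarize this file")  # requires a running local server
```

The same client code works whether the weights behind the endpoint are a 20B distilled model on a laptop or a full-size model on a home server; only the latency changes.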
And it's a 1 TB download. And my wife is
going to kill me when I start the servers, but
I can run it, I can run it myself. It
goes through no API. Nobody's looking at my data, nobody's
putting a router in front of it. Nobody gets to
make those choices for me. And it was kind of
funny when I was flying out last week. I had
GPT-OSS 20 billion on my laptop and I was
able to do coding like that. That level of control, the
ability to run these models locally, I think, is priceless.
And the fact that we now have models within the
open source space that can compete, genuinely compete with some
of the frontier models, I think it's just awesome. Yeah.
One of the biggest claims was that it could call
200 to 300 tools. Right. That's a big claim. And
this long horizon reasoning, I would really like to see
that validated. And you're calling external tools, that's a safety
issue as well. So that's something to be very cognizant
about. All right, last story of the day that I
want to cover is kind of this fun story. The
Register, the kind of tech news site, basically reported on
this. Kind of interesting, not really a leak, but some
sort of teasing that Microsoft is doing about what it's
working on in terms of agents for the enterprise. And
specifically it's releasing or planning to release what they're calling,
quote, a new class of AI agents that operate as
independent users within the enterprise workforce. So the quote here,
which I think is just fun to read, is like,
each embodied agent has its own identity, dedicated access to
organizational systems and applications, and the ability to collaborate with
humans and other agents. These agents can attend meetings, edit
documents, communicate via email and chat, and perform tasks autonomously.
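The "agents as independent users" idea quoted above can be pictured as a directory record with its own identity and scoped access. The field names here are hypothetical illustrations, not Microsoft's actual schema:

```python
# Sketch of what an "agentic user" directory entry might hold: a
# distinct identity, a human owner, and an explicit list of systems
# it may touch. All names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class AgentUser:
    agent_id: str                       # its own identity, like a human account
    display_name: str
    owner: str                          # the human accountable for it
    scopes: set[str] = field(default_factory=set)  # systems it may access

    def can_access(self, resource: str) -> bool:
        # Least-privilege check: no explicit scope, no access.
        return resource in self.scopes

bot = AgentUser("agent-0042", "Minutes Bot", owner="alice@example.com",
                scopes={"calendar.read", "docs.edit"})
```

The owner field is the part the later compliance discussion keeps circling back to: someone human has to remain answerable for what the agent does.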
So this is a kind of fun dream. We've been
talking about AI agents all year, but this is maybe
like the first one where a company's starting to make
the claim, like, oh yeah, we're going to just have
like a drag and drop coworker who will be an
agent that will operate on the enterprise exactly the same
way as any other user does. And so I guess,
Mihai, I see you making a face, so maybe I'll
call on you first. Particularly someone who works on agents
all the time. Is this marketing hype? Is this a
good idea? Is this a bad idea? As I commented on
printers: just a printer with a gun next to it
in case it makes funny noises. As somebody who's actively
building security software for AI agents, I'm building ContextForge, which
is a gateway for agents and MCP servers. This is
great news. I mean, I'm sure there's going to be
hundreds of clients interested in how do we secure authentication,
authorization, governance, how do we ensure PII data doesn't leak.
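One concrete control behind the authentication, authorization, and governance concerns raised here is an audit trail that records whether a human or an agent acted. A minimal illustrative sketch, with invented function and field names rather than any product's real API:

```python
# A minimal audit trail that tags every action with its actor type,
# so auditors can separate human actions from agent actions.
# In practice this would be append-only, tamper-evident storage.
import datetime

AUDIT_LOG = []

def record_action(actor_id, actor_type, action, resource):
    """Append one audited action; actor_type must be 'human' or 'agent'."""
    assert actor_type in ("human", "agent")
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor_id": actor_id,
        "actor_type": actor_type,   # the field that makes attribution possible
        "action": action,
        "resource": resource,
    })

record_action("alice@example.com", "human", "approve", "po-1138")
record_action("agent-0042", "agent", "edit", "minutes-2024-11.docx")

# An auditor can now answer "what did the agents do?" directly.
agent_actions = [e for e in AUDIT_LOG if e["actor_type"] == "agent"]
```

A PII filter or policy engine would sit in front of `record_action` in a real deployment; the point of the sketch is only that actor attribution has to be captured at write time, not reconstructed later.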
But from an enterprise perspective, it's like this can be
a security nightmare. Not only do you have to manage
the user's identity, now you have potentially hundreds or thousands
of agents who are moving data left and right with
no accounting for governance compliance, the EU AI Act, GDPR, not
filtering necessarily your PII data, with no clear way to
do evaluations or to evaluate their performance. And it won't
be long before you know your boss is now Cortana.
You're reporting into an AI agent. Your reports are AI
agents. You're bringing your agents into the conversation. It's not
necessarily something new. So even on GitHub today, I can
trigger GitHub Copilot to help review my PRs. There are
already agents on Microsoft Teams. I'm building one of these
agents for Office 365 that calls our Consulting Advantage platform. So
these things kind of already exist in many organizations. The
part I don't necessarily feel comfortable with is what level
of control do organizations have when all these hundreds of
agents start popping around in various catalogs, in teams, in
all the product suites and CIO offices are running around?
How do I turn this off? How do I turn
this off? How do I ensure data integrity? So I
think this is inevitable. The only thing that feels off
about this story is how much choice is this, like
a new product that we can buy? Or is this
something that's just gonna happen to us from now on?
There's gonna be that cute dog from Windows XP bouncing
around and saying, I'm here to help you search.
Maybe good for business teams, but a nightmare
for CIOs and CISOs. But isn't Microsoft? I mean, Microsoft
is the world leader at this kind of compliance. Why
are they pushing down this route if it comes with
all of the kind of crazy security risks that Mihai
is pointing out? Yeah, it's really interesting to see this
new direction that Microsoft is pushing forward. You know, this
is, you know, we're seeing here the shift from having
a tool, AI is a tool to AI as a
teammate. And I agree, you know, here with what Mihai said:
this can be like a compliance nightmare here. So because,
you know, governance and auditing. So if it's an agentic
user with its own identity and if it violates a
compliance policy, who is accountable? Is it the admin who
created it, the human who trained it? The organization really
needs a unified oddity log that can differentiate between human
and agent actions. And so I think there are a
lot of interesting implications here. This is pretty disruptive with
these agentic users because they're really full fledged user objects.
They have email identity and so on. Yeah, they can
do everything right now. Yes, currently they might be augmenting us, but at some point they might also be replacing layers in the organization. So I think there are some interesting pieces to this, but also scary pieces that we have to be watching for. And I think the discussion here is about the organizational and legal implications of giving AI a kind of corporate identity. This is a really profound shift from tooling to teammates and coworkers, and what the implications of this are for HR, for governance, for auditing. There's also maybe a cultural shock to this: how will human employees react to teammates that don't need breaks, work 24/7, and potentially access all their files? So this is a major change management challenge that I think
HR departments are not ready for. Yeah, and I want to end the episode by taking maybe a little bit of a step into the future. Because, Aaron, where my head goes on
this is: if you work at a big company, there are people you work with that you maybe never meet in person, people you largely experience through Slack. I just have this vision that in the future someone's trying to start an office romance, only to discover that the person they've been working with is actually an agent. That may very well be in our future.
The minute you instantiate user agents that operate like this
in the office. I don't know. Do you have any
wild predictions, Aaron, for the future of the Office place
cook breakfast. That'd be fantastic. Right? Yeah. So I did
look around at what Microsoft's Strategy looked to be and
it looks like this somewhat started with a paper that
they had called the Agentic Economy. But then I think
as the field noticed that there are a lot of security pieces around this, they then had groups working on zero-trust agents and on securing and governing autonomous agents with Microsoft Security. So they're attacking, I think, both sides to try to create this agentic architecture with eval frameworks and governance to help people feel more comfortable. But I could see
a future where, with what, over 8 billion people on the planet, there are more agents than there are people. Well, if you count Copilot as an agent, the population of agents operating in society is actually quite large now. Right? That's right. You can copy and paste agents, you can clone agents, and agents can potentially create other agents. So this fractal piece goes on and on. But
I do think the blurring between what is a human,
what is an agent is going to become very, very
difficult. I think there are going to be hybrid humans and agents working together, where even if you are talking with a human, are you really talking to the intent of the human, or is the human simply almost like a puppet of the agent, which feeds them what to say? So that trust and transparency about this blending of these biomimetic pieces all working together might be hard to parse, but I think it's something that we need to get ready for, and even to prepare our kids for what they could be in for. Yeah. How do you deal with it when your
boss is a real jerk but he's Cortana, you know.
That's right. That's right. And the other part, too, is that these agents are going to manifest themselves within the physical world. So the way that they change the physical world is going to give us other confusing signals: what changed the color of my TV, or what changed the channel, or what changed the color of this light, for example, these small things. So it's going to be really interesting. The effect of computing, in essence, I think is going to be combined with agentic AI. But yeah, for sure it'll be fun, concerning and interesting, all
of it together. All at the same time. Mihai, do you want to have the final word here on the episode? With consumption-based pricing, how long before your agents start to attend every meeting or email everyone, and they're going to charge you 0.01 cents for every interaction and maybe even impersonate humans? Like you've seen on YouTube. That is, until the EU AI Act starts ramping up some of those fines. Yeah. What I was also thinking on the agentic
economy is that agents will pay us. Right. That we
become businesses. Right. And then we take payment from agents
to get access to a certain element. That would be
nice. The agent has decided it's more cost effective to
delegate to a human. Yeah, exactly. It's a dark and
dangerous future. All right, well that's a great note to
end on and that's all the time that we have
for today. So Kautar, Aaron, Mihai, always great to have
you on the show and thanks for joining all you
listeners. If you enjoyed what you heard, you can get
us on Apple Podcasts, Spotify and podcast platforms everywhere. And
we'll see you next week on Mixture of Experts.