Copilot vs Clippy: Agent Battle
Key Points
- Vyoma Gagyar argues Microsoft Copilot is a sophisticated code‑translation and coordination tool, not a revival of the outdated “Clippy” assistant.
- Volkmar Uhlig notes the industry is in a “training‑wheel” phase where AI agents act as copilots under human supervision, but will eventually evolve into fully autonomous pilots.
- The imminent “agent jungle” sees major players like Microsoft and Salesforce deploying competing enterprise‑agent platforms, sparking a 2025‑era battle for market dominance.
- Current experimentation focuses on user‑interface designs for these copilots, as developers test how much human oversight is needed before moving toward screen‑less, fully autonomous systems.
Sections
- Copilot Not Just Clippy - In a podcast intro, AI experts discuss why Microsoft’s Copilot differs from the nostalgic Clippy, framing it within the emerging 2025 rivalry between enterprise AI agent platforms from Microsoft, Salesforce, and others.
- Experimenting with AI Agent Interfaces - The speakers compare today’s AI assistants to early tools like Clippy, noting that firms are still in a “training‑wheel” phase, experimenting with user interfaces while leveraging extensive legacy interaction data.
- IBM’s Green Tech Outlook - The speaker discusses IBM’s commitment to minimizing the environmental footprint of AI and data centers, noting current energy use and projected increases as high‑power GPUs become more common.
- Sustainable AI Powered by Nuclear Energy - In 2024, major tech firms and clients are increasingly prioritizing greener AI compute by leveraging nuclear power and government tax incentives to reduce emissions, avoid pipeline interruptions, and improve efficiency.
- Inference Market as Commodity - The speaker explains that AI inference functions as a token‑priced, perfectly competitive commodity market, driving a race‑to‑the‑bottom across models, hardware, and power generation, which spurs broad innovation and involves billions of dollars in economic stakes.
- AI‑Driven Interface Innovation Discussion - The speakers debate a new AI‑powered computer-use model’s promise for training, enablement, and disability support, noting the ironic shift from human‑designed GUIs to machines now steering the very interfaces they created.
- AI for Software QA Automation - The speaker proposes leveraging the demo technology to automate quality‑control and debugging tasks in software development, such as visual UI checks and cross‑browser validation, rather than focusing on generic machine‑to‑machine interactions.
- Bridging Legacy Systems with AI - The speaker explains how to introduce AI to tech‑focused clients still using legacy infrastructure by leveraging retired experts’ knowledge and historical logs to enrich new software, demonstrating value and easing the transition.
- Why Text Watermarking Matters - The speaker argues that text watermarking is essential for establishing ethical standards, building client confidence, and enabling future regulation despite industry concerns about user adoption and detection.
- Tool Tagging and Societal Split - The speaker argues that tagging every use of amplified tools—like large language models—creates a divided society between users and non‑users, while regulators seek to protect the latter despite the inevitability of technological adoption.
- Cautious Optimism on AI Adoption - The speaker argues that AI is still in its infancy, urging continued experimentation before widespread comfort and regulation can be achieved, much like earlier technological revolutions.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=HYHgJkWnPdQ](https://www.youtube.com/watch?v=HYHgJkWnPdQ) **Duration:** 00:32:49
Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=0s) Copilot Not Just Clippy
- [00:03:03](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=183s) Experimenting with AI Agent Interfaces
- [00:06:07](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=367s) IBM’s Green Tech Outlook
- [00:09:12](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=552s) Sustainable AI Powered by Nuclear Energy
- [00:12:21](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=741s) Inference Market as Commodity
- [00:15:23](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=923s) AI‑Driven Interface Innovation Discussion
- [00:18:26](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=1106s) AI for Software QA Automation
- [00:21:31](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=1291s) Bridging Legacy Systems with AI
- [00:24:35](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=1475s) Why Text Watermarking Matters
- [00:27:38](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=1658s) Tool Tagging and Societal Split
- [00:30:43](https://www.youtube.com/watch?v=HYHgJkWnPdQ&t=1843s) Cautious Optimism on AI Adoption
Is Microsoft Copilot just like Clippy 2.0?
Vyoma Gagyar is an AI
technical solution architect.
Vyoma, welcome to the show for the first time.
Tell us what you think.
Thank you.
I do not think that it is Clippy 2.0. Microsoft Copilot has been one of
the pioneers in the field of code
translation, extraction, and coordination.
Volkmar Uhlig is Vice President,
AI Infrastructure Portfolio Lead.
Volkmar, welcome to the show.
Uh, what do you think?
I think the jury is still out.
I'll wait for 2.5.
All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome to Mixture of Experts.
Every week, we're going to bring you the
world class analysis, debate, and thinking
you need to navigate through the rapidly
changing universe of artificial intelligence.
We've got a discussion about nuclear
power, AI using computers, but first
we really want to talk about the
rumble happening in the agent jungle.
Um, the question of whether Copilot is
just like Clippy 2.0 was inspired by a spicy
tweet from Marc Benioff.
Um, but I think more generally we want
to focus here on a mixture of experts
on kind of taking a look back over the
last few months and the fact that, um,
Salesforce has launched an agent platform.
Microsoft has launched an agent platform.
Really kind of 2025 is shaping up to be
like a battle of competition over agents
and specifically agents in their enterprise.
And so I really want to spend a little bit
of time talking about that and giving all
you listeners out there an intuition of what
to expect over the next 12 months or so.
And maybe Volkmar, I'll turn to you first.
You know, I think what's most interesting
is that, you know, now there's going
to be so many different agent
platforms to choose from.
Um, do you see different companies taking
different approaches to kind of offering
these technologies to the enterprise?
What do you think are like the
big kind of, you know, competitive
dynamics that are playing out here?
All companies are trying to experiment.
Um, and we are in a world where the
training wheels are on, where the systems
get supervised by humans, and right
now the system is in the passenger seat.
That's why it's the co-pilot and not the pilot.
Um, then at some point I think there will be
a switchover, where the systems are more
powerful, more trustworthy, and then the system
becomes the pilot and the user becomes the co-pilot.
And at some point we can get the
co-pilot out of the seat and
the systems can be fully autonomous.
So I think we are in a progression of
how the technology is evolving.
But I think at this point in time,
human eyes are required on the systems.
And I think the big experimentation right
now is what these user interfaces look like.
We kind of know what the fully
autonomous systems look like.
You know, there's not even a screen.
Or, you know, in cars there's
no steering wheel anymore.
But in the systems today, we are experimenting.
If you look at Microsoft, they
integrated it sometimes as a chat agent.
Uh, on the side, sometimes
directly in, in the applications.
Um, Apple took a different approach.
Salesforce is taking different approaches.
So everybody's, is experimenting with the
user experience at this point in time.
But, you know, the technology is not
there yet; the training wheels are still on.
And so we are going through
the training wheel phase.
Yeah, for sure.
And I think it's so interesting is like how
much some of the competition is just happening
on the level of like the interface, right?
It's just like we don't even know how to
effectively interact with these agents.
Um, I think you bring another angle to this
question that I think is worth touching upon
though, because, you know, in some ways, right?
Like I think for the kind of, you
know, outside observer, they take
a look at some of this stuff.
And I think they occasionally
are like, this is just Clippy 2.0, right?
Like this is back in like
the 90s or early 2000s.
And you know, we're just talking to a
paperclip on a word processor asking
me whether or not I'm writing a letter.
But it kind of sounds like one reason you
think that this is a genuinely different
thing, like what's happening in this market,
is also that there's a lot of experimentation
happening under the hood as well.
Is that right?
That is correct.
And think of all the information, the legacy information
that has been gathered from Clippy; you see
that Microsoft has been a great company,
which has been operating seamlessly for years.
Imagine the amount of data that it has gathered.
Beyond the Clippy data, as everyone is calling it,
there is so much other information
around from other platforms such as
GitHub, et cetera, as well.
Um, Imagine feeding all of that information
into a large language model and making
your day to day life much better.
So I feel that is what we are aiming at.
There are just a couple of, uh,
problems or a couple of solutions
that we want to get from this.
And first being enhanced productivity.
And I think Microsoft Copilot helps you do that.
And it also gives us a lot of our free time back
to do something more productive and creative.
Yeah, that's great.
And I do think, you know,
particularly Volkmar, I know your background
is working on autonomous vehicles, and
there's this kind of model where the agents
are the next level of autonomy, but we're
getting people to trust the technology enough
to be able to take it to the next autonomous level.
I think it's like a really interesting
set of problems that we'll see,
uh, kind of play out in the space.
The nice thing here is your
life doesn't depend on it.
Yeah, that's right.
All that will happen if this technology
fails is, you know, the code breaks, or
you send a really awkward email to someone.
So the stakes are a little bit lower.
Well, perfect.
I think one of the topics I really wanted to
touch on, uh, moving on to the next segment.
Um, is on the topic of AI and energy.
A few weeks ago, the news kind of leaked out
that Microsoft was considering restarting
the Three Mile Island nuclear power plant.
And, you know, all the current
projections suggest that future models
are going to need, you know, gigawatts
of power in a data center to run.
Um, and I think we've danced around this topic
in previous episodes of Mixture of Experts.
Um, but I think I wanted to kind of
just tackle it head on is, How are we
thinking about dealing with kind of the
environmental impact of these models and
how much energy is going to be required
to kind of unlock all of their potential?
You know, as someone who's very excited
about the technology but also kind of
concerned about climate change, you
know, it's a topic that I think is
really near and dear to my heart.
And, and I am really sort of interested
in, you know, the approaches that people
are thinking about and, and trying.
Um, I guess, maybe Volkmar, I'll start with
you. I'm curious about
how IBM is thinking about it, but in
general how you're seeing the space
evolve around this tricky problem.
So, in general at IBM, we are trying
to look at, you know, leaving
a green fingerprint on the
planet when we are looking at tech.
So you're trying to be conscious; you know,
there is an environmental impact.
The power consumption right now
for data centers stands at about 1.5 percent of total power
production in the United States.
So it's, it's tiny, right?
And then, with the expected
growth in AI, the projections
are kind of not really friendly.
Uh, assuming that, you know, H100s draw
seven, eight, nine hundred watts,
and then the next ones AMD is
producing are like 2,000 watts.
I think we have not yet done the
projections of technological improvements.
And so I do not believe that we will
see these high-powered cards
in the long run. I think
it's just a moment in time.
But even if we stay on that projection,
then the total power consumption we are
going to have is an increase from 1.5% to 4%.
Okay?
Well, take the population growth
of the United States right now.
Um, that's nothing, right?
So it's just the population growth is
already bigger than what we are adding here
in total data center power consumption.
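The shares Volkmar quotes can be sanity-checked with a quick back-of-envelope sketch. The 1.5% and 4% figures are his from the discussion; the total US generation figure below is a rough assumed round number, not from the episode:

```python
# Back-of-envelope check of the data-center power shares discussed above.
# Assumption: roughly 4,200 TWh/year of total US electricity generation.
TOTAL_US_TWH = 4200

current_share = 0.015    # data centers today, per the discussion
projected_share = 0.04   # projected share with the AI build-out

current_twh = TOTAL_US_TWH * current_share
projected_twh = TOTAL_US_TWH * projected_share

print(f"today:      {current_twh:.0f} TWh/yr")
print(f"projected:  {projected_twh:.0f} TWh/yr")
print(f"added load: {projected_twh - current_twh:.0f} TWh/yr "
      f"({(projected_share - current_share) * 100:.1f} percentage points)")
```

Even under these rough assumptions, the jump from 1.5% to 4% is a few hundred terawatt-hours per year, which is the scale that makes the nuclear conversation concrete.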
So I think that the moment right
now is that there is a concentrated
interest in very rapid build out.
And we are actually putting the discussion
about what constitutes green energy
and efficient energy back on the table.
And I do not think that has anything to do
with AI, but it's actually a key moment,
a tipping point, where we can actually
have a conversation about nuclear power in
the United States. And I'm really excited
about that because this is one
of the cleanest power sources, and
tech companies trying to
put nuclear power online, and
actually doing that in a
careful, you know,
orchestrated way, is a good thing.
And, you know, if the conclusion is then,
oh, we should still not do it, then,
you know, that's a consensus
among the people who have
these power plants in their backyard.
But I think the discussion needs to
happen in a rational way, and I think
over the last 50 years it was irrational.
Yeah, for sure.
And I think that'll be the
most interesting thing.
I mean, like, so often happens in AI.
It's almost like the, the AI isn't the
thing, but it is triggering the bigger
discussion, which I think is fascinating.
I guess, Vyoma, you work a lot
with customers and clients.
Is, is the environmental
discussion kind of popping up?
Like, are clients raising it?
Or, you know, people looking for solutions
on the kinds of solutions that you work on
saying, I want you to deliver this, but we
have to make sure that the emissions are,
are good, you know, on, on what you deliver?
I'm just curious about what you're
seeing kind of on the front lines there.
Yeah, of course.
Yeah, that's a good question.
In 2023, we were just getting
up to speed with this technology.
People wanted to know more about it.
But in 2024, now we see that so many of our
clients want to make it much more sustainable.
As you see with these clients and companies:
Microsoft, and Sam Altman, who is also
investing in a company called Oklo.
Google has its own approach, and Amazon has
its own different ways of investing
in some of these nuclear plants.
But as you see, they are trying
to make this more sustainable.
They're trying to avoid the lag.
Because if something like models or
AI runs on nuclear energy,
they run much faster, more seamlessly.
There is far less chance of it
breaking in the middle so that you
have to rerun those pipelines, which take
hours and hours of compute and resources.
So that is something that we are
making clients much more aware of.
I remember I was at a client location
two weeks ago and I was telling them
that right now, 15 to 20 percent of our
electricity comes from nuclear plants.
That's something that we have to look into.
The government is also helping you with
the Inflation Reduction
Act, giving you tax credits
for that as well, because, as mentioned, we
have a much better structure around it.
Um, the technology has evolved, trust has evolved,
and we should be doing fine.
And one thing that I wanted to add here:
not everyone wants to be leveraging these
large language models to do their jobs.
People are pivoting towards having
a smaller model which can do just
the job right by techniques such as
fine tuning or even prompt tuning.
So I feel that is also a trend
that I'm seeing nowadays.
Yeah, for sure.
And I think you
and Volkmar actually represent really
two sides of a very interesting coin.
Take the argument
that you just made.
Actually, customers are thinking about
smaller models as a way of reducing
their kind of like energy footprint for
the deployments that they want to do.
And Volkmar also says, well, look, a lot of
the projections are based on the idea that
the chips that we're getting, the boards
we're getting, like that energy consumption
is just going to be the case forever.
But it's actually likely that
the next generation will actually
consume a lot less energy as well.
And so there's actually this really interesting
interplay of, basically: does the
model need to consume as much energy,
does the actual hardware need
to consume as much energy, and what are the
efficiencies that you're going to get accordingly?
Um, I could see, basically,
a world where, I guess, Volkmar, what you're
saying doesn't come to pass for some time.
And so customers increasingly want
smaller models to deal with this question.
I can also imagine a world where there's
some breakthrough where the next generation
of boards is like so energy efficient
that people are like, let's just run
the biggest model that we can because it
costs a lot less energy than it used to.
It's just way more efficient.
Um, I think it'll be really interesting
to see that play out, but I'm
curious if either of you have an
impression of what's going to hit
first; that seems to be the question.
The moment you have something which is so
dominant in the market, uh, and costs so
much money but has a huge upside potential,
uh, innovation will take place, right?
And we are already,
like, if you look at inferencing,
in a perfect market, right?
It's a commodity.
Uh, you pay by, by tokens, um, and so you now
have price competition, and so the race to the
bottom is on, right, and so the race to the
bottom is, uh, across different disciplines, so
I can make smaller models, they run faster, I
can make faster inference, um, or I can produce
power more cheaply, right? And what I'm
expecting is that each participant in this
market, because it's such a big market, right?
If you consider something being, you know,
2, 3 percent of total power production
or consumption in the United States, um,
that is billions of dollars at stake.
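The token-priced commodity framing can be made concrete with a small sketch. The provider names and per-token prices below are entirely made up for illustration, not any vendor's actual rates:

```python
# Illustrative only: hypothetical per-million-token prices for three
# interchangeable inference providers in a commodity market.
providers = {
    "provider_a": 15.00,  # $ per 1M tokens (made-up number)
    "provider_b": 10.00,
    "provider_c": 2.50,
}

monthly_tokens = 500_000_000  # assumed 500M-token/month workload

# For an undifferentiated good, demand moves to the cheapest seller,
# which is what drives the race to the bottom across the market.
costs = {name: price * monthly_tokens / 1_000_000
         for name, price in providers.items()}
cheapest = min(costs, key=costs.get)
print(f"cheapest: {cheapest} at ${costs[cheapest]:,.0f}/month")
```

The point of the sketch is the sorting step: once tokens are interchangeable, price is the only axis, so every participant has to innovate on models, hardware, or power to stay the minimum.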
And so I think each of them will innovate.
You know, the model people will innovate
on the models, the hardware people will
innovate on the hardware, and the power plant
people will innovate on the power plant.
So I think overall we are better off.
But, you know, because now there is
a very specific problem which then
radiates into the rest of the economy.
So if we can suddenly make power at
half the cost, that's wonderful, right?
It will make, you know, a model cheaper.
Yeah, there's like other
reasons why we want to do that.
Right, exactly.
Volkmar, this is the time you
should get back to the Bay Area.
Startup idea.
Yeah, exactly.
Have you considered getting into fusion?
Yeah. There you go.
I'm going to push this on to our next
topic that I really wanted to talk about.
Um, Anthropic just last week, uh, launched
a new feature, uh, called computer use.
Um, and the basic premise of it is pretty
simple and it's kind of a fun feature.
It's basically the idea that, you know,
ultimately, your AI, your agent, will
be able to take over your mouse and pilot
your cursor around and do things for you
as if it were a user on the screen.
Um, and uh, and this generated all sorts of
really funny stories I want to talk about.
You know, one of them is that they talked about,
during testing, how the computer
use feature would occasionally get distracted.
So, partway through doing a task,
it would take a pause to
look at photos of Yellowstone National
Park for a while before going on to its
next task. These models actually
have these very funny
simulations of actual human behavior.
But I think I want to just first start
with, like, the business question.
Um, and, you know, maybe I'll
toss it over to you: why is Anthropic
working on a feature like computer use?
Like is it just a cool demo from a research
lab or is it actually really connected to
what they need to do as, as a business?
Look at Anthropic, look at agents, everything
that all these companies are trying to do
is come up with some sort of a symbiotic
relationship between humans and machines.
And whatever use case that you take
in this case, I think Anthropic
is just trying to do that.
I feel, um, with the
Claude models that are coming into play,
they are trying to help augment some of our
behavior and help us make our lives better
or help us be so much more productive.
I was just speaking about this, uh,
to my mother yesterday and she's
like, I need to book this ticket.
Help me.
And I'm like, I'm in the middle of a meeting.
I don't have time for this.
Just give me half an hour.
I was reading about it and I was like,
imagine if she had the computer use model.
It would help so many people in training,
enablement, people with disabilities.
It has a social impactful angle
to it, which just goes unseen.
And I feel those are the things that the
market, the people in the market, the clients
want going into the future.
So that's something that I
feel has a great potential.
Yeah, for sure.
I think Volkmar, this is kind of fun
because it does connect to what you're
talking about earlier in terms of like
innovation on the interface level.
Um, you know, I, I think what's really funny
is like we invented GUIs and the operating
system in part just because like we needed an
easier way for humans to interact with machines.
But now we have this very funny thing
where now the machine is taking over
that interface to pilot the machine.
Um, and it's kind of like a very funny
historical development that that ended up
being the case, but kind of curious about
how this fits into your earlier thoughts
about, you know, all this innovation
that we're seeing on the interface side.
Yeah.
So I think.
When I looked at their video where they demoed
it, um, it felt kind of useless at first.
Tell us more.
So, why?
And, um, but I think there is, like, a
certain level of smartness behind it.
So.
I believe that if a computer interfaces with
a computer, there are much
better ways you can actually do that.
You know, so, like, if you think
about it, they use the browser.
So, like, there's a browser, there's
an engine, that engine is JavaScript.
I can just directly hook
into a JavaScript engine.
I don't need to render something into pixels.
So that rendering effort, and then
translating that rendering
back, is just insane, right?
So if computer-to-computer
interaction happens, you do this through APIs.
Now, I think we are seeing something very
interesting emerging, which is the API to
the computer becomes the English language.
That's effectively what happens with large
language models too, right?
So I talk to you in English
and you're interfacing with
the outside world.
And the outside world, like, you know, if you
look at what ChatGPT is doing, is, you know,
they're, they're creating a Python script, and
they run the Python script for you to automate
a task, and they pull data out of the internet,
uh, and then, you know, they convert it into
JSON, and then they give you an answer back.
So they are the
translator in the middle.
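The "translator in the middle" pattern described here, where a small generated script turns raw data into structured JSON for the user, can be sketched as follows. The rows are inlined stand-ins rather than data actually pulled from the internet:

```python
import json

# Stand-in for raw data a model-generated script might pull from the web.
raw_rows = [
    "2024-10-01,copilot_launch,enterprise",
    "2024-10-22,computer_use_demo,research",
]

def translate(rows):
    """Convert raw comma-separated rows into a JSON string, the way a
    generated helper script converts scraped data before answering."""
    records = []
    for row in rows:
        date, event, category = row.split(",")
        records.append({"date": date, "event": event, "category": category})
    return json.dumps(records)

# The structured result that would be handed back to the user.
answer = json.loads(translate(raw_rows))
print(answer[0]["event"])
```

The model never touches the raw rows directly in this pattern; it writes and runs the translator, then reasons over the JSON.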
So I think there is the ability to
actually interface with the other channel of
human perception, which
is the visual one and not the, you know,
text-based one, which is usually auditory or,
like, reading letters.
And so suddenly, if I can understand
the visual domain a human is consuming,
now I can actually interface with that.
So if I were a business, uh, and I were doing
what Anthropic is doing, my guess would be that
they're probably looking at, uh, automating
development processes and automating debugging.
Right?
So what the demo is effectively
showing is, like: hey, look, we can do this.
But if you convert this into something
which has economic value, it is probably
in automating the testing and quality control, the QA, of
software development, which, you know, has
millions of people employed today.
So I think, from a, you know,
business value perspective,
that's the direction I would take this.
And then it's no longer
about, um, you know, replacing
machine-to-machine interaction, but actually
doing what the human is doing and asking,
okay, are all my buttons correctly aligned?
Is my text formatted correctly?
And now then, then it makes sense, right?
So it's effectively in that realm.
Quality control, potentially data generation,
where you can actually visually
inspect whether your code generation
was correct, whether the webpage renders
correctly in all browsers, et cetera.
That's where I can see
where you could take this.
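A minimal version of that visual QA idea, comparing a rendered screenshot against a reference and flagging differing pixels, might look like the sketch below. Real pipelines would use browser automation and image libraries; the tiny grayscale "screenshots" here are just nested lists of pixel values:

```python
# Tiny grayscale "screenshots" as rows of pixel values (0-255).
reference = [
    [255, 255, 255],
    [  0,   0,   0],
]
rendered = [
    [255, 255, 255],
    [  0,  30,   0],  # one pixel rendered differently
]

def diff_pixels(a, b, tolerance=10):
    """Count pixels whose values differ by more than `tolerance`."""
    return sum(
        abs(pa - pb) > tolerance
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

mismatches = diff_pixels(reference, rendered)
print(f"{mismatches} pixel(s) out of tolerance")  # a QA gate could fail on > 0
```

The tolerance matters in practice: anti-aliasing and font rendering differ slightly across browsers, so a strict equality check would flag every page.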
Yeah, that's really interesting.
Yeah, it's kind of a debugging thing.
I think it's fascinating that
their stated reason
for releasing it is not really,
ultimately, the business purpose.
Um, I mean, one angle, which I don't know if
you buy, Volkmar, is, you know,
we don't live in a world with perfect APIs.
Right.
Um, and it is possible that you could
imagine these kinds of models being helpful
for, you know, facilitating interactions
when, you know, there's no clean API
for the system to talk to a system.
I don't think you would do this in the
visual domain, rendering something in a
browser and having a laptop somewhere.
I think it's still like a crazy way to do it.
Yeah, it's just such an inefficient way.
How do I convert, you know, like 10
characters of JSON into a million
pixels and then try to understand that?
Um, and so the, I think there will
be a different layer, but I think
each of these layers has a value.
And so, I mean, if you want to make it
efficient, you could also have the code
for the API generated by
the large language model, right?
But now you can go one layer up, and you say,
okay, I run a JavaScript engine, and then the
next layer up is, I run the output of
the JavaScript engine in a web browser, and
I'm reading the pixels off the screen, right?
So, well, read the DOM; you know,
they could have just read the DOM out instead
of actually converting the DOM into pixels.
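Volkmar's point about reading the DOM instead of pixels can be sketched with the standard library alone: the structure an agent wants is already in the markup, no rendering required. The page and button ids here are hypothetical:

```python
from html.parser import HTMLParser

PAGE = ('<html><body><button id="save">Save</button>'
        '<button id="cancel">Cancel</button></body></html>')

class ButtonFinder(HTMLParser):
    """Pull button ids straight out of the markup -- the same information
    a screenshot-reading agent would have to recover from pixels."""
    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.buttons.append(dict(attrs).get("id"))

finder = ButtonFinder()
finder.feed(PAGE)
print(finder.buttons)
```

A few dozen bytes of markup yield the same two buttons that a vision model would otherwise have to locate in a million-pixel screenshot.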
But, you know, that's why my
immediate reaction was like,
oh, yeah, this is kind of weird.
I think from a quality control
perspective, that's huge, right?
And then now you can also say, okay,
please judge whether this interface
is better than that interface.
So suddenly you can do experimentation.
And I think that's where the true value comes.
If you can actually understand the screen.
Well, we actually have, I
think, a little bit of a difference
of opinion between you and Vyoma.
'Cause I think, Vyoma, you made an
argument a little bit earlier, which is,
this is, like, amazing as a way of interfacing
with agents for, like, your mom, right?
I'm kind of curious; it seems
like Volkmar is taking a very technical
approach, which I think is very genuine,
right, which is that there are much more efficient
ways of doing what computer use is doing.
I think one of the things that you're making
an argument for, though, is it like might
help people understand and interface with
these systems better, even though it's
kind of like technically less efficient.
I don't know if you would
agree with that at all.
Yeah, there are two caveats to this. Right now we belong to the tech space; this is something we do day in and day out. But when I go out and talk to clients, they have not even embarked on this journey of AI. They're still working with traditional legacy models, legacy systems, where they do not even know what AI does or where to go from here. So to onboard these clients and these use cases, I feel this is a great starting point to show them the value and then get them excited about it.
One of the use cases that I have seen in the past couple of days is this: there are people who are retiring and who have a lot of information about COBOL or legacy systems or network issues, et cetera. Where does all of this legacy system information go now? Their companies are concerned about how to reuse all of this information, and about how, before someone retires, we can fold that information into the new systems we are building. Imagine if you have something like computer use which looks at, okay, these are the logs or network issues that have been logged over the past couple of years, and this is how we can embed them into our new software. And it helps people understand through that process that this is not something which is going to try to replace you, but something that is going to make your life much easier and bring back all the lost information.
So, code translation and code understanding, sure, are great use cases. Validation and testing are great use cases. And the other main use case that I see with computer use in this entire process is understanding: understanding the language, understanding the code. Let's say someone built something like a 70-year-old COBOL function. It will tell you, or anyone, step by step: this is what is going on, this is how it's going to work, go to the next step, et cetera. So it can be broken down into multiple steps.
That's great.
Well, we'll have to see how this evolves, and I guess we'll have a long-running bet on whether this ends up being a debugging feature or a user-facing feature.
The final story I wanted to focus on today was a really interesting one that came out of Google. They announced an advancement they were working on called SynthID Text, which they've integrated into Gemini. The whole idea of SynthID Text is to help watermark AI-generated text.
And if you're familiar with this space, traditionally the problem is that if you watermark text this way, you have to force the model outputs into shapes that are often not great for actually solving what you need to solve, right? Their claim is that this methodology is better because you can do the watermarking, that is to say, identify which text was created by AIs, without compromising the quality, accuracy, creativity, or even speed of the text generation.
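For listeners unfamiliar with how text watermarking can work at all, here is a toy sketch of the general family of techniques: a "green-list" sampling bias, in the style of academic watermarking schemes. This is NOT Google's actual SynthID algorithm, whose details differ; it is a minimal illustration of the idea that the generator nudges sampling toward a pseudorandom subset of tokens keyed on the previous token, and a detector later counts how often that subset appears.

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary, not a real tokenizer

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Pseudorandom 'green' subset of the vocabulary, keyed on the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate(length: int, watermark: bool, seed: int = 0) -> list:
    """Toy 'model': samples uniformly, but biased toward green tokens when watermarking.
    A real model would bias the logits before sampling, not resample like this."""
    rng = random.Random(seed)
    out = ["<s>"]
    for _ in range(length):
        greens = green_list(out[-1])
        if watermark and rng.random() < 0.9:   # strong bias toward the green list
            out.append(rng.choice(sorted(greens)))
        else:
            out.append(rng.choice(VOCAB))
    return out[1:]

def green_fraction(tokens: list) -> float:
    """Detector: fraction of tokens that fall in the green list of their predecessor.
    Unwatermarked text hovers near the base rate (0.5); watermarked text sits well above."""
    hits = sum(1 for prev, tok in zip(["<s>"] + tokens, tokens) if tok in green_list(prev))
    return hits / len(tokens)

marked = generate(200, watermark=True)
plain = generate(200, watermark=False)
print(f"watermarked green fraction: {green_fraction(marked):.2f}")  # well above 0.5
print(f"unmarked green fraction:    {green_fraction(plain):.2f}")   # near 0.5
```

The quality trade-off the host mentions shows up here too: the harder you bias sampling toward the green list, the easier detection becomes, but the more you constrain what the model can say; Google's claim is essentially that their scheme keeps that constraint imperceptible.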
Okay.
And so, Vyoma, maybe I'll kick it over to you first: why is something like this important? Do we need watermarking for text? What's it for, even?
Let me answer this one by one. We do need watermarking for text. And again, it is quite controversial that I've said that. Google has been very bold to at least come out with this product and be so vocal about it. There are companies who've been experimenting with this; I know OpenAI has been experimenting with it, but they've not brought it out to the public yet.
Because some of these companies fear that people will stop using their models now that there's a watermark angle to it. Writers, et cetera, will think, oh, now I'll be caught, or something like that; that really runs in the back of their minds. But I feel watermarking is not there to judge you. It creates some sort of ethical standard, a standardization, and that is something everyone is trying to move toward: some sort of regulation that if X amount of tokens are generated by Y amount of models, then this is what we saw, and this is how it should be watermarked.
There is some sort of logging that we are doing on top of it. And I feel that is what brings a lot of confidence to clients, and a lot of confidence to people as well: that whatever model I'm using, whatever text has been generated, there are some marks or metrics attached to it. That is the angle I like to take on this, because I work very heavily in AI ethics, standards, and policies, and this topic comes up every other day: how do I know whether a decision it takes, or text that has been generated, is right or wrong? There are teachers who come up to me and say, oh, I don't know if the student copied this assignment. It is going to help all of us, students and teachers, create a healthier environment to sustain AI.
Yeah, no, I think it's great. Volkmar, I'm curious if you have any thoughts on this. I mean, clearly this is not the kind of thing that's going to solve the use of these models for spreading fake information or something like that, right? But I don't know if you agree that these kinds of measures are really necessary to make this technology be used in an ethical manner.
So I'm on the totally opposite side of this.
Yeah, let's hear it. Somehow I knew. I knew going into it.
So, I have two school-aged children, and the schools are desperately trying to prevent kids from using ChatGPT to write their essays. And I believe they should just do everything in ChatGPT.
And the reason is that ChatGPT does not substitute for thinking, right? It just substitutes for, or rather enhances, the process of content creation.
So, what we are now arguing is: I have a tool, and I need to tag everything which has been produced with that tool. But if I use a power drill instead of drilling the hole by hand, I'm not tagging every hole I drill into a wall saying, wow, I used a power drill to make this hole and therefore I need to tell you. I used a tool which amplifies my personal capabilities. And it's not that every time I walk somewhere, I say, well, I drove here by car, and I need a tag that I arrived by car and used energy which came out of fossil fuels, and therefore I need to announce it to the world.
So I think we are in a world right now which is bifurcated. We have a society which is kind of split. There's the part of society which actively uses large language models and harnesses their power, and there's the part which doesn't. And then we have the people who want to regulate everything and want to tell everybody how to live, right? So now it's like, oh my God, we need to protect the people who are not using large language models.
And the poor teachers, they need to change their way of educating the kids, and it will only take a hundred years until they get there. So let's give them tools so that they can keep doing the useless teaching they've been doing for a hundred years, and then we can figure out if someone is actually using tools of the 21st century, so that the teacher can punish them for it. It's like saying, well, I need to walk to school even though my parents could drive me and I could save 20 minutes, right?
So I think we are in a world right now which is still split, and we are at a breaking point. The technology is not yet widely adopted, but a good chunk of society, the early adopters, in particular children, are using it. I mean, ChatGPT probably grew like crazy when the first kid found out it could write an essay with it. So I think we need an education system which embraces it, and we need a corporate system which embraces it.
The second thing is that there's a certain arrogance by Google to say, oh look, we can watermark it. I was like, yeah, I'll use another chat agent, which I can just download from the internet, which removes your watermark, and you're done. Even the thinking that a company has such broad distribution that they can actually push watermarking onto the world.
It just tells you, okay, there will be models of different value. There's the Google model, which watermarks everything, and there's the non-watermarking model, which is actually much more valuable, because nobody can see that I used the tool, right? And so, of course, you just create an economy of cheating, because you are trying to tag everything, except that you, being Google, have the watermarking for your own purposes. So just the idea that you could actually do this is ridiculous, from my perspective.
We can agree to disagree on this. There are two caveats to what Volkmar mentioned. There are people who know AI and understand AI, and there are people who are scared to use it. So I feel the merger point, a point where everyone's comfortable with it, comes when all of these techniques and tools have been experimented with for a while. I still feel we are a little fresh into this. Look at the internet revolution, and then look at how recently ChatGPT arrived, or how recently we've made ChatGPT agentic.
Since this whole large language model boom came in, it has been such a short period of time that, to be completely honest, there haven't been enough products or use cases that have gone into full-fledged production yet. So until we reach a point where we see the effects, the long-term effects, of all of these techniques, we can keep thinking about the best ways to regulate or not. But till then, just keep experimenting, keep working on this, and I feel we'll all come to a merging point where everyone is comfortable.
I mean, this is true for every technology which has been invented by humanity. If something is only three years old, we do not know yet, so let's experiment with it. The U.S. in general is always: we first try, then we figure out what works and what doesn't work, and then we regulate it; not: let's first anticipate every bad problem that could occur and regulate it before anything has happened. So I think the U.S. will probably be reactive in regulation. Typically, regulators are ten years behind. So let's build something valuable first before we try to figure out how to put guardrails around it.
We could go much longer on this.
Uh, Vyoma, we'll have to
have you back on the show.
Thanks for coming on.
Um, and, uh, Volkmar, it's a pleasure as always.
Thanks for joining us.
If you enjoyed what you heard, you can get
us on Apple Podcasts, platforms everywhere.
And listeners, we'll see you next week.