NVIDIA DIGITS: Desktop Supercomputing Unveiled
Key Points
- The panel’s biggest excitement from CES is NVIDIA’s new “DIGITS” system, a compact, high‑memory GPU workstation that brings petaflop‑level AI compute to a desktop size.
- DIGITS packs a 120 GB GPU and can run massive models (e.g., 200‑billion‑parameter networks) locally, potentially shifting AI workloads from cloud data centers to individual desks.
- Priced at about $3,000 and slated for release in May, the device aims to democratize access to supercomputing power, offering a Unix‑based server that works with Windows and macOS.
- NVIDIA’s move into personal‑supercomputer territory positions it against traditional PC makers like Apple, signaling a strategic push into the consumer AI‑hardware market.
- The episode also teases upcoming topics on a new developer‑AI tools report, recent issues with Apple Intelligence, and Sam Altman’s reflections on ChatGPT’s second anniversary.
Sections
- Excitement Over NVIDIA DIGITS at CES - Panelists emphasize NVIDIA’s compact DGX “DIGITS” system as the premier AI breakthrough unveiled at CES.
- NVIDIA Targets Integrated Workstation Market - NVIDIA is transitioning from selling isolated GPUs to offering ready‑to‑use, optimized desktop systems that compete with Apple's Studio line, aiming to capture the entire developer and workstation value chain.
- Edge Computing Benefits for Enterprise - The speaker explains how processing data locally—on factory floors, vehicles, and field deployments—reduces latency, improves security, and creates valuable enterprise applications.
- NVIDIA's Bold Moves Democratize AI - The speaker highlights NVIDIA’s strategic acquisitions, aggressive price cuts, and open‑source initiatives as evidence that the company is positioning itself as the indispensable, affordable backbone of the emerging physical and agentic AI ecosystem, suggesting it is currently undervalued.
- Trust vs Functionality in AI Code Generation - A discussion explores whether developers will accept AI code‑generation tools without scrutinizing trust and governance, referencing IBM’s research history and the early adoption of technologies like GPS that users eventually embraced despite initial skepticism.
- Overconfidence Risks in AI Code Generation - The speaker warns that blindly trusting model‑generated code—especially when the model also executes it—can introduce bugs, emphasizing the need for human skepticism and review.
- Challenges of AI-Generated Code Review - It highlights the difficulty of ensuring quality when most code is produced by AI copilots, emphasizing the need for deep human understanding, effective questioning, test‑case design, and automated peer‑review agents to handle the problematic last 30 % of code.
- Evaluating Apple's On‑Device AI - The speakers discuss whether Apple's smaller, on‑device models meet expectations, noting trade‑offs in performance, security, and battery life.
- Apple’s Rushed AI Debut - The speaker critiques Apple’s hurried launch of a limited‑functionality AI feature, blaming insufficient back‑testing, resource‑constrained edge integration, and a need to catch up in the competitive AI market.
- Apple’s Open-Source Screen Understanding - The speaker praises Apple’s low‑power, open‑source tools for interpreting screen elements, discusses the incremental improvements expected in small AI models for diverse news domains, and highlights the challenges of achieving nuanced comprehension on mobile devices.
- Debating AGI Definitions Post‑Altman Blog - The panel reviews Sam Altman's reflective post on ChatGPT, noting his continued AGI focus and stressing the community’s need for clearer, shared definitions of artificial general intelligence.
- Rapid AGI Progress and Industry Leadership - The speakers laud Sam's company for swiftly advancing toward AGI, expressing amazement at the pace of development and its position as the industry trailblazer.
Full Transcript
# NVIDIA DIGITS: Desktop Supercomputing Unveiled **Source:** [https://www.youtube.com/watch?v=KwGg5WhxuFY](https://www.youtube.com/watch?v=KwGg5WhxuFY) **Duration:** 00:35:25 ## Summary - The panel’s biggest excitement from CES is NVIDIA’s new “DIGITS” system, a compact, high‑memory GPU workstation that brings petaflop‑level AI compute to a desktop size. - DIGITS packs a 120 GB GPU and can run massive models (e.g., 200‑billion‑parameter networks) locally, potentially shifting AI workloads from cloud data centers to individual desks. - Priced at about $3,000 and slated for release in May, the device aims to democratize access to supercomputing power, offering a Unix‑based server that works with Windows and macOS. - NVIDIA’s move into personal‑supercomputer territory positions it against traditional PC makers like Apple, signaling a strategic push into the consumer AI‑hardware market. - The episode also teases upcoming topics on a new developer‑AI tools report, recent issues with Apple Intelligence, and Sam Altman’s reflections on ChatGPT’s second anniversary. ## Sections - [00:00:00](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=0s) **Excitement Over NVIDIA DIGITS at CES** - Panelists emphasize NVIDIA’s compact DGX “DIGITS” system as the premier AI breakthrough unveiled at CES. - [00:03:02](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=182s) **NVIDIA Targets Integrated Workstation Market** - NVIDIA is transitioning from selling isolated GPUs to offering ready‑to‑use, optimized desktop systems that compete with Apple's Studio line, aiming to capture the entire developer and workstation value chain. - [00:06:05](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=365s) **Edge Computing Benefits for Enterprise** - The speaker explains how processing data locally—on factory floors, vehicles, and field deployments—reduces latency, improves security, and creates valuable enterprise applications. - [00:09:12](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=552s) **NVIDIA's Bold Moves Democratize AI** - The speaker highlights NVIDIA’s strategic acquisitions, aggressive price cuts, and open‑source initiatives as evidence that the company is positioning itself as the indispensable, affordable backbone of the emerging physical and agentic AI ecosystem, suggesting it is currently undervalued. - [00:12:17](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=737s) **Trust vs Functionality in AI Code Generation** - A discussion explores whether developers will accept AI code‑generation tools without scrutinizing trust and governance, referencing IBM’s research history and the early adoption of technologies like GPS that users eventually embraced despite initial skepticism. - [00:15:25](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=925s) **Overconfidence Risks in AI Code Generation** - The speaker warns that blindly trusting model‑generated code—especially when the model also executes it—can introduce bugs, emphasizing the need for human skepticism and review. - [00:18:31](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=1111s) **Challenges of AI-Generated Code Review** - It highlights the difficulty of ensuring quality when most code is produced by AI copilots, emphasizing the need for deep human understanding, effective questioning, test‑case design, and automated peer‑review agents to handle the problematic last 30 % of code. - [00:21:34](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=1294s) **Evaluating Apple's On‑Device AI** - The speakers discuss whether Apple's smaller, on‑device models meet expectations, noting trade‑offs in performance, security, and battery life. - [00:24:38](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=1478s) **Apple’s Rushed AI Debut** - The speaker critiques Apple’s hurried launch of a limited‑functionality AI feature, blaming insufficient back‑testing, resource‑constrained edge integration, and a need to catch up in the competitive AI market. - [00:27:46](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=1666s) **Apple’s Open-Source Screen Understanding** - The speaker praises Apple’s low‑power, open‑source tools for interpreting screen elements, discusses the incremental improvements expected in small AI models for diverse news domains, and highlights the challenges of achieving nuanced comprehension on mobile devices. - [00:30:50](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=1850s) **Debating AGI Definitions Post‑Altman Blog** - The panel reviews Sam Altman's reflective post on ChatGPT, noting his continued AGI focus and stressing the community’s need for clearer, shared definitions of artificial general intelligence. - [00:34:23](https://www.youtube.com/watch?v=KwGg5WhxuFY&t=2063s) **Rapid AGI Progress and Industry Leadership** - The speakers laud Sam's company for swiftly advancing toward AGI, expressing amazement at the pace of development and its position as the industry trailblazer. ## Full Transcript
What are you most excited about coming out of CES?
Shobhit Varshney is Senior Partner Consulting on AI for
US, Canada, and Latin America.
Shobhit, welcome back to the show.
What do you think?
NVIDIA's DIGITS.
The supercomputer right next to my laptop.
Mwah.
Love it.
Great.
Uh, Skyler Speakman is a Senior Research Scientist.
Skyler, welcome back.
Uh, what are you most interested coming out of CES?
As a long time PC gamer, absolutely the new line of graphics cards coming out.
And finally, last but not least, Volkmar Uhlig is Vice President,
AI Infrastructure Portfolio Lead.
Volkmar, uh, what do you take out of CES this year?
I'm with Shobhit, it's the
DIGITS.
All right, all that and more on today's Mixture of Experts.
I'm Tim Huang, and welcome to Mixture of Experts.
Each week, MoE is dedicated to bringing you the debate, news, and analysis
you need to keep up with the top headlines in artificial intelligence.
Today, we're going to be talking about a new report on developer use
of AI tools, some trouble with Apple Intelligence, and Sam Altman's reflections
on the second anniversary of ChatGPT.
But first, let's get started.
Let's talk a little about CES and Shobhit, maybe I'll kick it to you first.
We're all very excited by DIGITS.
For those of us who are not obsessively watching all the headlines coming
out of CES, what is DIGITS and why are you so excited about it?
So the intention here is, uh, shrinking their DGX, uh, all
the way down to a small machine,
So NVIDIA has figured out a way to squeeze in a lot more, uh, firepower.
Their graphics, uh, GPU card with an insane memory, 120 GBs attached to it.
So you can start to run some really large AI workloads on your desktop.
Imagine a 200 billion parameter model, which is way bigger than what ChatGPT was
when it came out two years back, right?
You're able to run that locally, right next to your machine.
Now, it comes with a flavor of Unix on it, but you can obviously, instead of
Linux, uh, you can have your Mac and Windows use that as a server, and you
can do some really cool things, right?
But now you're talking about having personal supercomputers that you
can literally keep on your desk or potentially even carry with you.
It won't be out till May.
It's about $3,000, which just looking at the hardware that's going in that
itself is a ridiculously great price point to go deliver that, but this
starts to move computing from the cloud supercomputing all the way down to your
desk, so petaflops of compute at your desktop, and that just is an insane value.
Yeah, absolutely.
I know Volkmar, you were saying that you were excited about this as well.
I know, I think we've talked about it in the past, but if you want to
give our listeners a little bit of an intuition of why is NVIDIA moving
into this market at all, right?
Like, arguably, doesn't this put them in competition with like, Apple and
all these other kind of, you know, kind of desktop personal computer creators.
Whereas NVIDIA's usual thing has of course been data centers.
Do you have a sense of why they're moving into this market?
Yeah, I would not say that NVIDIA traditionally is a data center company.
They are a gaming company.
So, and the data center kind of came along and hit them three, four years ago.
Right in the face, yeah.
Um, yeah.
And you know, good for them, they captured it and was just visible
in their market capitalization.
Um, I think what NVIDIA is figuring out right now is that, um, the
development market, uh, or developer market was kind of limited to, you
know, buy an RTX and stick it into an, you know, developer machine.
Uh, and now they are effectively going all in of saying we need to
cover this whole value chain creation.
And I think it's very, very hard, um, like today, because you, in fact, you
need to buy, um, like a Windows or Linux box and then, you know, you, you stick
in a bunch of NVIDIA cards and, you know, you rake this thing up and now they are
effectively coming out and saying, okay, here's a ready to go system, which is
optimized for that specific workload.
Um, I think.
When you see what Apple did with the M-, you know, M1 to M4 now,
they are effectively trying to capture that desktop market.
And that is not CUDA, and that is not NVIDIA.
And I think NVIDIA is doing a preventive strike here.
And if you look from a pricing perspective, they're sitting right between
the smaller Apple, you know, Apple Studio for $2,000 and the bigger Apple Studio
for $4,000, and so they are at $3,000 and they have specs which are bigger.
And so I think it's, um, and, and now it's also, it's an attachment,
but it's at the same time, you can use it as your primary desktop.
So I think they are- they are are effectively trying to cover their bases.
What will be interesting to see as, you know, what people are
now doing, if you can just, for $3,000, you can get that box.
It's not a DGX, but you know, in many cases, it may be sufficient for
running small scale training jobs.
Um, and so I can, you know, imagine that people are just buying them by
the truckload and putting them in up in data centers and giving their
developers, not necessarily something on the desk, but, you know, maybe it's
tethered, but it's on, on premise.
Um, and so it's a really good way of actually getting
that development loop going.
And you could even use it for production use cases, right?
So if you don't need a 19 inch rack solver, you could use something smaller.
They, I think at three different points in the press release from NVIDIA, they
talk about how easy it is to take the models that you've trained on your small
DIGITS and move it to NVIDIA's cloud.
So I also think they're really pushing for this hook here in order to drive
more business to their data centers.
And one of these is start small on your own personalized local system and make it
extremely easy for you to then scale that up onto, of course, their data centers.
So I think that also plays a lot into the strategy of why
they're really pushing this.
Yeah.
Shobhit, maybe to turn back to you, What do you do with a petaflop?
You know, it's like, it's kind of funny, like, because it is very
exciting, you know, a supercomputer literally on your desktop, but
like, with that level of computing power, what, what do we use it for?
I mean, is it, is it just gaming?
Do you anticipate people doing a lot more homebrew AI stuff?
Um, what, what, what does this unlock, right?
If it just really becomes super successful?
So I think that the two different markets here, one is enterprise, one is consumer.
Right?
I think, uh, from there will be some enthusiasts that'll, uh,
on the consumer side that'll obviously gravitate towards it.
But I think there's a huge potential on the enterprise side.
Uh, what that gives you is being able to run compute that's closer
to where the action is happening.
So think about industrial applications where on the factory floor, you want
compute to be right next to where the manufacturing of everything is happening.
Or, one of my clients, uh, large auto industry, they have a lot of
trucks and buses and things of that nature, and you would want to have
some mobile compute that you can actually run a model on, right?
In a lot of these use cases, there's a lot of latency between calling
a server or a cloud API and being, again, getting responses back, right?
Those are expensive.
So imagine you're taking, say, a picture on the manufacturing conveyor belt, right?
You want to be able to process those near to where the images are being captured
there's less latency and there's a huge security concern here right you want to
make sure that the data especially if it is related to something that's very
sensitive you don't want that leaving your premise either so you want to be
able to run those closer to it same thing goes for say defense applications where
you are doing something more tactical in the field you want to be able to
compute all the images coming in from all the drones and stuff at the, at the
particular place, because you may be in a territory where you really don't
even have a cellular connection, right?
So all of those are heavy computing workloads that used to traditionally take
cloud environments to go scale up and run.
That you're now being able to do closer to where the action is happening.
That's a huge, huge unlock of value for enterprises.
Today, we've been constrained by some cutesy little small, uh, models that'll
be running on mobile devices and things of that nature, but we're not quite there yet
where you can run 200 billion parameter model right next to where the action is.
Yeah, that's really exciting.
Well, a lot more to pay attention to.
Um, I'm definitely going to get one as it sounds like many
of the folks on this call are.
So we'll definitely have to compare notes once they start arriving,
uh, on our respective desktops.
Tim, apart from the DIGITS there were some insanely good things that, uh, NVIDIA,
uh, released, uh, during the keynote.
There were, like, three different areas that, uh, Jensen wanted to
ensure that people realize that this is what NVIDIA really does, right?
So one was in physical AI, figuring out a way in which we can model
the physical universe around us.
a good set of starter AI open source that can understand the physics and we
can start to train things around it.
That leads to things, things like robotics and humanoids
around us in our environments.
Right?
The second big area of unlock was the AI.
automotives.
Figuring out how do we do autonomous driving, and you need the whole pipeline
of millions of sensor data coming in.
How do you process that and make decisions on the, on the vehicle itself, right?
And then the third one was around digital workers, agents doing regular
day to day work as you and I do inside of all the softwares that we work with.
Jensen spent 90 minutes on this on stage wowing the audience.
That's no easy feat, right?
If you analyze the entire 90 minute conversation you start to realize
how an incredible communicator he is, breaking down a concept,
complex concepts into such clarity.
So in each of those different sections he proved that NVIDIA is in fact a leader.
They are making some bold moves to ensure that the ecosystem comes along with them.
They just bought, run:ai for maybe $700 million they turned
around and open sourced it.
It's such a baller move, $700 million and then you open source it, right?
So they're trying to ensure that the entire industry moves closer
to this physical AI and agentic AI and autonomous driving era.
And they want to be the backbone across each one of them.
Uh, last year they had, uh, in the, in the gaming industry and, and Skyler's
going to, uh, chuckle on this, right?
The four, the, the four, the 40 series of their, uh, of
their chips used to be $1,600.
They just released an equivalent compute for $550, right?
So just imagine, Apple will never do this.
They'll never take a $1,600 thing and the next iteration being
a third of the price, right?
So you're getting to this point where, NVIDIA wants to make sure that the
compute is as easily accessible and democratized as plugging into electricity.
But they want to be the electric superpower of the entire world.
And if you look at those three different areas, my hot take,
NVIDIA is undervalued right now.
IBM
is out with a new developer report, um, taking a look at, uh, developers views
on the use of AI tools in their workflow.
Um, a couple of very interesting data points, but I think the place I wanted
to start is, uh, really, I think on this really interesting result where, you
know, the developers were asked, okay, so what do you want most out of an AI tool?
You know, the comment was, well, we want things like trustworthiness in
the AI, and it should be reliable and all the things that you would want.
And then they were asked, well, what are the current problems
with the existing AI tool set?
And it was exactly those same things.
And so I do want to really ask this kind of question of the group, which is, does
feel like despite all the hope- hype around code assistance and agents in
developing and all this kind of stuff that we've been talking a lot about.
Um, it seems like ultimately that there still is this big trust
gap and it is actually preventing adoption of a lot of these tools.
And I guess maybe Skyler, I'll turn it to you first is, you know,
do you see that as a big problem?
Like, do you think that it's ultimately going to kind of put a
ceiling on the use of these tools?
Um, and, and what, what should we make of this?
Like I'm, I'm kind of, it was sort of an interesting result for me.
I'm not sure about a ceiling is the right term, but certainly delay.
Um, I spent, uh, a good time, uh, last year, end of last year, um, in
San Francisco at this International Network of AI Safety Institute.
So this big congregation.
And the topics are, of course, around safety, robustness, trustworthiness,
and those are the topics of the day in this and, uh, here when I talk
to would be clients, they aren't concerned about overall accuracy.
That's not their concerns.
It's how are these machines reaching their conclusions and can we trust them?
That's the back and forth we have now, not accuracy or even costs.
So it's a concern at a global level and even at just kind of an
individual client engagement level.
So yes, it's been part of an IBM research strategy for many years now.
What can we do with trust and governance in this space?
Lots of lots of work to be done there.
Yeah, that's right.
And I think there's kind of one point of view and Volkmar, I don't know if
you agree, you know, working with a lot of folks who are kind of in the nitty
gritty of the technical aspects of this is, you know, I think the AI person's
response also is, well, what do you care about, like trust or reliability, if it
just works, then it just works, right?
Like, you kind of think about like the early days of like Google,
where it's like oh the Google Image Search, there's this GPS thing
that's going to tell you where to go.
Yeah, sure I don't trust that.
And then over time it just turns out like the fastest way to get from point
A to point B is just to put it into GPS and kind of people get over uh, Like
their fear about not really knowing how these systems make decisions.
Do you think that'll kind of be the case here with kind of these,
all these developer tools that say we're going to do code gen.
Uh, and you're like, I don't really need to understand cause it
just works and I'm moving faster than developers that are not.
I don't think so.
So the way right now the development works usually is, and I hope this is
how it works for most companies, is you use the code generation kind of as like,
okay, I know what algorithm I want.
And I can, I can proofread it.
So I can proofread code about 10 times faster than I can write code.
Right.
And so if I go and I need to build something, I'm just going to an agent
and then the agent produces the code.
I'm still checking that the code works and there is still, you know, an architecture
behind it where you are saying, you know, you're kind of interacting with the
system and you're, you almost have a, um, you know, uh, an engineer at your hand
who is very fast and doesn't get tired.
So, uh, and, and you still need to do all the engineering practices we have.
You still need to write unit tests, you know, you still need
to write integration tests.
And so there is a, there is a rigor to it.
Now, if you have bad engineering practices and you don't write unit and integration
tests, then you may actually put, you know, litter your code base with bugs.
But that's more of an organizational, structural problem, right?
So do you allow code which is untested in your code base?
And, you know, a developer can make mistakes and the model can make mistakes.
And we are primarily now asking, you know, who has the higher
likelihood to get it right?
Um, in the end, confidence in your code base will always come from, you
know, test coverage and reviewing that the tests are written well.
And typically in engineering, you're saying, you know, your test should
be 10 times easier to understand than than the code you are actually writing
so that you actually know you it's easier to to check that the tests are
correct than the code itself is correct.
If you follow those practices I think you will discover the
bugs which which get introduced but if you don't yeah good luck.
Yeah definitely.
So do you think that the this report is mostly just revealing the fact that
you know effectively the sort of AI engineering is still more buggy than
humans like that effectively like the lack of trust is kind of well warranted.
I think we are not at the point that I can go blindly to a model
and say produce me 10,000 lines of code And they will be correct.
Um, I think the big challenge is that Um, you know, humans are lazy.
Um, and so there is, there is a tendency that we are overconfident
what the model is doing.
Uh, and if you do that and we are not very skeptical about the output and
we don't, you know, review it, we will actually get bugs into the code base.
Um, I would flip it around and say the more Um, like, open ended question we have
right now is where we are actually putting the model in the middle of the execution.
So there's one is the, is the, you know, code generation, but I can review this.
What if the, if the model actually executes code?
And we see this right now already in ChatGPT.
You ask it a random question, and it goes out, and it produces actually
Python code, and then it runs the Python code, and it gives you an answer.
But then you look at the Python code, it may be buggy, right?
And so sometimes the code doesn't, doesn't even, you know, like, um, when you do
data aggregations, uh, you know, you have like a table and has like, you know,
five values in the first column and seven values in the other column, and then says,
oh, Panda, sorry, I got an exception.
So that, you know, and this happens in real life.
And so you get these answers, which are just bogus, uh, simply
because the code generation and then the code execution is wrong.
And so that's, I think where it becomes much more scary where we are
doing this on-the-fly code generation.
Uh, and I do not think that with the current accuracy, we are there yet.
And so, for small things that may be okay, but for large things, I
think you still need human eyes.
Will that go away?
Yes, probably.
Over the next three, four years, we will get to a point that, you
know, the code will be better than what a human can produce.
Shobhit, to bring you back into this conversation, I gotta believe that
this is like your life, right, is like, customers and clients saying, well, I
don't know if I trust this stuff, and then you being like, no, the water's fine.
Um, I'm curious how this is kind of playing out in your world,
because it feels like this is like a conversation that you have
day in, day out, all the time.
So, from an IBM consulting perspective, right, we have very strict, uh, guidelines
and warranties and things of that nature.
For any code that IBM produces for an end client, we have to be bound
by what our master service agreement says, and what will go into the
code, is it copyright free, things of that nature as well, right?
So there's a pretty high bar for when our team members are
producing code for our clients.
And I think over time, you're starting to see that the quality
of the engineer that is leveraging these copilots matters a lot, right?
If you are a software architect, somebody who's senior, who knows how
to make interns work for you, right?
So say we get you some brilliant software developers and they have these parts
of brilliance, they'll show you some code that's like, oh my God, I don't
believe that this intern wrote this.
And you realize that they actually copied it off from- from Stack Exchange
and they modified it a little bit or something of that part, right?
So it was brilliant, but it was because they had access to other
things and stuff like that, right?
But unless you know how to judge that piece of code, it's very
difficult for you to even think about putting that into production, right?
So the, the bar of the manager for intern is pretty high.
Similarly, when you get a copilot who is behaving like an intern and you're
trying to ensure that the person who is, uh, who's using that copilot should
understand how code is written, right, to, uh, to the earlier points we've made.
We need to know what good looks like, but if the code is being generated
100 percent by, uh, by a copilot, then it's very difficult for you
to understand what logic was used.
Right?
Earlier you said that you can just proofread a code, but then you need
to be really good enough and have done this over and over again before
to understand what to even look for.
What's happening in reality today, 70 percent of the code gets
generated and it works pretty well.
The last 30%, the last mile is where we get stuck.
Right? It's an iterative process.
It takes one step forward, but then it ends up taking two steps backward and
may introduce some other bugs, right?
So unless you really know how you, how the code was written, how you would
have written it yourself if you had the time, You're not really able to get
to that 100 percent unlock of value.
So this tandem between a human and copilot, we also need to figure
out a little bit better on how to ask the right questions, how
to create the right test cases.
And I think having an agent that's going to go review and be the peer reviewer
for the code that's being generated, we're moving towards that place.
A lot of our deployments with our clients, when we introduce other
agents to review the code, review the errors, that multi agent is delivering
higher quality code for our teams.
Then what we got from an LLM that we just start spitting out the code into it.
It's really interesting to think about this is like part of like
a maturity of the overall AI tool chain that needs to happen.
So like the lack of trust is the fact that we have this AI code gen thing, but it's
really not connected to any other AI tools around it, sort of is what you're saying.
It's a new year.
We can be optimistic.
One of the insights from the same, from the same study, what was the
lowest item on this list of 10?
The one that people, the developers don't think is a problem, and
I think is really interesting, it was the quality of the LLM.
So the, these developers are, I think are, are correct and convinced that The LLM
quality is going to continue to increase.
That's not one of their concerns.
Um, and it's, it's really interesting to see that sort of play out here as
the lowest of the 10 options given here.
This came up about half as often as the trustworthy issues did.
Um, so I think that's, that's a pretty interesting takeaway from here.
LLMs will get better.
How we integrate them into the decision making process, that's a
different story, but I think there is kind of a global optimism that these
LLMs Are going to become stronger.
For our next segment, we're going to talk a little bit about Apple Intelligence.
Uh, there was a really interesting news story that popped up in the last week
about how, uh, this new summarization feature that was part of Apple
Intelligence had been messing up, right?
So this would, be a summary of your voicemails, your text messages, and
importantly, your news stories, your news headlines that you were getting.
And they found that in many cases, Apple was actually summarizing incorrectly.
It was hallucinating.
So Apple apologized and promised that they'd be doing better on
the version two of this feature.
Um, I wanted to bring up this topic just because when we talked about this
earlier last year, before the feature came out, you know, the opinion that we
had was AI is going to be perfect for Apple, and they're going to get this so
right, and it's going to be so targeted.
So I wanted to just go back and talk a little bit about,
were we right, were we wrong?
And I guess, I don't know, maybe Shobhit, I'll throw it to you first
on what your hot take is on that.
I think it underperforms in a lot of different scenarios.
I think Apple, uh, is using a lot smaller models to do this on device, to dialing
up on the security side of things to make sure that they're, uh, they're
small, can run, they don't- they're not using some insanely large model to
do the summaries and stuff like that.
So a little bit of the performance hit, uh, I believe is happening because of the
size of the models that they're using.
And we see this in, in real world as well as we're building multi agent
systems and stuff like that too, right?
So I think there's a little bit of, uh, the balance between, hey,
should I, make sure that everything runs on device, and I'm going to
constrain it only to a few things.
It has to be, it cannot start draining the battery and a few other, uh, uh,
things that they have to solve for.
Well, so do I really get a really intelligent model to go do these
summarizations and things of that nature?
Skyler, maybe do you have a similar take or?
I think in addition to Apple getting burned here, I think there's, at
least from what I've seen from the headlines, it's other news agencies
that were using Apple technology.
And so, for example, the BBC, you see this BBC breaking news coming up.
And it's completely made up.
And so the BBC is actually feeling quite burnt in this.
It's not just Apple with egg in their face.
It's partners that they've gone with because now they're getting these
headlines blasted to their customers with the BBC icon next to it.
Gibberish.
So I think it's going to be, yeah, Apple really has to really think about how.
Obviously the technical challenges of getting these hallucinations-
hallucinations taken care of.
But then how do you really pass that messaging on to the consumers going
through another news agency the right way?
Because I think, I think they got hurt on this one.
Yeah, that's right.
Volkmar, one question I had in particular for you, I was having a
conversation recently where he was- a friend of mine was making the argument
that like Apple is ultimately like, they have hardware brain, right?
They like, they do hardware, um, and he was saying that, you know,
he's a machine learning researcher who is like, um, machine learning
is like very different, right?
It's just like, you throw a bunch of data at it and then the machine
just sort of figures it out.
And so its attitude is a lot more just like, you know, just try it.
And then if it works, you know, then it was like a basically a lot more shooting
from the hip than kind of like the mentality of it is that is to do hardware.
And so kind of from this argument, he was saying, like, culturally, Apple's
just like, not well positioned to kind of like play and win in this
space because of like, kind of how careful Apple is in a lot of ways.
Do you think that's right?
Like, is there kind of a point of view here, which is like, in some ways,
like Apple was slow to launch the product, and then it just can't bear
like, organizationally the risk of these things and so it's always kind
of always kneecap that like really launching good features in the space.
I don't think so.
So the way- like, do you remember when Apple kicked out Google from
their phone and did Apple Maps and it was a disaster, right?
Yeah.
And and they took a lot of heat for it.
And now it's you know, one of the main routing applications, right?
I think what's happening is Apple was kind of in a bind because
they were late to the game.
They didn't build a really strong AI team.
This was very visible, like, you know, I was living in Silicon
Valley and Apple was just not there.
Like, they were not present.
Uh, and now they were effectively in a bind of, okay, we need
to bring something out.
We need to make an announcement.
So they made a big splash.
And they, you know, ship the product, they try to keep the functionality
really limited, but effectively make a strong statement, Hey, we are going
to put something on our devices.
We are not like missing the whole thing.
Um, and, uh, I think they had to rush it out.
I think fundamentally there is a backtesting problem.
Those things could have been found.
If they would have done decent back testing on very large scale
data, they have very large scale data on the devices they didn't
Um, and so now they get burned.
Uh, do I believe that will get fixed?
Yes Yeah, I think what Apple is doing is uh, it's defining on edge
devices, you know, how you do deep integration, I think it's still clunky,
like the whole, you know, rewrite my email and rewrite my text messages.
It's not good.
The models are not good yet.
You know, we have much better models out.
I think figuring out how to squeeze something into that form factor with
the resource constraints you have, but the power constraints you have
is a really the tough challenge.
Um, on the flip side, what we are seeing now is every generation of a
new model, we pretty much get the same capabilities for the next smaller model.
So the 70 billion parameter model gets to a 20 billion and the 20
billion parameter model becomes 13 billion and 13 billion parameter model
becomes 7 billion parameter model.
And so, you know, just by waiting 6 to 12 months, we will see capabilities which
you know, have only been traditionally, uh, been able to do in the cloud
on a, you know, like two GPUs or so will be possible to run on a phone.
Uh, but if they would have waited the year, you know, like they
would have lost the market.
So I think they were in this bind.
It's like, okay, technology's almost there.
There's a lot of hype around it.
We need to do something, so let's get something out.
And now they burn their fingers.
Mm.
Yeah. That's interesting.
But it's cool that to think of that, this is basically like Apple Maps.
Again, as someone who just switched actually from Google Maps to
Apple Maps, I'm like, wow, this is actually in fact way better.
Uh, but it was such a funny thing 'cause I remember the initial
reputation of it was terrible.
So I didn't touch it for...
It was terrible like, you're driving the ocean.
That's how I didn't touch it for, so,
And it's, by the way, it's still like that.
For anywhere except for their Apple offices.
So we went to Japan and like, you know, Apple Maps sends you into the forest.
That's amazing.
Um, Shobhit do you- do you agree with that?
It's kind of like, uh, it seems like Apple is the most disappointing, but it seems
like what Volkmar is saying is like, give it time like they will eventually win just
because you can't be you can't beat Apple.
So if you look at the actually, uh, Apple did a lot in the open source community
last year and it's fairly impressive what they did with their Ferret-UI models.
They have these smaller adapter models can run on on device
and things of that nature.
The power envelope is pretty low so they've done an incredibly good job and
open source all a lot of that right.
There are a few things where I think Apple, uh, would, has a,
has a lead over some of the other mobile manufacturers and stuff.
Understanding of what's on the screen, as an example.
They have some brilliant work that they've open sourced that lets you
understand the different elements, so you can then build on, on top of that
and create apps that can take actions on the screen, things of that nature, right?
So they've done some really good fundamental uh, work in 2024 and I'm
expecting that 2025 they're going to start taking better use of the compute
power going up as well as the fact that now they've learned so much the challenge
with a really really small model and As you said earlier this year, the model,
the small models will get better than where they were last year and so forth.
So we're seeing that that'll get incrementally better.
But the fact that you are picking a small model to cover news articles from
every domain, that is a challenge, right?
If you're asking a small model to do a bespoke piece of domain
expertise, that works really well when we deploy this for our clients.
But on an Apple phone, you're expecting it to understand the nuances of negation
and things of that nature on a news article that could be around biology
or it could be around some politics or sports and things of that nature, right?
It needs to have the understanding of every term that's used in golf.
That's different from the way you talk about it when
you talk about soccer, right?
Soccer versus football.
Things of that nature, right?
So you do need a larger model to the summary, but that's the balance
there that they're trying to make.
And I think they will catch up in 2025, but 2024, fundamental work that
they did was, was really, really good.
I think, uh, I really agree with Shobhit here.
Like the foundational work, how do you think about UI integration, uh, how
to think about on device processing and also the offload and then also,
you know, how the, the, the cross.
Um, Quest data domain integration, you know, understanding maps, understanding
your calendar, understanding your email, all that foundational work.
I think it's, it's incredible what they did.
And so I, my expectation is that we will get AI kit too, where also
people can bring their own adapters.
Right now you can't, but that's just the next logical step because, you know, you
20 big models live on a phone because you just don't have the memory capacity.
And so.
The next logical step is like, okay, I can take the Apple model and I can fine
tune it for my specific domain and I can load my adapter into it so that I can
bring, you know, new AI capabilities on device and, but have shared base weights.
And so this is where, where I think Apple did this foundational work
and by saying, hey, we are providing this as part of the operating system
that, you know, people can build on.
And this is, I think, their strength.
So they will, they will do that ecosystem play and give access to it.
But, you know, Apple always starts with the walled garden, you know, and nobody
can do anything until they figured it out by themselves, until they enabled all
the applications and then it will become kind of obvious how, how you build this.
And then we'll run it on our DIGITS, uh, you know,
Right. Exactly.
Yeah.
Exactly.
Yeah.
Exactly.
So
for our last segment, let's do a little final round the horn, uh, Sam
Altman on his personal blog, uh, put out a reflections blog post looking
back at the last two years of ChatGPT.
Um, there's a lot in it.
It's a very long blog post.
I think the big thing that came out of it for me, um, was really just
The degree to which Sam still really believes in AGI as the mission of OpenAI.
He hits on it multiple times, and it's still the big thing he's
rallying the company towards.
But I kind of wanted to get the view of all of you on the panel on, you
know, what you thought was surprising, what you thought was interesting.
Shobhit, I'm curious if you have any thoughts on, on the blog post, and
if there's anything that you thought was surprising or kind of worth
it for people to pay attention to.
Yeah, so he's, he talked a lot about AGI and I think, uh, we as a community
have not agreed to what should be the levels of defining what AGI is.
So I think we need to do a better job before we can even
evaluate people's opinions on whether AGI is achievable or not.
If we don't agree on a definition, of artificial general intelligence
between even humans, right?
How do you even evaluate human intelligence?
Ten kids in a classroom or in high school or in college?
It's very difficult for us to have a good measure for that.
So the community in 2025 needs to have better definitions, just like we did with
autonomous driving, different levels and hear the scenarios, hear the test cases,
we should do a little bit better job of defining that before we can evaluate
if Sam is, is really telling truth about where, how far we are from AGI.
Yeah, for sure.
Skyler, any thoughts?
Uh, hot takes?
Opinions? Yeah, slightly humorous take.
I had forgotten that he was fired and hired back.
So kudos to the PR team for that and it wasn't until reviewing the
blog that I had that trigger again.
You're like, Oh yeah, he was briefly not CEO.
Exactly.
I had forgotten about that.
And so I guess that was, if you're talking for a hot take of reflection
and reading that, that's probably what jumped out at me as it, it
just triggered that memory again.
So, uh, Um, yeah, that that's my that's my hottest take of that.
That's right.
Yeah that's that's such a funny thing because that was such a big story and
I had a very similar experience where I was like Oh, yeah, that was last year.
So, last but not least Volkmar curious if you've got any takes.
Yeah, I I think it's it's a mix of both.
So I think you know having been in you know startups and venture
capital for more than 10 years.
Uh, you know, I can feel for the pain he is going through and you know the ups
and downs and um, you know, getting fired from your own company is really not fun.
Um, but I think that um, it's really interesting to see the um, the product
evolution they are going through.
And, you know, he is pointing this out, like, you know, we, we did ChatGPT
and we released this thing into the wild and, you know, it's the fastest
growing consumer product, uh, ever.
Right?
So that's really amazing to see that, you know, how, how AI took off.
I think in the end, open AI created.
you know, this new wave, uh, they, they took the risk, um, you know, they figured
it out, you know, kudos to him, um, and now it's, it's really the question, like,
I mean, they have, they have this, um, really big north star of, like, we want
to get to AGI, and if you look at, uh, the, like, you know, 2024 with o1, where
they actually say we, we want to get to, you know, human level reasoning, and they
are still innovating, and it's really impressive that, you know, if you look at
OpenAI, um, they are clearly the leader, I think, in this industry right now.
They are defining, you know, the next steps, and I think it's part of Sam's
vision to say we want to get to AGI in a, you know, human scale time frame.
So, and we, you know, every time they're releasing a new product,
it's like, wow, this is possible?
You know, I think they are still driving the industry and
everybody else is a follower.
So that's, that's really impressive.
Yeah, I love that.
Yeah, I think that was one big reflection on the blog post was just
this guy who's running this company seems himself kind of surprised
about how fast things are moving.
You know, it's like, Oh, yeah, wow, like we're doing this thing.
It's only been two years.
And that's like, very fun to see that even he is like, continually uh,
confounded by how things are happening.
So, um, well, great.
Well, thanks for joining us.
Uh, Shobhit as always, Volkmar as always, and Skyler as always.
It's a pleasure to all have you on the show.
Um, and thanks for joining us, uh, all you listeners out there.
If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify,
and podcast platforms everywhere.
And we will see you next week on Mixture of Experts.