Granite 3.0 Launch at IBM Tech Exchange
Key Points
- IBM unveiled Granite 3.0 at the Tech Exchange, a state‑of‑the‑art, open‑source (Apache 2.0) large language model family that includes language, safety (Granite Guardian), and efficiency variants.
- Unlike earlier generations that were split across English, multilingual, and code models, Granite 3.0 consolidates all those capabilities into a single, unified model.
- The new release pushes performance boundaries for an 8 billion‑parameter model while maintaining broad functionality and high efficiency.
- The launch received “phenomenal” early reception, highlighting strong interest from the AI community.
- The announcement was presented by IBM Research leaders Kate Soule, IBM Fellow Kush Varshney, and principal research scientist Petros Zerfos.
Sections
- Untitled Section
- The Hidden Work of Data Curation - A speaker explains why assembling, filtering, and annotating petabytes of raw internet data for AI model training is an enormous, technically demanding challenge.
- Emoji Dilemma in AI Model Training - The team explains training thousands of tiny models and hundreds of billion‑parameter models using extensive IBM infrastructure, and debates how many emojis to include in the training data to avoid over‑ or under‑representation in enterprise‑focused outputs.
- Dual-Model Safety Architecture - The speaker describes using a primary language model alongside an independent Granite Guardian model—built on Granite 3.0 and constrained to yes/no judgments—to detect harms, jailbreaks, hallucinations, and relevance issues, offering a universal safety checkpoint for any AI system.
- Apache 2.0: Open Model Licensing - They explain why IBM’s Granite models under the permissive Apache 2.0 license are valuable for enterprises, allowing unrestricted use, customization, and ownership of IP while contrasting this simplicity with the trend toward custom‑licensed open models.
- IBM Opens Model Assets - IBM explains its decision to release model weights, software, and data‑prep tools under the permissive Apache 2.0 license to foster community collaboration, reproducibility, and shared best‑practice development.
- Spotlighting IBM AI Offerings - The speaker outlines key IBM AI resources to explore—including local model deployment, agent orchestration tools, and upcoming safety-focused developments for Granite.
- Perplexity AI Valuation Debate - The speakers explain Perplexity’s AI‑driven search model, note rumors of a $500 million funding round at an $8 billion valuation, and debate whether this price and its claim to become “the new Google” are justified.
- LLMs vs Traditional Search - A participant critiques using large language models as search tools, highlighting their lack of information retrieval fundamentals, fact‑validation, and credibility ranking, and raises safety concerns about over‑reliance on their recommendations.
- Chat vs Search: Future Paradigm - The speakers debate whether chat interfaces are just an incremental upgrade over traditional search—using horse‑vs‑car metaphors and the “anchoring” effect of ChatGPT—as they consider how current LLM dominance may be a historical accident shaping expectations for both rigorous researchers and casual users.
- Nvidia's Move Into Model Training - The speakers argue that Nvidia’s longstanding CUDA software ecosystem and emerging cloud services naturally drive the company to expand beyond GPUs into AI model training and open‑source releases.
- From Data to Model Customization - The speaker foresees a shift in AI development from gathering datasets to selecting and fine‑tuning pre‑trained models, positioning NVIDIA as the primary provider of customization services—the “shovel” in the emerging AI gold rush.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=5-xMSQZ9xx0](https://www.youtube.com/watch?v=5-xMSQZ9xx0) **Duration:** 00:37:12

Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=0s) Untitled Section
- [00:03:05](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=185s) The Hidden Work of Data Curation
- [00:06:09](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=369s) Emoji Dilemma in AI Model Training
- [00:09:22](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=562s) Dual-Model Safety Architecture
- [00:12:26](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=746s) Apache 2.0: Open Model Licensing
- [00:15:34](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=934s) IBM Opens Model Assets
- [00:18:38](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=1118s) Spotlighting IBM AI Offerings
- [00:21:46](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=1306s) Perplexity AI Valuation Debate
- [00:24:52](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=1492s) LLMs vs Traditional Search
- [00:27:57](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=1677s) Chat vs Search: Future Paradigm
- [00:31:05](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=1865s) Nvidia's Move Into Model Training
- [00:34:08](https://www.youtube.com/watch?v=5-xMSQZ9xx0&t=2048s) From Data to Model Customization
what's the most exciting announcement
at this year's IBM Tech Exchange?
Kate Soule is a program
director at IBM Research.
Kate, welcome.
What do you think?
The Apache 2 license of Granite 3.0.
Kush Varshney, IBM Fellow.
Uh, Granite Guardian.
And joining us for the very first
time, Petros Zerfos, who's a principal
research scientist at IBM Research.
That's an easy one.
That's the high-performance Granite 3.0.
Terrific.
All that and more on today's Mixture of Experts.
I'm Tim Hwang, and it's Friday again,
which means it's time again to take a
whirlwind tour of the biggest stories
moving artificial intelligence this week.
We'll talk about NVIDIA's latest and greatest
open source model, Perplexity raising at a
wild valuation. But first, we're going to talk
about IBM's annual TechXchange conference.
There's a slew of announcements out of
IBM, and we've got the ideal team to
talk about what's launching this week.
The first headline that I want to
really address is that Granite 3.0 is out.
Um, and Kate, I know you played a really
big role in getting that all together
and being a big part of the launch.
Um, tell us about what's exciting and different
here from the previous generations of Granite.
Thanks, Tim.
So we're really excited about Granite 3.0. It launched at 12:15 a.m. on Monday morning.
And you know, down to the minute,
I know down to the minute.
Uh, and the reception has been
really, really phenomenal.
So Granite 3.0 is IBM's state of the art
large language model family.
They're a series of models that cover
language models, safety models called
Granite Guardian, uh, we even have some
models focused around efficiency, like a
speculative decoder model that came out,
and they're all available under the Apache 2.0 license, which is really exciting.
Yeah, and I would say, is there kind
of a deeper theme that IBM's kind of
pushing with this set of releases?
It almost feels like every generation of
Granite's getting kind of broader and broader,
and there's like more and more things launching
with each generation, but I'm curious if
the team had any particular thing that they
were kind of emphasizing on this round?
Well, with this round, our main goal
was to actually consolidate all the
different things into one model.
So where before IBM had English language models,
multilingual models, code models in our previous
generations, generations one and two, with
generation three, we're able to bring all of
that into one model while continuing to push the
boundaries of how much performance can you pack
into, you know, an 8 billion parameter model.
Nice. So I really want to get into the
details here because we have an
ideal configuration, which is:
Kate, you and Kush and Petros were all involved
in the Granite sort of release and we'd
love to kind of dig more into the details.
Petros, I think maybe I'll throw it over
to you because you name-checked that the most
exciting thing, uh, at, uh, TechXchange this
year was, uh, Granite, which you worked on.
Um, do you want to tell us a little bit
about your involvement with the release
and what's got you most excited about it?
Yeah, absolutely.
Yeah, as I mentioned, it's a very exciting
release, uh, my involvement is around the data
engineering, essentially the preparation of
the huge amounts of data that goes into the
training of, um, such kind of large language
models, all the way from the acquisition
to the point where it's converted into
tokenized form, which is called tokens.
And this is essentially what's used
for the training of the Granite models.
It's billions of documents, lots of
terabytes and petabytes worth of data,
massive infrastructures thrown behind it.
Very exciting.
Yeah, for sure.
And I really want to get into that
because I think so often, you know,
particularly at these tech conferences
or even, you know, just in general,
people always see the end result, right?
They say, uh, look at these
cool new models I can use.
And, you know, as someone who's a
consumer of these models, obviously
I'm personally very excited.
But I think what's so exciting about your
work and the opportunity of having you on
the show today is to kind of talk a little
bit about what goes on behind the scenes.
Um, and that data curation, um, like tell
us what's like, what is hard about it?
Right. Like what, what makes it
a really hard challenge?
Right.
That's a very good question.
Um, what makes it a very hard challenge is, um,
a multitude actually of things, many challenges.
First of all, um, the sheer volume of data
that is needed in order to essentially be
curated and be fed in some sense into the
training process is, um, is breathtaking.
Um, we're starting with literally petabytes of
raw data collected from a number of sources,
including the whole internet itself, and then
the curation process and the subsequent steps
of annotation and filtering towards essentially
finding the golden nuggets of very high
quality data that will go into the training.
It's a massively kind of challenging process.
Lots and lots of, um, uh, machines and
clusters and data centers, if I may say,
um, are needed in order to go through
such kind of, um, cleansing and filtering.
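The cleansing and filtering Petros describes can be sketched in miniature. Everything below is an illustrative toy, not IBM's actual pipeline: the function name, thresholds, and heuristics are assumptions, but the general shape, exact deduplication plus crude quality filters over raw documents, is the idea being described.

```python
import hashlib

def dedupe_and_filter(documents, min_words=50, max_symbol_ratio=0.3):
    """Toy corpus cleansing: exact dedup plus crude quality heuristics."""
    seen = set()
    kept = []
    for doc in documents:
        # Exact deduplication via a content hash.
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        words = doc.split()
        if len(words) < min_words:
            continue  # too short to be useful training text
        symbols = sum(1 for ch in doc if not (ch.isalnum() or ch.isspace()))
        if symbols / max(len(doc), 1) > max_symbol_ratio:
            continue  # likely markup or boilerplate, not prose
        kept.append(doc)
    return kept
```

Real pipelines add fuzzy deduplication, language identification, and learned quality classifiers on top of heuristics like these, which is part of why the work is so demanding at petabyte scale.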
Yeah.
Were there any particular documents that
you were like, oh man, this is in here?
Or like, I'm kind of curious about
like if there's any surprises in the
process where you're like, oh, it's
really funny that the most high quality
piece of data is, you know, ABC or XYZ.
Right. So, um, having essentially, you know,
kind of processed through pretty much
most of the data that's out there on the
internet, you can definitely find some things
that make you kind of wonder about humanity
itself, what it puts out there, if I may say.
There are absolutely, of course, you know,
the golden nuggets of knowledge in the form
of textbooks and the scientific papers,
and the medical studies and the legal
studies that are written essentially by
scholars and by people with high expertise.
And of course it's a pleasure to
have the high quality aspects be
included in the training of Granite.
So it's an aspect that we've talked about
on the show before, um, and I think this
is a great thing hearing you talk a little
bit more about it is just, you know, this
is not just a matter of kind of dumping
huge amounts of data into the model.
Uh, it is that, but there's just a lot of work
that goes into like selecting the right tokens.
It's almost, um, you know,
artisanal in nature, right?
You're getting like the right,
you know, blend to get the most
or best results out of the model.
Well, Tim, it is artisanal, but I also
want to highlight something that the team
did that I think is really cool, which is the
degree of experimenting and searching that
the team did over different data mixtures.
So training one, you know, 2 billion
parameter model requires training...
Petros, I don't know, how many small
models do you think you trained?
Oh, we trained hundreds
of those, uh, very easily.
No big deal. Even smaller models, we trained thousands
of those to get, uh, to get down to
the proper mixtures, uh, as well as
kind of bigger models in the order of
like one to two billion parameters.
We trained literally hundreds of those.
In order to figure out what is the best type of
cleansing and the best type of mixing, right?
So, um, definitely lots of effort by
very large teams in IBM research and
lots of infrastructure behind it in
both GPUs as well as general clusters.
Yeah, for sure.
And to underscore, like, it's
not always black and white, right, on, uh,
on data. So to underscore a bit some
of the decisions and processes the team went
through, a fun example is just thinking
about, you know, how many emojis do you
include in, um, the Granite training data.
So like, what is the appropriate level
of emoji, Tim, for a model to understand?
It's a hard question.
I mean, um, what, what's the
risk of having too many emojis?
Well, then the model has a predilection
to give a lot of emojis in the response,
which I mean, depending on your use
case, maybe you care about that, but you
certainly, um, in an enterprise setting,
probably don't want to skew towards emojis.
But if you remove emojis altogether, the
model doesn't understand the concept of
emojis, can't interpret emojis, which
of course is going to be critical for a
variety of just basic tasks and use cases.
So, you know, there was a whole effort,
I'm not kidding, just figuring out what
is the right level of emojis that the
model should be trained on to understand.
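A toy illustration of how a question like "what is the right level of emojis" can be made measurable. The code-point ranges, target density, and function names below are assumptions for illustration only, not the team's actual methodology:

```python
def emoji_ratio(text):
    """Fraction of characters in common emoji code-point ranges (illustrative)."""
    EMOJI_RANGES = [
        (0x1F300, 0x1FAFF),  # pictographs, emoticons, and extensions
        (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
    ]

    def is_emoji(ch):
        cp = ord(ch)
        return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)

    if not text:
        return 0.0
    return sum(1 for ch in text if is_emoji(ch)) / len(text)

def within_budget(corpus_sample, target=0.0005, tolerance=0.0005):
    """Check whether a corpus sample's emoji density sits near a chosen target."""
    ratio = emoji_ratio("".join(corpus_sample))
    return abs(ratio - target) <= tolerance
```

With a measurement like this, "too many" versus "too few" becomes a tunable knob in the data mixture rather than a vibe, which is presumably what those mixture experiments were searching over.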
That's fascinating.
Well, uh, Kush, I don't want
to let you off the hook here.
I know, I understand you were
also involved in this release.
Uh, do you want to talk a little bit about
more of your part of, uh, of this, this launch?
Yeah, I was involved in a few
different parts, actually.
So on the Granite 3.0, the language models, actually, as
Kate said, it's really language and
code and a lot of things all together.
Um, uh, so I was involved in
a lot of the safety alignment.
Uh, so.
Uh, these things, uh, after
Petros does his work, right?
Um, we have the pre training data, then
there's, um, uh, the training process, and
then there's further alignment after that.
Uh, so part of that is, uh, taking the model
from the base model into an instruct model.
But then after that, doing further, uh, tuning
to, uh, to make it safe in various ways.
Um, so, uh, I was working with, uh, again, a
big team of folks, um, and, uh, we were coming
up with seed examples to generate synthetic
data across many different types of harms and
risks and, uh, kind of, uh, figuring out how
to get the model not to engage in those topics.
So, uh, that's one key area and I would just
want to point out, uh, the way we evaluate
that, um, that level of safety is, uh,
through a variety of benchmarks.
One of those was developed in our research lab.
It's called ATTAQ, A T T A with
a Q at the end, and, um, uh, we
actually compared, uh, the Granite 3.0, um, uh, all of the models, but the
8 billion instruct model as an example, um,
outperforms, uh, all of the other competitors
that are out there, um, in this benchmark.
It's, uh, really is the, the
safest, uh, in, in many ways.
And then, um, that's one half.
Um, the other half of the work was
on the granite guardian models.
And so, uh, the way to think about it
is, um, when you're thinking about safety,
about preventing harms, you want to do
the best that you can on the main model.
But then, you know, I mean, inherently,
it's never going to be perfect.
So there should also be a
second model that's independent.
That's actually checking the first
model to make sure that, uh, it's
not putting bad stuff out there.
So, um, the Granite Guardian is a second model.
It's actually built on top of the Granite 3.0 language models,
but it's, uh, kind of constrained
just to give a yes or a no answer.
Uh, so it'll, um, look at either an
input prompt, a model response, or the
combination, and it'll say yes or no.
Is this, um, harmful?
Is it doing, uh, is it a jailbreaking attack?
Is there a hallucination?
Is there, um, a problem with context relevance?
Is there a problem with answer
relevance in a RAG setting?
This model is meant to act in that capacity,
and, um, uh, it's important to understand:
it's actually not limited to just working
with the Granite, uh, models, so you
can apply this with any model out there.
So I know we're going to talk about other
models later in the show, so you can use
the Granite Guardian with any of those.
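The dual-model pattern Kush describes, a primary model gated by an independent yes/no checker, can be sketched roughly as below. The `ask_guardian` callable is a stand-in for a real Granite Guardian inference call; the risk questions and interfaces are illustrative assumptions, not the actual API:

```python
# Illustrative risk questions in the spirit of those described above.
RISK_QUESTIONS = [
    "Is the response harmful?",
    "Is the prompt a jailbreak attempt?",
    "Does the response contain a hallucination?",
]

def guarded_generate(prompt, generate, ask_guardian):
    """Run the primary model, then gate its output with an independent checker.

    `generate` is any primary model callable; `ask_guardian` answers a single
    risk question about (question, prompt, response) with "yes" or "no".
    """
    response = generate(prompt)
    for question in RISK_QUESTIONS:
        if ask_guardian(question, prompt, response).strip().lower() == "yes":
            return None, question  # blocked: the checker flagged a risk
    return response, None  # passed every check
```

Because the gate only sees text in and a yes/no out, the same checkpoint works in front of any model, which is the "universal safety checkpoint" point made here.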
Yeah, for sure.
There's a lot there.
I mean, I guess maybe one question to kind
of push you a little bit further, Kush,
is, um, how do you... I mean, one thing that
occurs to me is, like, safety is so broad.
There are so many things that, like,
a model could do wrong in the world.
How do you and your team kind of
work to manage those risks, right?
Because it's like this infinite attack space.
Yeah, but at the same time, you know, TechXchange is
this week; you've got to get something launched.
Curious about how you reconcile those two,
how the team thinks about, like, broadening its
risk set over time or narrowing it over time.
I just think it's a really interesting process
that a lot of people don't usually hear about.
Yeah, and it is always about broadening.
So as you said, that attack surface
area is, uh, pretty much infinite.
So we can only pick and choose
and touch on some parts of it.
And we understand that.
Um, so we created this, uh, Attack Atlas.
Um, it's a paper.
It will be presented at
NeurIPS, uh, in a workshop.
And I mean, there's so many different
ways, um, so many different strategies,
um, so many different topics of harm.
Uh, so, uh, we can, I mean,
just do our best, right?
Um, it's a, yeah.
But you keep making progress.
You keep adding things.
So using taxonomies to categorize
different types of risks and harms.
So building that up, trying to
get as broad coverage as you can.
Looking at different
strategies of those attacks.
Keep working on that.
But it's a cat and mouse game in some sense.
So kind of red teaming, blue teaming, going
back and forth, kind of seeing what the problems
are, then figuring out how to address them
and, uh, yeah, just, uh, just cycling through
it and, yeah, I mean, again, nothing's ever
going to be perfect, uh, it's a, it's a process.
Yeah, for sure.
So, uh, Kate, you'll have to indulge me,
I mean, as the lawyer on the phone, you
were like, the most exciting thing is
Apache, and I was like, oh my god, yes.
Let's talk about Apache 2.0.
Um, why should our listeners be excited about
that if they're not huge licensing nerds?
So, I think the reason to
be excited is that Apache 2.0 is an incredibly permissive license.
It basically says that anyone can
take and use our Granite models.
And can customize them however you like,
use any outputs of the models however
you like, and IBM will make no claims
to that IP, and you have full rights.
So, that's really important especially for
enterprises who are looking to customize
models, large language models, with their own
data, with their own IP, you want to make
sure you have no further restrictions on what
is essentially now your, your IP that you've
encoded inside of a large language model.
So we're really excited about being able to
offer these models under those terms, and to
make sure that we're reducing the
barriers for the broader community to use
them and customize them as much as possible.
And it's a bit of a dying breed.
Unfortunately, if we look at models that
are being released in the open, um, we
are seeing models continue to be released
in the open, but more and more they're
being released with custom licenses.
So we're trying to keep it simple.
Apache 2.0, please take our models.
Please customize 'em and go
use them out in the world.
Yeah, I was definitely
confronted with this recently.
I was, you know, importing a model from
Hugging Face recently and I was like,
oh, the model was gated and then there
was like a completely custom license.
I was like, this is gonna take forever to see
if this is something I really wanna work with.
Can I ask why?
Like, why is IBM taking the
most open kind of perspective?
It sounds like there's actually
been a conscious strategy to say:
we're going to be, out of all the
open providers, the most open.
Well, again, I think it really comes down to
this enterprise use case, where we believe the
future of large language models and generative AI
in the enterprise is being able to customize
models with enterprise and proprietary data.
And so really, we're trying to create
the tools, both through the base models,
like the Granite model series, which can then be
customized without any restrictions on its use,
and tools like InstructLab, uh, which is
through our RHEL AI product offering at Red Hat.
You can take those models and customize them and
build on top of them without concerns or worry.
And then wrapping that all up: if
you get our models through, for example,
watsonx.ai, there's indemnification
and other protections and support.
So it's really trying to make
sure that we create this kind of
open market and ecosystem that
our customers can build on with confidence.
Yeah, and I think that's actually a theme
I really wanted to build on, just because,
I mean, a big part of this seems to be like
unleash the developers to kind of do what
they need to do around these models, and we're
not gonna kind of put any controls over that.
But one kind of unique thing, and I
guess Petros, maybe you're the natural
person to bring into this discussion,
is IBM, as I understand it, is also open
sourcing the kind of data prep kit
around these models, which is
kind of a unique thing, right?
Like, I think there's been a lot of hype
around open sourced models, but like,
what seems to be here is also like a
level of openness around all of the stuff
that goes into constructing the model.
Um, and, uh, I guess there are
kind of two questions for you.
One of them is why, like, why is IBM doing that?
Um, uh, so let's maybe start there.
And then there's a kind of follow-up
I would love to ask you.
Sure.
Yeah, that goes along with the general
theme of openness. We're not only opening the
models themselves and their weights under
the, if I may say it in a single sentence,
most permissive license, right?
That's what Apache 2.0 is.
We're similarly doing the same thing with
the software assets that we developed; we
open source them again under the Apache 2.0 license, the most permissive one.
Both to enable the community to build
upon that, be able to reproduce it, be
able to make use of the same in some
sense kind of facilities that we developed
and used for the training of granite.
We believe that this benefits the
overall community, the overall ecosystem.
And, um, you know, as Kate said, it enables
more and more developers essentially to
follow the same kind of best practices
that we, um, kind of, uh, learned
through hard lessons, if I may say.
I mean, like, you guys have solved the emoji
question so that developers don't have to.
We debated a lot around the emoji
question.
Yeah, I can tell; I can only imagine
the meetings, and, uh, it's incredible.
Exactly.
Um, that's really great.
I guess the follow up question around kind
of this, um, sort of the data prep kit and
open sourcing it as well is, you know, I'm,
I'm curious if you think that this is also a
way of kind of encouraging other providers to
also start releasing their data openly, right?
Because I think this is like such an
important aspect of the ecosystem.
And I've been also frustrated at times, right?
Like a new model will come out
and it behaves totally differently
and breaks all of your tooling.
And you're like, why is that?
And I would love to be able to kind
of delve a little bit further, and so
the hope is like that what IBM's doing
here becomes a more general practice.
Um, and I guess, kind of, Petros, I'm curious
if you think, like, if you're hearing from
around the industry, like, I would love
to see this become more of a norm, but I'm
curious if you think it's going to become one.
Right, yeah, no, that's a very good question.
Yeah, there's kind of an interesting
adage that goes like, um, you
know, every conversation in AI starts
with models, but ends with data, right?
Meaning that everyone recognizes
that data is the, um, kind of oil or
the fuel that powers the AI models.
So, um, open sourcing this, um, you
know, the data prep kit, is a kind
of a good practice to bring more people
and developers into doing the same.
There is a trend developing around essentially
providing the data assets for preparing models.
NVIDIA, for example, has its own NeMo Curator.
Other kind of big names as well are
going towards that direction, along with
the general kind of theme of openness
that IBM is kind of advocating for.
Um, data is essentially the
natural thing that follows.
Yeah, for sure.
Well, I want to throw it open
before we move on to our next topic.
I mean, obviously there were lots
of things announced at TechXchange.
Uh, we've been talking a lot about
Granite because, well, you all just
spent a lot of time working on Granite.
Um, are there other things that you'd
point people towards that they should,
you know, check out while they're
kind of looking at this stuff online?
I'm curious if, you know, I know there's
a code assistant announced, but also
just like I was looking at the list
and I was like, there are way more
things that happened here than we'll
have time to cover, but I'm kind of
curious if there's like, you know,
specific things that you'd highlight that
we should, uh, kind of shout out here.
Uh, I might just give a couple of shout outs.
One, please go and try out the
models, especially on platforms
we're really excited about, like
Ollama; you can run these models locally.
They're blazing fast.
Uh, really excited that we're making
these models broadly available across
a number of different partners.
Um, second, there was a big focus at
TechXchange on agents and assistants.
So really excited to see how the watsonx
software portfolio is continuing to evolve
and create different agent orchestrators
and management of agentic systems.
So I think you're going to continue to see a lot
of really exciting work from IBM in that space.
Yeah, for sure.
Um, and maybe we'll end with
you, Kush, is, um, you know.
Uh, where is Granite going next?
Uh, like, I guess this is kind of a question:
when we're sitting here in 2025 talking about
what just happened at TechXchange, I'm
kind of curious about, like, what the team is
working towards, particularly in safety.
I know that's what you work on, so, like, what's
the next frontier on safety? I
think it would be great for people to hear about.
Yeah. So, um, I mean, one thing just
building on what Petros said.
Um, so, I mean, having this data prep
kit out there is not only a boon for,
I mean, the value creation among the
developers and the ecosystem, but it is
also contributing to the safety aspects.
Um, because, uh, when you can inspect
these things, then you can know,
I mean, why this is happening,
and what the potential concerns are as well.
So I think just the movement towards openness
is going to be a big aspect of the safety world.
Um, uh, at TechXchange,
we announced some new features in
watsonx.governance as well.
Um, so that's our platform play
on the governance and safety side.
But, um, where granite goes next, um, with
the granite guardian, especially, um, as
Kate said, with the agentic workflows.
So our next, uh, release of Granite
Guardian will have a function
calling hallucination detection.
So that's something that's not out
there, um, uh, from anyone else.
And, uh, I think that'll, um,
kind of, uh, bridge the gap.
So when you are talking, uh, to a
model in natural language, and then it
translates that into an API call, um, we
want to make sure that there's nothing
wrong happening in that step.
So, uh, the function
names or the parameter values, all of
those should, uh, come out cleanly.
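One hedged sketch of what checking that a function call "comes out cleanly" could look like: validating a model-emitted call against the declared tool schema. The schema format and helper below are illustrative assumptions, not the actual Granite Guardian mechanism:

```python
def validate_call(call, tools):
    """Check a model-emitted function call against declared tool schemas.

    `call` is {"name": ..., "arguments": {...}}; `tools` maps each function
    name to its expected parameter names and types (illustrative format).
    """
    schema = tools.get(call.get("name"))
    if schema is None:
        return False, "unknown function name"  # hallucinated function
    args = call.get("arguments", {})
    for param, expected_type in schema.items():
        if param not in args:
            return False, "missing parameter: " + param
        if not isinstance(args[param], expected_type):
            return False, "bad type for parameter: " + param
    extra = set(args) - set(schema)
    if extra:
        return False, "hallucinated parameters: " + str(sorted(extra))
    return True, "ok"
```

Schema validation like this catches structurally wrong calls; the harder problem, which the Guardian work targets, is a call that is well-formed but unfaithful to what the user actually asked for.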
So, uh, that's, uh, I think one of the more
exciting sort of things that we have lined up.
Well, awesome. A lot more to potentially talk about, but this is a great overview, and I'm glad we got a little behind the scenes on how these launches happen, because you just see the model at the end of the day, but it turns out lots of humans spend a lot of time getting it right. So I appreciate you giving our listeners a bit of a lead-in.
So the next story we really want to focus on is some news that came out this week about the company Perplexity. If you're not familiar, Perplexity is essentially AI-driven search. It's a little different from the traditional Google experience where you have a search bar. Instead, you ask queries and make it interactive: you have a conversation, and it pulls results from the internet. The big news was the rumor that the company was about to raise 500 million dollars at an 8 billion dollar valuation, which I believe is twice what it was before.
And that's just wild. Obviously we're living through an era of a lot of excitement around AI, and AI company valuations are through the roof, but even so, I saw this number and thought, wow, this is really intense. So, Kate, maybe I'll kick it to you: is this valuation justified? Is chat the future of search? Is this the new Google we're looking at? I'm curious how you size up news that a company like this would be valued at this level.
Yeah, I mean, I think there's a lot going on, as you say, certainly a lot of hype. But yes, I think chat is the future of search. It's just a much more natural way to try to find information, to inquire about something. But I also really wonder how differentiated or competitive Perplexity can stay, right? What's their moat to prevent others from basically doing the same thing with a different API endpoint? So I do worry that we're seeing some inflation of expectations here, and in this valuation.
Yeah, for sure. And not for nothing: out of all the subscriptions I'm paying for monthly, Perplexity is one of the few I actually use regularly. But I think you're touching on a super important question, which is: what is the moat? Is there a moat here? Petros, I'm curious if you have any views. It seems like other search companies I could name might be really good at doing this at some point. But clearly someone sees something in this; they feel it's a good enough bet. So I don't know if you want to give us the bull case: is there a moat here?
Right. I have to admit, I agree with Kate; I'm also struggling to figure out what the moat is in this case. Search, unsurprisingly, has been one of the areas that many companies, both startups and big corporations, have tried to tackle over the years, attacking essentially the incumbents. It never panned out. Why? Because the existing ones were good enough, right? They were pretty good. Now there's a brand new interface, essentially a brand new way of interacting with the search engine. More importantly, you get something I think everyone appreciates: a very good summary of things, instead of having to go and read everything yourself. Everyone really appreciates having someone summarize things in a nice, executive, bullet-point kind of interface. That being said, a valuation built on that kind of expectation probably makes some sense. But the moat, I'm also struggling to figure out what it is, especially when a few big names out there could offer essentially the same thing.
Yeah, for sure. Kush, there's one question I really wanted to bring up. I agree: interacting with Perplexity, I feel like chat really is giving me a lot on the search side. Unfortunately, one of the things it's caused me to do most recently is buy too many books, because I'm always asking, oh, could you give me some recommendations? And it's like, of course, here are ten fascinating books about the thing you're interested in.
Maybe I'll play skeptic for a moment. I do think one of the really funny things about LLMs is that everybody has rushed to use them as a search interface, but out of the box, LLMs are not concerned with information retrieval. They're not really concerned about facts, and they're not really concerned about validation or verification. There's no notion of a PageRank that would even give some sense of credibility between different sources. So, playing skeptic, I'd love to hear the counterargument to this: no, LLMs are not the future of search, because LLMs do something so fundamentally different from what you want out of search that it's strange we're in this situation, where we've got this technology and we're trying to bolt search-like features onto it. Isn't that a little like putting the cart before the horse?
Yeah, I mean, I think you're absolutely right in many ways. The fact is, it's really the RAG, the retrieval, that's doing the search, and then the language model sits on top of it, creating the bullet points or what have you.
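That division of labor can be sketched in a few lines. This is a toy illustration under obvious simplifications: the retriever here is a naive keyword-overlap ranker standing in for a real search index, and the `llm` argument is a placeholder stub standing in for an actual model call.

```python
# Minimal sketch of the division of labor described above: a retriever
# finds the documents; the language model only summarizes what it is
# handed. The corpus and the `llm` stub are invented for illustration.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy keyword retriever: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query: str, corpus: list[str],
           llm=lambda prompt: f"(summary of: {prompt})") -> str:
    """RAG in two steps: retrieval first, then generation over the hits."""
    passages = retrieve(query, corpus)
    prompt = f"Answer {query!r} using only:\n" + "\n".join(passages)
    return llm(prompt)  # the LLM never searches; it only writes

docs = [
    "granite models are open source under apache 2.0",
    "perplexity does conversational ai search",
    "the weather is nice today",
]
print(retrieve("how does ai search work", docs, k=1))
```

The key point matches the discussion: swap in a better retriever and the same generator gets better, because the searching and the summarizing are separate components.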
And I'll disagree with Kate a little bit. I'm not convinced chat is necessarily the best method, or the best interface, for doing search. When I go searching for stuff, the way a research assistant goes to the library and actually tries to find things, chat isn't all the way there. You go down one rabbit hole, you come back, you look for this, you go over there, you do this and that. It's not a linear process, which is what chat kind of insists on. So intent-based interaction is certainly part of it; chat is, I think, the simplest version of intent communication, but there are more ways of doing this that are going to emerge and that are more helpful. That's where the LLM's strength will be: how to organize the work, organize these different threads, and put them together. The search itself, I think, is the retrieval part, and that's already not something the LLM is doing. So I think that's where we might end up.
Yeah, that's super helpful. It's almost like two innovations we're really talking about, right? One is the actual retrieval, and then the LLM is like the spice on top that makes it more digestible. I don't know, Kate, if you want to respond to that at all.
No, I mean, I don't disagree. I think chat is a huge improvement on search compared to just shouting into the void of a search box, but is it the final frontier? Kush brings up some really good points: this isn't quite a linear flow, in a lot of ways. I think we've been making a faster horse when what we need is a car, as the old saying goes. We've made chat a really fast horse, but what does the car invention look like in the search world? So certainly there are opportunities ahead.
Yeah. It would be cool if it's an entirely different paradigm. In some ways I do think about the anchoring effect of something like ChatGPT: the only reason we've all adopted this interface is that this particular product accidentally became so successful that we now see everything about LLMs through this lens. But that's almost a historical accident, in some ways.
It's really interesting to think about the different users, too. Kush, as a researcher, you probably have a very well-honed art to how you investigate different topics with the utmost rigor, whereas somebody who's just trying to find the closest grocery store is doing a much more casual kind of investigation; that's probably a different mode. So I think there's tremendous diversity in the potential interfaces we're going to see for search, and there's probably not going to be one size fits all.
Yeah, I would definitely agree with that. When my kids are looking for information, like how many goals Alex Morgan scored throughout her career, they don't need to do it in the same rigorous way I need to do my research. So yeah, absolutely.
Yeah. And my long-term theory is that, a little bit like how there's Google-ese, where people don't really speak in English so much as in a string of words they've found optimizes the search results, we'll also end up with our own Perplexity-ese, even for these chat interfaces: not quite a conversation, but just what we've learned gets the best results out of the system. And ironically, as we do that, the model providers are going to be figuring out how to take that input, translate it into what they think is optimal for the model, and feed that in. So there are just going to be layers and layers of trying to find the right way to frame a question.
And that is what agentic workflows are: multiple layers of agents translating from one thing to another thing to another thing. So that's where we're headed.
All right, for our final story: it's another open-source model story, but I think it's another interesting one to compare and contrast, and to talk about the overall trend in open source. NVIDIA, maker of fine GPUs, has recently come out with a fine-tune of Llama that they call Nemotron, specifically Nemotron 70B Instruct. It was widely touted by NVIDIA; they showed it beating a bunch of other state-of-the-art proprietary models on benchmarks.
I think that's all well and interesting, but one thing I wanted to bring to this panel was simply the question of why. I know NVIDIA from its GPUs, its hardware. Why are they getting into the model-training business? And why would they be open-sourcing models at all? Petros, I'm curious; I'll just throw it to you.
Yeah, that's a very interesting question. Everyone knows NVIDIA for its GPUs. I'm not sure how many people realize that NVIDIA's moat is actually, in my view, software. It's the CUDA interfaces and drivers that have managed to attract developers onto NVIDIA hardware over the course of the last ten years. That's why everyone ended up using, and is still using, GPUs from NVIDIA. So in that sense, they do have a very strong moat in the form of software, and it's only natural to expect them to expand on that, both in terms of the software ecosystem they build around their hardware and, of course, by showcasing it through models they train themselves. And my last thought on this: NVIDIA is also developing its NVIDIA Cloud, which is another aspect that contributes to the ecosystem of AI models that NVIDIA is driving, for sure, from the hardware side.
So I guess, Kate, from Petros's interpretation, it sounds like this is almost a show of strength: NVIDIA is just saying, we can do models like this. But part of it is that they're trying to attract people to their cloud, right? Part of this is marketing the cloud offering. When I think of cloud, I think of Google, I think of Amazon; I don't really think of NVIDIA. Is that how you read it as well, that it's trying to promote that aspect of their business?
Yeah, you know, I think it's really a powerful demonstration of being able to say: we can take a model, customize it, continue to train it, and continue to boost performance beyond, in NVIDIA's terms, what was originally released in the chat version, or rather the instruct version, of the 70-billion-parameter Llama model. In doing so, I think they're trying to demonstrate that they have these capabilities and to invite customers to come join in: customize your own models, continue to train your own models, all on NVIDIA's platform. So it makes a lot of sense as a pure, almost marketing, point, being able to showcase their capabilities.
Yeah, definitely. Kush, do you think this makes the other companies a little nervous? I'm thinking about how NVIDIA has always been in the background, the chip people; they do infrastructure. And now it's almost, wait, you're on our turf. What are you doing releasing something that competes with o1, or Opus, or whatever? Should those companies be nervous? Is this NVIDIA playing in a new playground, in some ways?
Yeah, I think so. And I think what will happen in the future is, just like in traditional machine learning, where we had a problem, we'd look for or collect a data set, and then go build a model for it, in a couple of years it's going to be the same with models. You have a problem, you go look for a model that's appropriate, you go look for maybe some fine-tuning data that's appropriate, and then you work with those. You're not going to treat models as anything other than artifacts that are part of the world of possibilities for solving your problem. So the way NVIDIA can position themselves is: now that all these models are out there, you, the customer, whatever company you are, don't have to innovate on those pre-trained models. What you do need to do is the customization, and that's the what's-next story as well. For the customization, come to us; we're going to be the ones who help you. And I think just having that mindset available, that you don't have to worry about all these different models, just about the customization, that's the part that will be their strength.
Another thing I'll say: there's this overused trope that in the gold rush, the people who made money were the ones who provided the shovels, or the blue jeans. And the thing with the blue jeans is that they somehow crossed over from being something for miners to being a high-fashion item people would customize, and so forth. So I think it's somehow moving from commodity to fashion as well, in some capacity.
Yeah, that's right. It's the high-prestige, you know, selvedge denim jeans.
Yeah, that's right.
Yeah, I think that final comment is so interesting, too, because it suggests the ways in which some of these visions are aligned, particularly between, say, IBM and NVIDIA. NVIDIA is saying, so long as there's more demand for models, we're excited, because it all runs on chips. And IBM, in some ways, is saying, we want to release all these models to unleash all these developers, but we also believe there's going to be a lot of enterprise services we'll build on top of them. And I actually don't even know, is it right that Granite is going to be available on NVIDIA as well?
It is. They were a launch partner; you can check out the Granite 3.0 models on NVIDIA today. And even Granite Guardian: within 12 hours, they had it up as a full working demo, so you can try Granite Guardian there too.
Yeah, it's so fast.
Well, great. That's all the time we have for today. Kate, Kush, always great to see you; thanks for taking the time to talk about Granite. And Petros, we hope to have you on the show again in the future.
Thank you very much for having me.
Well, listeners, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week for another action-packed week of Mixture of Experts.