Llama 3.1 Debut, GPT‑4o Mini, AI Price War
Key Points
- Meta released Llama 3.1, the first high‑performance frontier AI model made openly available, sparking excitement about community‑driven model building, business opportunities, and AI‑safety considerations.
- OpenAI followed with GPT‑4o mini, a tiny, ultra‑cheap model that intensifies the emerging “frontier model” price war and raises questions about the long‑term sustainability of rapid, low‑cost AI launches.
- The panel highlighted a key technical gap: while OpenAI’s offerings are primarily cloud‑based APIs, the demand for truly embedded, on‑device models remains unsolved, though the company may address it in the future.
- Hosts introduced a diverse expert lineup: Shobhit Varshney (senior partner, consulting AI), Chris Hay (distinguished engineer, CTO of customer transformation), and newcomer Maryam Ashoori (director of product management at watsonx), to dissect the week's AI developments.
- A light‑hearted moment noted Mark Zuckerberg’s new “surf‑style” public appearance alongside the Llama 3.1 announcement, with unanimous panel preference for the fresh look over his classic nerd‑hoodie image.
Source: https://www.youtube.com/watch?v=bQzPaRYC9BE
Duration: 00:20:20
Sections
- [00:00:00](https://www.youtube.com/watch?v=bQzPaRYC9BE&t=0s) AI News: Llama 3.1 & GPT‑4o Mini: The host introduces a panel to discuss Meta’s Llama 3.1 release and OpenAI’s cheap GPT‑4o mini, exploring open‑source impacts, business implications, and the sustainability of the frontier model price war.
Full Transcript
Hello, and happy Friday. This is Mixture of Experts, and I am your host, Tim Hwang. Each week, Mixture of Experts brings together a world-class panel of technologists, engineers, and more to help make sense of the tidal wave of news each week in AI. This week on the show we cover two big stories. First, Meta strikes back with the launch of Llama 3.1, and Zuck is out with a brand-new look. We talk about the state of the art in language models in open source, and we talk about the implications for the business of AI and for AI safety.

"This is going to be a game changer for the market, because it's enabling the open-source community to start building, using a very powerful model that is available to them, to create smaller models and put them back into the market."

Second, OpenAI continues its string of launches with GPT-4o mini, a relatively tiny, wildly cheap model. We talk about the ongoing frontier-model price war and how sustainable it is over the long run.

"Chris, did I hear you say embedded models? OpenAI on-device?"

"That's the use case that I can't solve. OpenAI is an API. I think they'll get there."

As always, I'm joined by an incredible group of panelists who will help us navigate what has been another action-packed week in AI. So today we've got three panelists: we've got Shobhit Varshney, senior partner, consulting AI for US, Canada, and LatAm; we've got Chris Hay, distinguished engineer and CTO of customer transformation; and finally, joining us for the first time, is Maryam Ashoori, director of product management at watsonx.
[Music]
So our first story today, of course, is the launch of Llama 3.1. This is obviously an enormous technical milestone: it's the first time, arguably, that we've had frontier AI models available in the open source, and we're going to talk all about the technical aspects of this and why you should care as a listener. Just to get us started, though, one of the things that I also loved about the announcement is that Mark Zuckerberg personally took to Facebook to announce it, and he did not only debut Llama 3.1, he also debuted a new look. If you remember classic Mark Zuckerberg, very pale, very serious, very nerdy-looking, with the hoodie; well, he's been looking like kind of a surf-beach guy lately, that's his new look. So just to start, I want to ask each of the panelists: do you like the old Zuck look, or do you like the new Zuck look better? Chris, to you first: old Zuck or new Zuck?

New Zuck.

Got it. Shobhit?

New is the new cool.

And last but not least, Maryam, what do you think: old Zuck or new Zuck?

I'd go for new.

Okay, so we have bold consensus on the new Zuck. I think we're going to come to miss the old Zuck; I really liked old nerdy Zuck. But again, seasons change and we have to go with it.
So let me just introduce the story of today. If you've been watching the news at all, or if you're even remotely connected with AI, I think you'll know that Meta has come out and launched Llama 3.1. It is the latest edition of its Llama class of models, and, uniquely, Meta, in comparison to say OpenAI or Anthropic, really is pursuing open-source AI in a very big way. One really big thing that we've seen is the launch of the gigantic Llama 405B model, which is a highly capable, state-of-the-art model, now just available for free in the open source. The first person I want to turn to is Maryam: you were working on this on day one of the launch for watsonx, and I'd love for you to talk a little bit about that. Because it was really easy, right? You just kind of threw it up and people downloaded it. I'm kidding, of course. I'm curious to get your war story of what it felt like to launch a model like that, and whether there's anything you learned or walked away with from that experience.

It was actually nothing that I'd call easy, especially because it's a giant model, so we had to run it across nodes, multi-node inferencing. This was the first time that we had such a 400-billion-parameter model on our platform. But it was exciting, and I'm super excited not just for our customers but also for the community. I think the amazing thing that Meta did yesterday with Llama 3, Llama 3.1 405 billion...

The name is getting longer and longer.

Yeah, they really need to work on the branding for all of these. I don't even know what's going on.

...is changing the license to let the market use the model for distillation and teaching smaller models. This is going to be a game changer for the market, because it's enabling the open-source community to start building, using a very powerful model that is available to them, to create smaller models and put them back into the market. So for that reason, I'm super excited for the opportunity it unlocks.
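To make the distillation idea Maryam mentions concrete: a common recipe (this is a generic sketch in the style of classic knowledge distillation, not Meta's actual training setup; the toy logits below are invented for illustration) trains a small student model to match a large teacher's temperature-softened output distribution alongside the ground-truth labels.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature flattens the
    # distribution, exposing the teacher's "dark knowledge" about
    # which wrong answers are almost right.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend of (a) cross-entropy against the teacher's soft targets
    and (b) ordinary cross-entropy against the ground-truth label."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    # Soft-target term, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    soft = -sum(p * math.log(q)
                for p, q in zip(teacher_probs, student_probs)) * temperature ** 2
    # Hard-label term at temperature 1.
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard

# Toy example: a confident teacher, an uncertain student.
loss = distillation_loss([4.0, 1.0, 0.5], [1.2, 1.0, 0.9], hard_label=0)
```

Minimizing this loss over a dataset nudges the student toward the teacher's behavior, which is exactly what the new license now permits against the 405B model.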
Yeah, I'm thrilled by it, just because for a long time open source has been very exciting but has arguably been lagging a little bit in performance, and now suddenly there's what looks to be the possibility that open source is going to be just as powerful, just as exciting, just as state-of-the-art in a lot of ways. Can I just ask a question on behalf of the listeners who may be less familiar with the space, which is: why is Meta doing this? It's incredibly expensive to build one of these models, and, if I have it right, they're literally just giving it away for free, which, well, I'm not a big-city business guy, but I don't even know how that works. Shobhit, do you want to comment on what is going on here? Why is Meta doing this, and do we think they could make money on it? They're losing money, aren't they?

Yeah, absolutely. So there are certain vendors, like Meta or NVIDIA, that have other sources of revenue. What they'd make by selling the model is going to be a rounding error compared to what NVIDIA is going to do with hardware, so they're giving away Nemotron; and with Meta, they have all these other social properties that they make money on. Just to give you a quick data point on Meta itself: Yann LeCun, the chief AI scientist at Meta, an absolutely incredible personality, was sharing a data point. Two years back, when somebody posted something on Facebook and you were trying to figure out if there was misinformation, hate speech, abuse, and things of that nature, best-in-class NLP would give you about a 24 to 25 percent hit rate on identifying it as bad content. Now, with the Llama models, they're getting close to 92 to 94 percent. So they're able to do a lot more good filtering with their models, and they're able to use that to enhance their own products: when you're experiencing Instagram or WhatsApp or Facebook, they're embedding AI that is now being helped by the crowdsourcing of everybody else. So they have different avenues of revenue, and for them this is not a loss-leader product, so to say; they're trying to build the right community around it that can contribute, and they benefit in their own products by using this AI. Now, for the very small subset of vendors who could potentially do this, Google, AWS, Azure, IBM: we're all in the business of selling AI to other businesses, right? So you normally won't come to a point where you just open up your models, and then IBM comes in and says, you know what, mic drop, I'm going to open-source my Granite models as well, we want the community to come and help. So this market is about split between companies like Meta and IBM, who are opening up the models completely, versus the closed-source models.
Yeah, definitely. And part of the question that's really in my mind is the kind of pressure this creates for the closed-source models, because on some level, if you're OpenAI, the pitch is: we've got this crazy machine intelligence and we're going to rent you access to it. This in some ways flips the whole thing on its head; it basically says, look, access to that is going to be free. And I guess, Maryam, I'm curious, as someone who's in the space: do you think this is ultimately going to force an OpenAI or an Anthropic to also have to go open source in the end? Because it feels like, once it's free, why would you pay for Claude or something like that?

Well, look at what Mistral did yesterday with Mistral Large 2: they put out the weights for research only, and this is their flagship model. We've been having a lot of conversations in the past about protecting the weights to make sure they don't get out, and here it is, out for research only. Cohere did something similar a few months ago. So the trend I see, where the trend is going, is to nurture that openness but reserve the rights for commercial purposes.

I think the final item we might
want to touch on before we jump is this: Maryam, you pointed out that Mistral is moving in a very similar direction here in terms of their licensing and research. Presumably part of that research licensing is also to say: hey, you can red-team this model, you can make it better, you can surface all the safety issues. I'm curious how you see platforms like Mistral falling into this ecosystem, because they are not one of the really big corporate players, but they still seem to be able to ride this open-source wave. Curious for your thoughts on how they'll fit into the competition here as it continues to evolve.

I think it's important to think about what problem each of these model providers is solving. If you look into Mistral, they are European-born, the favorite of Europe, so that's their market. They are supporting a wide range of European languages, not a specific dialect but a wide range. So if you are in Europe, if you're speaking one of those languages, there is a far higher chance that Mistral is the better-positioned model for you. So I think it's important to understand what the use case is, who is going to be using it, and then what the right model is for that target use case, versus generalizing: hey, is Mistral Large better than Anthropic?

[Music]

So I'm going to move us on to
the second story we're going to cover today. We could obviously keep talking about the first one, but we're going to flip to the other big dimension that we see evolving in the AI market. One of them is from closed to open, which is definitely a big shift, and Llama 3.1 really puts a big Meta flag on that change. The other big change we've been tracking, and have talked about in the last few episodes, but that I want to hit really hard, is the movement from gigantic models to very little models: very fast, very cheap, smaller models. The peg for this is that just last week OpenAI announced its latest salvo in this battle, a model called GPT-4o mini (to Maryam's earlier point, they really have to improve the names on these, but that's what they released). What's so striking about this announcement is that the pricing is legitimately crazy: 15 cents per 1 million input tokens, 60 cents per 1 million output tokens, and they point out in their blog post that since 2022 the cost per token has dropped 99 percent. So the first question I want to launch with, and Chris, maybe you're well positioned to answer this: are we just in a price war here? Is this even sustainable? I'm curious how much of this is OpenAI really being able to cut the cost of serving low enough that they can still offer these models at a profit, versus them just pushing the price to zero in a battle against their rivals. Because it kind of feels like this is also part of the open-source competition as well: how can we offer free to keep up with all the other free options out in the world? So, Chris, to you: do you think we're in a price war?

I think we are a little bit in
a price war, but I think OpenAI has some other considerations as well. Although we've spent all of our time playing with GPT-4 and GPT-4o, the reality is that the vast majority of people are running GPT-3.5, which is a bigger model, and realistically OpenAI had to retire their free model and run a cheaper one; that's kind of what they've done with GPT-4o mini. So yes: wonderful, here we go, here's a cheaper model, et cetera. But actually, they managed to get the very large GPT-3.5 off their books, and now they're running a smaller model to serve the majority of the requests hitting ChatGPT. I think they needed to do that just for commercial reasons. We are pushing towards smaller models all the time; the only time you really need the larger models is for reasoning and planning. With the smaller models, most of the time, with good fine-tuning, you can get the model to do what you want, and I think OpenAI realized that as well. The world realizes it: OpenAI has already seen that a lot of people start with GPT-4 or GPT-4o as their starter model in the enterprise, but very quickly, when they go to production, they bring it down to a smaller model, and that's been eating OpenAI's lunch, and they wanted to stop that too. And then, as we move to devices, embedded devices, they need to be able to play in that space, so it's critically important for OpenAI to have that smaller model in the market. So is it a price war? Partially, but it's a credibility war as well.

Chris, did I hear you say embedded models? OpenAI on-device?

That's the use case that I can't solve. OpenAI is an API. I think they'll get there.
If you really think about this for a second: if we take a guess at what size the 4o mini model is, it's probably around the 11-billion-parameter mark, maybe a little bit more, maybe a little bit less. There is going to come a point, when you start dealing with Apple, when you start dealing with Google, et cetera, where you are going to have to provide a model to run on a device, and if you don't, you're going to be locked out of a market. We've already seen that with the iPhone and Apple's recent announcements. So they are going to have to do something in that space, and I think this is a move towards that, no doubt. They're not offering embedded just now, but they will in the future.
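Chris's 11-billion-parameter figure is a guess, but it makes the on-device constraint easy to quantify. A back-of-the-envelope sketch (the parameter count and quantization levels are assumptions for illustration, not confirmed figures for GPT-4o mini):

```python
def model_memory_gb(params_billions, bytes_per_weight):
    # Weights only; the KV cache and activations add more on top at inference time.
    return params_billions * 1e9 * bytes_per_weight / 1e9

# A hypothetical 11B-parameter model at common precisions.
fp16 = model_memory_gb(11, 2.0)   # 16-bit floats
int8 = model_memory_gb(11, 1.0)   # 8-bit quantization
int4 = model_memory_gb(11, 0.5)   # 4-bit quantization

print(f"fp16: {fp16:.0f} GB, int8: {int8:.0f} GB, int4: {int4:.1f} GB")
```

Even at aggressive 4-bit quantization, roughly 5.5 GB of weights is a tight fit alongside the OS on a phone, which is part of why truly on-device models tend to sit in the 1-3B range today.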
Yeah, I think that definitely is going to happen. I think another thing in what you're saying, Chris, is: do we have too much intelligence now? That's kind of where you're pointing, right? And Maryam, I don't know if you agree, as someone who's working on watsonx; there's almost always this instinct of, why wouldn't you want the bigger and better model, it can do so many more things. But Chris, you're making the argument that maybe we don't really need those things; that our models are so capable now that they're actually past the point of what we need on an everyday basis.

Well, think about it: the larger the model, the more powerful it is, but also the larger the compute resources it needs. That translates to latency, response time, if you're an enterprise wanting to use it in production; it translates to carbon footprint and energy consumption, which is the topic of conversation these days; and it translates to cost. So cost is actually just one of the factors, and in some highly regulated environments the other two might be even bigger blockers to moving forward. But on the comment you made about price, I feel there are two sides to this. If you are a model provider, you want to set the price as low as you can to increase adoption. If you are a consumer, we see half of the market has already moved from exploration to pilots, and 10 percent to production, and when you get to scale, the cost adds up. For normal prediction use cases you might be doing something like 500,000 predictions a day if you're a DoorDash or the like (I used to work for one of those). In those environments, if you want to use gen AI, just do the calculation of price per API call: it adds up, and it's really not sustainable. So in order to get to production and scale, the model provider has to find a way to set the price low, and the way to implement that is usually through smaller models. So the demand is driving where the market is going, but this is also the right thing for the whole market to do.

And Maryam, just to add to
that: this week as well, OpenAI added the ability to fine-tune their mini model, and I have to think these two things are related, because when you go to production, as you say, Maryam, one of the things you're going to want to do, to bring that inference cost down and improve reliability, is take that model and fine-tune it with your data, rather than having large prompts.
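As a concrete illustration of that workflow: fine-tuning a hosted model typically starts with assembling prompt/response pairs from your proprietary data into a JSONL file of chat-style examples. A minimal sketch, assuming a chat-message format of the kind hosted fine-tuning APIs commonly accept (the company name and example records are hypothetical):

```python
import json

# Hypothetical proprietary examples: domain question -> desired answer.
examples = [
    ("What is our refund window?",
     "Refunds are accepted within 30 days of purchase."),
    ("Do you ship internationally?",
     "Yes, we ship to over 40 countries."),
]

def to_chat_record(question, answer):
    # One training example: a system prompt plus one user/assistant exchange.
    return {
        "messages": [
            {"role": "system", "content": "You are a support assistant for Acme Co."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Serialize to JSONL: one JSON object per line, ready to upload as a training file.
jsonl = "\n".join(json.dumps(to_chat_record(q, a)) for q, a in examples)
```

The payoff Chris describes is that the tuned model internalizes the domain, so production prompts can shrink from long few-shot instructions to a short question, cutting per-call token cost.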
What I see emerging in the market for enterprises is grabbing a smaller, trusted model, I would say, and fine-tuning it on their proprietary data: the data about their users and the data about their domain. Because at the end of the day they want something differentiated in the market, and everyone has access to these large models; the power of differentiation is really the proprietary data. To get that, you should be able to fine-tune with data that no one else has access to.

Yeah, Maryam and I were on a call with a client yesterday and got into this nice argument. He's the head of AI for a big Fortune company, and he started off by saying: hey, I expect these models to be intelligent, so I don't like the small ones; I really want the big, large ones, so they can actually do something meaningful for me. And we had a nice chat with him explaining how this whole pricing works. Let's take an example for just the pricing part, Maryam; this is something you and I do quite a bit in spreadsheets, trying to showcase the range. Your favorite example is: take a 30-minute recording of a call and summarize it into one page. If you look at the tokens and make some assumptions, and you do this a thousand times, a thousand call transcripts summarized into a page each: with the 4o mini model, that is a dollar. If you look at the best-in-class models from Claude and 4o, that is about 30 to 40 dollars. And then you bring in the biggest Llama model, the 405-billion-parameter model: that's going to be 80 bucks. So now you have an open-source model that's costing $80 for those thousand summarizations, you have the best-in-class frontier models at half of that, around $40, and then you have a dollar if you're using 4o mini. Even hosting your own model: if you're hosting a Llama 3 8B model, which is much smaller than 4o mini, on the AWS, Azure, IBM, or Google clouds of the world, that's going to be something like $34. So just look at the price points. A dollar for OpenAI's mini, where you don't have any headache about what's happening, they're giving you all kinds of indemnification, and they'll give you ways of fine-tuning it and making it your own. Then $34 for the free, really small Llama 3 8-billion-parameter model hosted yourself; $40 if you're doing best-in-class; and $80 if you're doing the biggest open-source Llama model. It becomes very real when you start doing a million of these a day.
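Shobhit's back-of-the-envelope comparison is easy to reproduce. A sketch, where the per-million-token prices are stand-ins chosen to land near the ballpark figures quoted in the conversation, and the token counts per call are assumptions, not official price-sheet numbers:

```python
# Assumed prices, USD per 1M tokens (input, output). The mini prices are the
# ones quoted in the episode; the other two rows are illustrative stand-ins.
PRICING = {
    "gpt-4o-mini":    (0.15, 0.60),
    "frontier-class": (3.00, 15.00),   # stand-in for a Claude/GPT-4o tier
    "llama-405b":     (8.00, 22.00),   # stand-in for hosted 405B pricing
}

def batch_cost(model, calls, input_tokens, output_tokens):
    """Total cost of `calls` requests, each with the given token counts."""
    in_price, out_price = PRICING[model]
    per_call = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    return calls * per_call

# 1,000 summarizations: ~8k input tokens (a 30-minute transcript),
# ~700 output tokens (roughly one page).
for model in PRICING:
    print(f"{model}: ${batch_cost(model, 1000, 8000, 700):.2f}")
```

Under those assumptions the batch comes out near $1.62 for the mini tier, about $35 for a frontier tier, and about $79 for the 405B tier, matching the roughly 1x / 35x / 80x spread described above; at a million calls a day, the gap compounds fast.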
So we have to wrap up; we're almost at time, and of course, as always, we could spend a lot more time talking about this. I think one of the most interesting things coming out of this conversation is that maybe it becomes worth less and less to train bigger and bigger models. So here's my spicy take, and I want to end with a yes-or-no question: at some point in the future, will OpenAI stop training larger and larger models and just focus on the models they have? Chris?

OpenAI is going to build a model that's powered by the sun in the future.

Got it. Shobhit?

OpenAI will keep going at it; there's a lot more to be done to get to human intelligence.

And Maryam?

Regulation is going to stop that at some point.

Wow, okay, so at some point OpenAI will stop. Well, there's a lot more to get into there; Maryam, we'll just have to have you back on the show at some point, but I hope you had a good time. Chris, Shobhit, again, thanks for joining us, and to all you listeners out there, thanks for joining us as well. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we'll see you next week.