Open Source AI, RAG, and KANs
Key Points
- The “Mixture of Experts” podcast brings together AI researchers, product leaders, engineers, and policy experts each week to dissect the biggest AI news, starting with three focus topics: open‑source model trends, the future of Retrieval‑Augmented Generation (RAG), and the hype around KAN (Kolmogorov‑Arnold Network) models.
- Recent open‑source breakthroughs were highlighted, including Meta’s Llama 3, Apple’s on‑device model release, and IBM’s new Granite family, underscoring a rapid expansion of publicly available, high‑capacity AI models.
- IBM’s Granite models (3 B, 8 B, 20 B, and 34 B parameters) were announced as open source, trained on 116 programming languages, and positioned for enterprise use with capabilities that go beyond typical Python‑centric code generation.
- The panel previewed the next evolution of RAG, discussing how retrieval‑augmented techniques have matured and what breakthroughs or challenges may define their future impact on AI applications.
- KAN networks were introduced as the latest buzzword, with the experts weighing their theoretical promise, current hype, and whether organizations should invest in the technology now or wait for further validation.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=K83tTEeGCBc](https://www.youtube.com/watch?v=K83tTEeGCBc)
**Duration:** 00:46:18

## Sections
- [00:00:00](https://www.youtube.com/watch?v=K83tTEeGCBc&t=0s) **AI Trends Panel Kickoff** - Tim Hwang launches the Mixture of Experts podcast by outlining three AI storylines (open‑source model trends, the future of retrieval‑augmented generation, and the buzz around Kolmogorov‑Arnold networks) and introduces the IBM‑MIT expert panel.

## Full Transcript
[Music]
hello and welcome to mixture of experts I'm
your host Tim Hwang each week we bring
together a panel of researchers product
leaders Engineers policy experts and
more to discuss debate and distill down
the week's biggest news and Trends in AI
so today on the show three stories first
one the state of the open source uh what
are the biggest Trends in open source
models and how will they shape the
business of AI second the future of
retrieval augmented generation or rag uh they've
come so far where are they going to go
next and then finally Kolmogorov-Arnold
networks or KANs what the hell are they
why are all the Nerds suddenly talking
about it and should we buy the hype so
today on the show I'm ably supported by an
incredible panel of experts so uh first
off Marina Danilevsky senior research
scientist at IBM Marina thanks for
joining good to be here yeah and
particularly thanks to you for joining
us so early Pacific Time David Cox uh VP
models and director of the MIT IBM
Watson lab David thanks for joining the
show pleasure to be
here and uh returning for the second
episode we were joking that we'd just
make this a Kush Varshney show going
forwards uh he's unfortunately declined
that but Kush Varshney IBM fellow working
on issues surrounding AI governance Kush
welcome back it's great to be here and
uh yeah I'm the Varshney with the hair um so
yeah we'll use that as a little
mnemonic so
well great so let's start with the first
story that we want to cover today on
mixture of experts um so I think from
where I'm sitting uh you know there has
been just so much happening in the world
of Open Source right So Meta of course
released llama 3 a few weeks back um
Apple in a very you know big move for
them I think released the um uh OpenELM
on device models and then IBM just
recently released its Granite family of
models uh and so David I kind of want to
give you a chance to kind of first plug
Granite tell us what it is and what you
guys have been working on um and then I
kind of want you to kind of go into you
know why it is that IBM decided to
release Granite open source and why it
thinks that doing this matters and I
think from there I think we can talk
more broadly about what's happening in
open source but I wanted to give you a
shot to talk a little bit about the work
that you and the team have been doing
sure yeah happy to um we actually had
two major open source announcements this
was a big week for us across IBM and Red
Hat uh the first uh was that we
open sourced the Granite code family of
models so these are models in a a
variety of sizes 3 8 20 and 34 billion
parameters trained on 116 programming
languages these are um you know
state-of-the-art models competitive with
you know the best in in the field and
one of the areas that we really
optimized for for Enterprise users
because ultimately IBM is interested in
supporting enterprises is that all-around
capability you know not just Python and
general code generation which is often the
focus uh for the academic Community but
also Java and rust and all kinds of
other languages and also things like uh
code fixing and explaining um so there's
a lot of things you can do with code
models and it's really being integrated
into the software development you know
fabric of how we do software and
software is integrated into the fabric
of everything we do in society and we we
wanted to release these because we you
know ultimately our position is that
that open wins you know like where
communities will build around these
models people will build things um that
you know that we wouldn't expect they'll
they'll be able to extend the models and
that's that's super powerful and and
that leads a little bit to the second um
announcement which which happened
through Red Hat so we have a technology
that we developed for doing alignment of
models uh we call large-scale alignment
for chatbots and that gave rise to a
project called InstructLab and what
InstructLab is is a way to actually
aggregate Community contributions to the
instruction tuning of a model so now uh
any developer anywhere in the world can
submit new skills and new knowledge to a
model and and then that actually gets
integrated and then we do a weekly build
of that model so it's a different cycle
of development a different kind of
community forming where we're not just
forming around a model and you know
building inference tools and things like
that but we're actually able to merge
all those contributions and then update
the model every week uh so we're really
excited about this it's been a fantastic
partnership with red hat building this
out they know open source better than
than anyone and uh we're really excited
this got announced uh at at Summit on on
Tuesday by Matt Hicks the CEO of Red Hat
yeah that's awesome and I think I don't
know if you'd agree with this is like I
see what IBM has done with Granite and
with InstructLab and it's kind of like
you I was joking with a friend the other
day I was like it's open source putting
its big boy pants on right like they're
kind of like moving into like open
source being something that like
Enterprises will actually use um and I
think that's really changing what we
mean by open source right like I think
like the big Trend even a few months ago
was like oh my God these open source
models are just getting so big right
like they're huge parameter models and
like isn't that the exciting thing is
that open source models will be you know
on par as sophisticated as like
state-of-the-art but what's kind of
interesting here I think is really sort
of like twofold right like I think like
what's interesting with granite is you
guys are releasing a class of models of
different sizes sort of on the idea that
like not everybody's going to need like
the the chunkiest model in the whole
world uh which I think is really really
interesting um and then I think again
it's also kind of like on par with what
we see out of the Apple announcement the
OpenELM announcement right which is
like these are not the biggest models
but they're on device models right and
it kind of feels like I don't know if
you'd agree with this David is like it
feels like open source is finally now
kind of like responding to Market need
like then in some ways like Enterprises
are like how do we actually apply this
stuff and now like essentially open
source Community is like trying to now
you know adapt to actually provide
solutions to that but I don't know if
that's like a characterization you guys
would agree with yeah yeah no and and I
think I think you're spot on you know
there isn't just one thing that people
want to do with llms so there's not
going to be just one llm that wins the
day um you know we have our models
running on laptops and there are that's
they're really
interesting uh you know advantages to
doing that like if you if you want to be
on prem you don't want to be sending data over
the network it's it's uh you know
there's you know it's proprietary and
you're worried about IP you can run
these models in many cases on on your
laptop for other applications you want
the very best performance you know say
you're doing an application
modernization that's just going to
happen once um and it needs to be you
know the highest quality then you can
move to one of the larger models um so
we we really are trying to be responsive
across the spectrum of of different
needs and yeah at IBM we we are trying
to be sort of I think you said the the
big boy pants you know like we're very
transparent about what data we put in
the models which is which is not always
true but which is very important if
you're an Enterprise want to use these
uh the other point of differentiation
for our models is we release them under
Apache 2 license just a clean uh Apache
2 no additional restrictions and this
can be really important for for adoption
we knew this is something that our
customers uh would ultimately need and
want so um that that that's you know how
we're evolving uh sort of the our
approach um to open source and and again
yeah like you said meeting the customer
needs yeah definitely and Kush Marina
I'm not sure if you've got views on this
is like I think you know I think we can
use this as a springboard to kind of
talk about like how this is going to
sort of shape the the market as a whole
right because I think you know if I'm
now you like an Enterprise sort of
thinking about how to integrate llms
right it feels like there's increasingly
options right well we can you know go
work with we can try to do it at all
ourselves in house right like we can try
to go with like the big proprietary
models right um and it kind of also
feels like there's going to be a range
of new businesses that emerge here as
well like just like the whole business
of like you come to us with a problem
and we fine-tune open source models for
you seems like it'll increasingly become
a big part of the ecosystem but um yeah
I'm kind of curious as from from your
point of view kind of like in you know
the the kind of research space and even
thinking about like you know where this
all goes just if you've got views on on
how this will kind of impact the
ecosystem as a
whole yeah I mean uh one great thing
that uh I mean InstructLab enables is
really I mean shifting power to Value
creators so um it uh really allows uh I
mean as David said I mean this whole
Community to uh to really congeal around
this thing um and uh make these models
authentic for themselves it's some sort
of uh commitment to locality as well I
mean for whatever you need uh for your
Enterprise for your organization you can
um uh really make things uh make things
yours so I think it's uh it's an awesome
awesome thing yeah I really appreciate
being able to add the skills as you find
them needed for your own use case so the
thing with all of these models is that
it's very hard to predict when you put
them out what are they actually going to
be used for so being able to have the
flexibility to say oh I've realized I
have a use case I need to adapt quickly
I need to make the model adapt quickly
sometimes with something that's
proprietary or somewhere else you just
don't have the ability to move that
quickly or even to stress test or check
and is this going somewhere or is this
not going to be helpful at all so from
that perspective actually the way that
uh we've released InstructLab is very is
very good it's very effective for
checking these cases it it's very rare
that um any given company's or enterprise's
needs would be represented would be a
top of mind for the developer of of Any
Given base Foundation model like does
does meta you know care about an
insurance company well you know they
probably do but not it's not their not
their primary uh con thinking about
just being like I wonder what AIG thinks
about this exactly exactly so so having
a base that's built for Enterprise but
then giving the ability to customize and
really focus and and bring in you know
knowledge and and and particular things
you want to do that are specific to that
industry uh can be really powerful can I
so we have a few more minutes on this
topic can I play Jerk for a second right
because I do think that like you know
one of the most interesting things about
open source is that early on you know if
you were if you were a government right
or someone worried about AI ethics or AI
safety right you basically say well the
rise of these few leading companies with
proprietary models is like really good
for us right because we only have to go
to a few companies and change their
policies in order to sort of secure the
ecosystem right and I think you might
say well one of the issues of these
increasing proliferation of Open Source
models right and the fact that
everybody's kind of going to be running
their models on premises right is that
there's a lot more room for people to
misuse these models um and also like you
might think that also they create all of
these supply chain security issues as
well like I'm kind of thinking about how
like uh npm right like other instances
in which open source is really taken off
um you know security ends up being this
really big problem because like the
provenance of any particular component
is really difficult and your stack might
rely on you know hundreds of Open Source
components and I guess I'm kind of
curious I mean I don't think anyone's
got a good solution to this and and look
I came up as like a free software
advocate so like I I'm on I'm on the
side of what's going on here but I'd
love the kind of panelist of like you
know offer an opinion about that like do
you buy that those are risks I don't
know if there's kind of smart Solutions
you guys are thinking about just to kind
of wrestle with that a little bit I
think is one of the most interesting
parts of this development yeah one thing
just to start off um on the security
issue um history has proven in open
source software that open source
ultimately ends up being safer not less
safe there were efforts for instance to
create you know private versions of the
Linux kernel and it it turns out it's
just hard to keep those safe because
more eyes mean uh you know more more
sort of you know uh people who can find
uh you know problems understand problems
and and fix them um so I think having
that transparency enabling the academic
Community to get involved to build
Solutions uh for many problems that we
may face I think is super important I
will also say we're very careful about
what we um what we release I mean we're
we're we're very careful about what data
goes into these models uh before we
release them ensuring that they're you
know minimizing the risks uh any po
risks around you know you know
potentially dangerous you know
activities where we're not releasing
models that we think are could be used
for for for ill intent of course not
yeah and I think uh I mean I do think
that there's going to be a need almost
for like a Consumer Reports or a
Wirecutter for these models at some point
where it's basically like there's going
to be so many models out there that it's
going to literally be like well we had a
couple experts spend like a few hours
really testing this thing you know and
this is like an important part of the
the ecosystem Kush it looks like you
might want to get in yeah I mean uh we
actually do work on exactly that the
consumer report sort of idea so we call
it uh AI FactSheets um and model risk
assessment and uh it is uh exactly a way
to uh to analyze uh these different
models that are out there um give them
different scores along different
dimensions um and as a consumer um you
can I mean really look at different
vendors different sort of options and uh
get a good sense of uh of what's
available so this is actually um
something already available through
through watsonx.governance one of our
Flagship products yeah I imagine it's
come some kind of future when I quit my
job as a podcast host to be a like a
model sommelier you know it's just like have you
considered like this this model for your
use case fine vintage that's awesome
yeah a fine vintage yeah exactly right
yeah really good oaky overtones on this
2024 was a good year for llms yeah
exactly um Marina any final thoughts before
we move to the next topic here yeah I
would say that it's uh still very early
days also with this technology and
everything that we're going into so
especially as scientists we would like
to try not to have the hubris of
thinking yeah we've got this you know
leave it with us we've we've sorted out
the rest of this there's been so many
interesting developments and surprises
in this technology in the last few years
and we we think that will continue to be
for sure in that sense open source is
actually going to be more efficient even
from a market standpoint more eyes means
more ideas means more places that this
is going to develop in unexpected and
interesting ways so it's actually even I
think more efficient besides whatever
thoughts we may have about the morality
of it as well yeah no for sure and again
I'm kind of arguing against myself
because like I'm very Pro open source um
I think it's just like a very
interesting kind of set of
considerations as like the whole
architecture of the industry sort of
shapes uh and
[Music]
changes well this is great so let's move
to the second topic today I really want
to talk about retrieval augmented
generation or rag um so if you're not
familiar with this rag is uh basically
one of the hotness uh in in in AI um if
you look at the papers at ICLR this
year or ACL um there are a lot of papers
using rag methods um and you know I
guess Marina I you keep me honest here I
mean I think one of the reasons that it
has been so prolific and of so much
interest is that rag seems to kind of
open a window for solving a lot of the
models the problems that we have with
language models right like well we can't
train these models pre-train these
models all the time but if they're
really good at pulling data from
elsewhere um you know this is a good way
of keeping their responses up to date um
it's a good way of ensuring that they're
you know more factual potentially um and
um and so I I'm curious because I know
your group recently released a paper um
thinking about and using Rag and so
maybe as a springboard for the
conversation I don't know if you want to
quickly talk about that and then we can
kind of more generally talk about you
know I guess from your point of view
what you see as sort of the existing
limitations of rag and what are the Big
Technical problems that need to be
solved sure that sounds great so um the
paper that you refer to it's a
description of a methodology and a
system for trying to evaluate it more
deeply again the point of rag is it's
one thing to be able to have a conversation
with an llm in which you ask it to write
a haiku about frogs they're great at that
no problem we here we live in business use
cases and so it's very important that
when you have business use cases that
rely on factual information and it's
really a problem if you get things wrong
this is where you get into rag like you
said being able to point to a reference
of all right the reason I'm giving you
this answer is because this is the
content that I am relying on whether
it's informational or it comes from a
knowledge base whatever then you want to
actually go and double check is this
going to act the way that I expect it to
act and it's one thing again to uh test
these llm models against large
benchmarks there was some good comments
last week about benchmarks and the use
you know usefulness of them as time goes
on it's another thing to actually see
what happens in a customer's use case
this is an old data analysis uh
necessity you have to go into okay what
went into the test cases that you've
created you're testing where did your data
come from what are the documents how
have you managed to without knowing it
introduce biases into the evaluation
that you're doing because of the way
your annotations are done because of the
way you defined your metrics because
people have different understandings of
what is acceptable what is not you have
over uh corrected for a particular query
type you have cor over corrected for a
particular way of responding this is all
uh analysis that you need to do to have
confidence in the solution you put out
that includes an llm but is not just the
llm by itself it's the llm as a part of
a solution and so that's something that
my group does a lot I know Kush's
group does that a lot as well is diving
into the details of that especially how
we take our our customers through
getting confidence and what does it mean
to to deploy their llm and our system
has a fun name for those of us from the 90s
we called it InspectorRAGet yeah
Inspector
Gadget um and it really is a a way to to
make sure that you can take yourself
through that analysis and and feel
confidence in what you're getting not
just the aggregate number yeah it's funny
about the 90s I was in a class that a
friend was teaching today or earlier
this week and one of the kids was like I
hear back in the day there's this thing
called GeoCities I hear it was really
cool or something like that I was like
oh my God I gotta get out of here um
yeah so there there's there's so much to
go into there and I think there's kind
of like maybe two topics we could dive
into you know I think the first
um Marina I'd love to get your thoughts
on is I think one really great theme I
think that came up from last week's
episode was kind of the idea
that almost AI is in this kind of weird
period of like Benchmark bankruptcy
where like essentially there's like all
of these B benchmarks that no one cares
about and then the benchmarks that do
people do care about are like so
thoroughly gamed that they basically
provide no valid information anymore and
like one outcome that I think Shobhit was
saying on on the uh on the episode was
like well that's one of the reasons why
like the solution now is like just talk
to the model for 15 minutes and then you
figure out whether or not it's good or
not and it strikes me that like I don't
know if you put InspectorRAGet in kind
of this context is like it seems like
there's also a switch from like from
benchmarks to like monitoring as the way
that we really assess whether or not
models are high quality I don't know if
you'd buy that because I as I take kind
of your group's work is an attempt to
say okay well we're not going to really
you know benchmarks are a useful guide
but really in practice what most people
want is to see like lots and lots of
telemetry about their models and like
that's how we approach this problem um
but kind of curious to get your response
on that like do you buy the idea that AI
is is in a benchmark bankruptcy and do
you see kind of InspectorRAGet as sort of a
solution to that or an answer to that
yeah I think you should think of
benchmarks as something that you should
iterate on rapidly and evolve now the
problem with talk to the model for 15
minutes and just get a vibes uh kind of
feel of it is uh people are not very
good at coming up with what is the right
thing to talk about it to for 15 minutes
consistently they are not very good they
themselves will uh only think of
whatever came into their head whatever
they were talking about to their
customer last week and they will
introduce like I said a really a lot of
biases and what they thought of then you
end up being very nastily surprised when
you actually go ahead and deploy your
model and they're like well that didn't
work but I talked to it for 15 minutes
it seemed fine you wouldn't also um you
know deploy a representative to a
customer after talking to them 15
minutes and thinking that seems fine so
realistically what you actually want and
what I hope the point is of approaches
like InspectorRAGet is constantly evolving
benchmarks yeah talk to it for 15
minutes and then go check yourself hey
what data did you end up actually
putting in what kind of questions did
you end up putting in do you realize
that you didn't do the right Vibe check
do that a few times your Vibe check
becomes then into something systematic
but it's something that is that is
iterative that is interactive rather
than some academic somewhere put out a
benchmark I don't know what this has to
do with my data make your own make it
iterative and constantly you know check
yourself for what you're doing is
actually proper quality that ends up
being really the right thing to do so
move yourself from that um you know
shout out to Daniel Kahneman from that
system one thinking to the system two
thinking then you're going to have
confidence in what you're actually
deploying yeah this is a I think it's so
interesting because I think this is what
you're describing is going to be an
enormous need across like every company
that attempts to uh adopt this stuff and
you know I was joking earlier about
being a model sommelier like I think my
other business proposal is like you're
an eval atelier where you're basically
like we help to craft finely crafted
evals for like what you need and kind of
what you're talking about because like
the art of creating a good Benchmark and
evolving that benchmark you're describing a lot of
our jobs actually here at IBM is what
that's literally what we're doing for
our
cellier so um David Kush I don't know if
you got responses you want to jump in um
maybe I can be a little bit
controversial um so yeah um I mean
people talk about rag being a solution
for hallucination for lack of factuality
lack of faithfulness lack of
groundedness these sort of things but um
to me I mean it's part of the solution
but uh I don't think it's the full
solution because even when you get the
retrieved documents there's a model in
between and it can ignore those
documents it can get confused by them it
can uh I mean just hallucinate anyways I
mean all sorts of things so um uh to me
I mean what Marina is talking about is
very important not just I mean over time
but uh like as part of the the the
process initially as well or in runtime
in uh inference time so I mean checking for
hallucination separately um uh thinking
about can we Trace back the information
where did it come from in those
documents uh can we even come up with
new architectures that uh uh don't
hallucinate uh By Design in some ways so
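The idea of checking for hallucination separately and tracing information back to the retrieved documents can be illustrated with a small sketch. This is not the panel's or IBM's method: it swaps in plain word overlap as a crude stand-in for a real groundedness check, and the function names, the `min_overlap` threshold, and the sample text (including the deliberately made-up pricing claim) are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def trace_sentences(answer, passages, min_overlap=0.5):
    """Map each answer sentence to the retrieved passage that best covers it.

    Returns a list of (sentence, passage_index_or_None, score) tuples.
    A sentence whose best word-overlap score is below min_overlap gets
    None as its passage index, marking it as not traceable to any passage.
    """
    results = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        words = tokenize(sentence)
        best_idx, best_score = None, 0.0
        for idx, passage in enumerate(passages):
            # Share of the sentence's words that also appear in this passage.
            score = len(words & tokenize(passage)) / len(words) if words else 0.0
            if score > best_score:
                best_idx, best_score = idx, score
        supported = best_score >= min_overlap
        results.append((sentence, best_idx if supported else None, round(best_score, 2)))
    return results


passages = [
    "Granite code models were trained on 116 programming languages.",
    "The models are released under the Apache 2.0 license.",
]
# The second sentence is an invented claim that no passage supports.
answer = (
    "Granite code models were trained on 116 programming languages. "
    "They cost ten dollars per month."
)
results = trace_sentences(answer, passages)
for sentence, source, score in results:
    print(source, score, sentence)
```

Word overlap will miss paraphrases and can be fooled by shared vocabulary, so a check like this is at most a first filter in front of a stronger verification step.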
I think there's rag gets a lot of play
right now but uh I think it's a stepping
stone um I don't think it's the the end
of the journey I actually completely
agree with you fully agree with you rag
has not fixed it it's just an additional
step in the direction I completely agree
with you interesting so do you think in
like I don't know it's always tough to
predict on these things like in two
years we'll be talking about rag I think
we will because um rag is like I mean
it's search right and a lot of the
companies who are in the LLM game
are search companies at the end of the
day so um I think it'll stick around
it'll uh have a lot to to to do but uh
yeah I mean I think for Enterprise use
cases um maybe it'll uh get
a little bit less emphasis maybe not I
don't know well I mean for freshness of
data some kind of retrieval can be
helpful like you just added it to the
database you can retrieve it immediately
so there there are more than one problem
that rag solves and and I agree with uh
with Kush and Marina that hallucination
that's that's a SE it you know it helps
a little but like it's a separate
problem that we need to address lots of
different ways um but but the ability to
access new information the ability to
customize quickly I mean we're starting
to get uh I think layers of technology
that allow us to to address that
InstructLab is one of them if you
wanted to ingest knowledge into the llm
and build it into your sort of your into
the llm itself you can do that but you
probably still want to be retrieving
things and there's going to be a balance
and and we're going to figure that
balance out I think over the next uh
next couple years yeah as itol yeah I
think it's definitely like the much more
realistic pathway that I've heard right
like I think like the other Alternatives
I've heard are like well at some point
the model will become so big and know
everything and then we'll be able to
pre-train it frequently enough I'm just
like I really how many H100s are you going
to buy to pull this off you know it's
just like is not within the realm of
possibility so yeah and it's not a
concept go ahead David sorry I was just
going to say and not every company's
going to give their data over to OpenAI
to let them you know their proprietary
data it's it's a real problem yeah all I
was going to say is I mean like these
sort of ideas like having multiple
levels and layers I mean it's part of
Computer Engineering I mean caching like
different types of locality I mean this
is all like uh very much the sort of
thinking that uh computer people have
had so it just needs to come into to
this too yeah orchestration right and
making sure that there's routing
involved there's decisions that evolves
there's different checks there's
different guards that's not going to go
away I don't care how long you've
trained the LLM that's not going to be
fixed yeah for sure so we have a few
minutes left on this topic I think the
last area that I want to kind of push us
into is I think Marina you had kind of a
really sort of interesting comment when
you're explaining InspectorRAGet which
is basically this kind of feature of
trust right essentially like what does a
user need to be shown to trust the model
um and I think what I love about that
topic is that in some ways it's it's
pushes you into the realm away from like
like it turns out people trust models
regardless whether or not they're a huge
parameter model or tiny model and like
you know I heard this great anecdote
where um this MLE was telling me this
story like we were doing an eval where
there's these side by sides and what we
discovered is that the users that we
were testing against just felt that
longer outputs were more credible and
trustworthy regardless of any content
they included right and I was like oh
that makes sense because like you're
saying like do 500 of these tasks in the
next hour and so they're just using
these visual heuristics to evaluate text
and one visual heuristic that we use to
tell whether or not something is more
substantive is like is it long and does it look
dense um and I think this is like such
an interesting thing because if you go
down that route I mean I guess it's a
prescription for madness because you're
basically like well does font Choice
influence how people think like how
trustworthy their models are yeah
exactly and so I guess Marina I'd love
for you to kind of like Riff on that a
little bit is like you know how far does
this Rabbit Hole go like once you move
away from benchmarks and you say we're
going to give you a dashboard of
different things you know you're now
kind of in almost like the theater of
trust like what do we need to show you
and what is the metric that drives the
most trust with the user and is that
trust Justified or not and just would
love to get your thoughts on that as
someone who's working in the space
there's some interesting psychology here
so we know that people get extremely mad
at uh computers when they make one
mistake but they're much more okay if
you were told that it was a human so
that's an interesting psychology there's
another fact that these models are
fabulous snake oil salesmen they will
tell you something that you will read it
and you'll believe it even if in the
back of your mind you're like wait isn't
that not what I thought that was but it
will sound so convincing and so accurate
that you're like oh yeah that's that's
the right answer I have no further
questions they're very good at that so
in that sense actually even human
evaluation is very challenging people
are bad at catching these kinds of
things on the other hand you find
yourself in an enterprise situation and that's
risky if you really did give incorrect
information that is very risky it's
again it's a good reason that you can't
really deploy these models by themselves
with with no support but I think there's
a lot of psychology actually in setting
expectation just in the same way that
when we first had Wikipedia in the world
when we first had Google search and
people thought oh well if it's on the
Internet it's true and then people
learned and I think that over time it's
going to be the same thing where you're
going to learn what kind of things
really need to be what is the right way
to interact with these models what is
the safe way what is the Consumer Reports
sort of uh appropriate way so some of
this is technology a lot of this is
people a lot of this is people
psychology I cannot give you enough data
points and tell you I will never ever
ever make a mistake in this model it's
not possible so we're going to have to
figure out how it is to set people's
expectations if people are allowed to
sometimes make mistakes and you ask
for a clarification how do we get to
that state of the world also with the
use of the
technology yeah for sure yeah I think
and and this sort of pushes into I think
a sort of interesting direction is like
under certain conditions I'll just throw
out the hot take is like under certain
conditions basically optimally safe
performance of the model is not
necessarily
optimally uh easy to use I guess right
so like that is to say like a perfectly
articulate model may actually signal
more trustworthiness than is warranted
so there actually may be weird
situations where there's kind of this
trade-off which is like we actually
want it to perform worse because it
inspires uh an optimal level of doubt
in the user right would be sort of the
theory that I guess I'm arguing well
short of that you don't have to have it
perform worse if it could just express
its own uncertainty that would already
be a big improvement because
people really aren't great at um
dissociating things like
fluency and you know convincingness of
of of sort of discourse with actual
truth and particularly in contexts that
enterprises are working in with RAG
where you're you know taking enterprise
documents HR documents and policies and
you have to be correct about them unless
the person who's Vibe checking that
model really understands those policies
in perfect detail it's it's very tricky
to evaluate and people tend to be fooled
very easily Kush you want to jump in I
see you kind of um no I mean I think the
word Marina used earlier humility um is
the key I mean the AI needs to be humble
um and we need to be humble as well so I
think that that combination is is the
the right way to go yeah for sure um
yeah and I think it's part of the
problem too you know I was talking with
my friend who kind of relayed this story
as well is basically that um people have
all these expectations that are built up
around computers which makes this
particularly difficult and like the
language model behaves in such a
fundamentally different way that it's
like violating our expectations where
you're like it's good at poetry but bad
at math like that literally flips
everything we have built up in terms of
intuitions about computers for like the
last 20 years and so like the adage I've
been using is like everything you think
computers are good at like LLMs are bad
at everything that LLMs are bad at like
you know you know computers are good at
and there's kind of this weird mismatch
that we're sort of navigating at the
moment all right so Kush uh you and I
are going to kick this topic off um this
is going to be the big challenge of the
episode um which is if you've been
watching the more nerdy channels of the
AI discourse online people have been
very recently excited by something
called uh the Kolmogorov-Arnold
representation theorem which is giving
rise to a paper that proposes a kind of
Kolmogorov-Arnold Network or KAN for short
and um it's very difficult to tell from
the outside I think if you're not a
technical person as to like what it is
and why it's exciting which worries me
because it has all of the indications of
being like Oh do you do you use
blockchain cuz like blockchain will
solve this right like I think we like
rapidly go down that direction so what I
kind of want to do over the next um few
minutes as we close out this episode is
to basically give the clearest easiest
to understand explanation of KANs that
anyone has yet articulated on the
internet and we're going to do this
together right okay um so Kush no
pressure on this no pressure um so I
think Kush maybe the best place to start
reading some of the papers is can you
give kind of like a quick explanation of
why models large language models
approximate functions what does that
mean exactly yeah that's precisely the
right place to start so um yeah uh I
mean so a mathematical function uh it's
looking in some space right um so in
middle school or high school we saw
these one-dimensional functions um uh if
our data was just one-dimensional what
that uh function is trying to do is fit
that data and by that uh use the
function rather than the data to predict
the next thing um so by doing so we're
actually able to uh to so-called generalize
so data tells us the pattern the
function describes the pattern so that
the next time we want to um make a
prediction we use the function instead
of the the past data so I think that's a
very key point and then we can go into
what those functions are how to
represent those functions and how to
compute those functions but uh yeah
that's the the starting point right and
a prediction here just to make it very
simple is like um uh tall people that's
one variable tend to be heavier is that
right like for example that's that's a
prediction that you could build a
function around yeah exactly and you can
be very quantitative about that so if uh
I'm 6 feet tall maybe that predicts that
I'm 180 lbs or something like that so um
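
[Editor's sketch, not from the episode: the height-and-weight example above can be made concrete with a one-variable least-squares fit. The data points and the names `fit_line` and `predict_weight` are invented for illustration; the point is only that after fitting, predictions come from the function rather than from the stored data.]

```python
# Hypothetical toy data: (height in inches, weight in lbs). Invented for illustration.
data = [(62, 125), (65, 145), (68, 160), (70, 172), (72, 180), (75, 198)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The two fitted parameters are what the learning step "figures out".
slope, intercept = fit_line(data)

def predict_weight(height_in):
    """Predict from the fitted function, not by looking up past data."""
    return slope * height_in + intercept

# Predicted weight for someone 6 feet (72 inches) tall.
print(round(predict_weight(72), 1))
```
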
yeah so then I think the next step we're
going to take this step by step we think
through this problem step by step right
uh is basically
um
so as I take it right like one of the
things that machine learning has really
done better than kind of traditional
models of AI or traditional models of
you know computer programming is that
we frequently tried to kind of
like handcraft all of these rules right
so like you want to write an algorithm
to divide you know pictures of cats from
pictures of dogs you would do feature
engineering right you get a bunch of
people together to kind of like write
these equations these functions out
right and I guess is it right to say
that machine learning what it's been
really good at doing is like coming up
with these functions on its own right
like to basically come up with those
rules on its own to do this prediction
yeah I think that's a good way to put it
I'll be a little bit more specific
though um so we as humans um the
algorithm designers Etc um have been the
ones who come up with what the functions
are um the functions have parameters and
it's the learning algorithm that's
figuring out the parameters to best uh
best fit the data but at some level um
we the the computer scientist the folks
are the ones who decided what were the
functions uh in this library or this
universe of of possible functions and
then let the uh the algorithm figure out
the uh uh the Nuance the the parameters
and so forth yeah for sure and so what
we what we do hear I guess right now in
the world of AI is multi-layer
perceptrons right which is like this
very particular way of implementing AI
that is doing the approximation of these
complex functions on its own and can
solve all of these magical things right
like it can you know have conversations
with you it can sort pictures of cats
from dogs whatever whatever you want do
you want to talk a little bit about maybe like
the trade-offs of that like what what do
we need to do in order to achieve that
magic right like um yeah yeah so um
there used to be this uh car dealership
that had a commercial in the '90s
again um so they used to dating all of
ourselves
here I am yeah so they used to say um
stack 'em deep sell 'em cheap uh in
terms of their cars um so uh so with
these uh the multi-layer perceptrons or
the feed forward neural networks um uh
the trend has been just uh there's this
one kind of these layers what they do is
uh they multiply some inputs by some
weights um add them up and then apply a
nonlinearity um something which is often
called a ReLU function a rectified
linear unit which kind of um changes the
output right so you have those you layer
them on top of each other you stack them
really deep um you keep doing this keep
doing this keep doing this um and that
way uh you actually end up with the
ability to um uh end up with almost any
uh sort of nonlinear function uh to
describe the data so uh you talked about
uh these universal representation or
approximation theorems and stuff so um
uh you can actually prove that through
even not very deep neural networks um uh
you're able to represent any function uh
that that you have in front of you yeah
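
[Editor's sketch, not from the episode: a minimal pure-Python version of the layer just described, where inputs are multiplied by weights, summed, and passed through a ReLU, and layers are then stacked. The bias terms, the hand-picked weights, and the names `dense_layer` and `mlp` are assumptions added for illustration; real networks learn their weights and usually leave the final layer without a ReLU.]

```python
def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One MLP layer: a weighted sum of the inputs per neuron, then ReLU."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(relu(total))
    return outputs

def mlp(inputs, layers):
    """Stack layers: each layer's output becomes the next layer's input."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hand-picked (not learned) weights that compute |x| = relu(x) + relu(-x).
layers = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer: relu(x) and relu(-x)
    ([[1.0, 1.0]], [0.0]),          # output layer: their sum (non-negative, so ReLU leaves it unchanged)
]

print(mlp([-3.0], layers))
```

Even this two-layer stack produces a function with a kink that no single weighted sum could represent, which is the intuition behind stacking more layers to capture more complicated shapes.
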
so that has been seems like we're at the
magic right like this is this is how it
works um I guess one result of that
right is that these models have been
like they're really expensive to make
like you need like a lot of energy and a
lot of chips and a lot of data and a lot
of computing time um and so
along come Kolmogorov and Arnold I actually
don't know who those people are I just
know them from their representation
theorem you should know Kolmogorov from a lot
of other things yeah oh he's that one
okay same
guy okay all right
yeah I know that guy uh it's the big
labasi yeah know um so uh tell me about
the representation theorem right what
does it what does it tell us what's
what's the big deal yeah so um like we
just were talking about so when you have
some function that you're trying to
represent um mathematicians have come up
with all sorts of ways to um be able to
decompose that goal of representing this
function I have in front of me in terms
of more primitive uh other functions
right so uh this is I mean we see it uh
as an electrical engineer I see it um
like Fourier transforms or Fourier series
are ways of decomposing a function into
uh sine and cosine functions um the same
way um uh with the uh uh the multilayer
perceptrons it's into that particular
structure you're decomposing into these
weights and um these nonlinearities so
what uh the Kolmogorov-Arnold
representation theorem is about is
decomposing again any function into um
some other functions in this case they
happen to be one-dimensional um so uh
they could be splines or some other
smooth one-dimensional function and by
combining them uh in a particular
uh summation uh you can again uh
represent any multi-dimensional
nonlinear function and that's the proof
um is what uh what Kolmogorov tells us that
uh in this way of taking 1D functions
you can come up and represent any uh
multi-dimensional function yeah which is
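
[Editor's note, not from the episode: in its usual textbook form, the Kolmogorov-Arnold representation theorem says that any continuous function $f$ of $n$ variables on $[0,1]^n$ can be written using only one-variable continuous functions and addition. The symbols $\Phi_q$ and $\phi_{q,p}$ below follow the common convention and are not named in the conversation.]

$$
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
$$

[The theorem itself only guarantees that the one-variable functions are continuous; using smooth functions such as splines, as described here, is a practical modeling choice rather than something the theorem promises.]
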
which is pretty wild right because what
you're sort of telling me is basically
that like the machine learning models
that we have now can work this magic
right they basically come up with a
magic mathematical formula that can help
you tell pictures of cats from dogs let's
just take that example and then kind of
what you're saying is that we can take
that magic formula and like represent it
like we can break it down to these tiny
tiny tiny Lego blocks right like all the
way down to what we were just talking
about like tall people are heavier like
single variable stuff right and and kind
of the theory and Kush you keep me
honest you're the one who actually
understands this stuff is like like any
I don't know or like most very complex
uh formulas can can be reducible in this
way I think is what the theorem is
saying is that right exactly okay so
we're there uh the KAN network then is a
network that attempts to do that yeah
exactly so um uh so when in the regular
neural network we had um these weights
um on these edges that are multiplying
the inputs here we're applying that
function the splines or the other sort
of wending um uh sort of nonlinear
function and then instead of uh I mean
you add in the same way so the KAN is
also adding up the inputs but then
there's no additional um nonlinearity
afterwards um like the ReLU that we see
um in in the normal neural network so all
the nonlinearity is done um before you
add things up uh so it's just more
complication in one place rather than in
a in a different place and so uh just by
doing that change you can um uh reduce
the number of parameters um because uh
this more complicated thing on the edges
uh is actually uh more able to represent
the these weird various sort of
behaviors so so that's kind of what it's
uh what it's trying to do nice so we're
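
[Editor's sketch, not from the episode: a toy version of the KAN layer just described. Each edge carries its own one-dimensional function, and each node simply adds up what arrives, with no ReLU afterwards. Piecewise-linear functions stand in for the splines, and the edge functions are fixed by hand rather than learned; `make_edge_function`, `kan_layer`, and `multiply` are hypothetical names. The example builds multiplication out of nothing but 1-D functions and sums, using x*y = ((x+y)^2 - (x-y)^2) / 4.]

```python
from bisect import bisect_right

def make_edge_function(knots, values):
    """Piecewise-linear 1-D function through the points (knots[i], values[i]).

    Stands in for the learnable spline on one KAN edge; in training,
    `values` would be the parameters being adjusted.
    """
    def phi(x):
        if x <= knots[0]:
            return values[0]
        if x >= knots[-1]:
            return values[-1]
        i = bisect_right(knots, x) - 1
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        return values[i] + t * (values[i + 1] - values[i])
    return phi

def kan_layer(inputs, edge_functions):
    """edge_functions[j][i] is applied to input i on its way to output j.

    Each output is a plain sum; all nonlinearity lives on the edges.
    """
    return [sum(phi(x) for phi, x in zip(row, inputs)) for row in edge_functions]

# Hand-built edge functions for inputs in [-1, 1].
identity = make_edge_function([-1.0, 1.0], [-1.0, 1.0])
negate = make_edge_function([-1.0, 1.0], [1.0, -1.0])
grid = [i / 10 - 2 for i in range(41)]  # knots from -2.0 to 2.0
square_quarter = make_edge_function(grid, [g * g / 4 for g in grid])
neg_square_quarter = make_edge_function(grid, [-g * g / 4 for g in grid])

def multiply(x, y):
    """Approximate x * y with two KAN layers."""
    hidden = kan_layer([x, y], [[identity, identity], [identity, negate]])  # [x + y, x - y]
    return kan_layer(hidden, [[square_quarter, neg_square_quarter]])[0]

print(multiply(0.5, -0.8))
```

Because each edge function is just a 1-D curve, it can be plotted and read directly, which is the interpretability advantage raised later in the conversation.
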
there deep breath yes two last questions
yes does it matter what what would what
would KAN models what's what's the
promise of KAN models if that you can do
this it seems like you've just taken
this complex thing and turn it into a
bunch of Lego blocks so from my one
brain cell standpoint I'm like isn't
that kind of the same thing yeah um so
it it's true I mean you're just shifting
the nonlinearities from one place to a
different place um one thing that the
KANs uh are able to do better a little
bit is um more interpretability so um
when you look at those splines uh they
actually make sense to us um so that uh
height and weight sort of relationship
or any of those other things um uh we
can understand better uh so there's this
interpretability method called SHAP
which um people have been using for a
while um this is like automatic SHAP
without having to do SHAP in a sense um
interpretability there for some folks who are
not maybe as into the papers is just
simply understanding what the models why
the model is doing what it's doing yeah
exactly exactly yep um and so I mean
that's one advantage um the disadvantage
though is that uh uh our Hardware
infrastructure has not been optimized um
for uh uh for these sort of things for
these splines and and so forth um
whereas the matrix-vector computations
for neural networks um are um uh kind of uh
like very highly optimized through those
uh H100s and so forth so uh so that's I
think the difference we might if this
catches on I mean develop some hardware
for this type of thing as well um and uh
I mean the last point that I'll make is
these are not new ideas I mean this is
something that's been around uh uh even
our team uh like a couple years ago um
we developed something called CoFrNets
it uses continued fractions which
are um a third way of representing
functions also it has an approximation or
um Universal approximation theorem
associated with it um uses continued
fractions that have been known since
Antiquity like the ancient Indians and
ancient Greeks knew about all this stuff
so I mean like all this fancy math is
great and um uh it's just different ways
of putting together I mean different uh
of these functions together and then at
the end of the day um uh they all let
you I mean kind of represent uh these
different nonlinear functions
how you train them uh might be more or
less costly where the interpretability
is might be more or less easy or hard so
uh so I mean it could turn into
something uh it might just be another
option um so we we'll see yeah for sure
yeah it's fascinating and that I think
opens up definitely a direction that I
hadn't really thought of because I think
the main thing I had heard is well you
can make much more energy efficient
models right but um it seems like two
things you're pointing out one of them
is like we might be able to understand
why these models are making the
decisions they do at a much
like closer level of depth than we have
in the past which seems which seems huge
um and then I think the second point is
actually that this is like not new stuff
right like I mean much like neural Nets
themselves this is like we're just like
pulling all this old stuff back again
and being like Oh I guess it works now
you know ultimately well I think that
resonates really well with Kush's point
about the hardware match I mean often
times uh you know we get success and the
field moves not because something is the
mathematically optimal thing but it's
something that can be done
sort of at a rational scale with a rational
speed and and as you say deep you know
deep learning was you know kind of a
rebrand of artificial neural networks it
was around for decades before it caught
on why did it catch on it's not because
there was some mathematical
breakthrough it's because the hardware like
GPUs by accident were really good at
doing this and then that just sort of
set us on a path so um obviously all
these new developments are really
exciting and we could build different
Hardware potentially um but you know any
new idea like this is going to compete
against how wonderfully good GPUs are at
doing the basic computations needed for
for deep learning um so you know that
there's it's an interesting battle you
know an interesting set of trade-offs
there yeah that I think relationship
between sort of like hardware and what's
happening on the model side I think is
like one of the most interesting aspects
of this and like how long does it take
for a model to influence Hardware you
know are we just locked into CUDA for
the rest of our lives it's like all
these things are like very very
interesting questions um so uh Marina
any last thoughts uh before we we close
up today yeah um even more General
comment continuing what Kush and David
said is representations of data are not
created equal so yes it's the same
information but when you change the way
that you represent it you're able to do
things with it that you weren't able to
before so even with something like a
large language model you're representing
data that exists let's say on the
internet but you're representing it in
such a way that you can access it in a
way that you couldn't before same thing
with for example KAN versus MLP the
representation changes there's going to
be trade-off but it's always very
interesting to try this uh the fact that
we now have more of these options open
to us because the hardware has caught up
to the math that has been around for
years or decades or centuries yeah that
means try again try again try again see
what what new things will come up data
representation is really one of the
things underlying driving this current
era of AI so more work in this direction
is just going to continue to drive
things to an interesting place that's great
yeah well I can't think of a better note
to end on um Kush MVP thank you for
coming on the show again um and Marina
David hope to have you on the show again
and uh I hope all of you listeners out
there join us next week for another
episode of Mixture of Experts thanks
everyone thanks thank you appreciate it