ChatGPT Atlas Sparks AI Debate
Key Points
- The episode of “Mixture of Experts” introduces a panel of AI experts (Martin Keen, Aaron Baughman, and Abraham Daniels) who discuss ChatGPT Atlas, the future of AI agents, DeepSeek’s DeepSeek-OCR paper, and whether LLMs can suffer “brain rot.”
- In the news roundup, major players such as Goldman Sachs, IBM and Groq, the military, and Uber are all expanding AI initiatives—financing data‑center projects, combining high‑speed inference with enterprise tools, using chatbots to practice rapid decision‑making, and crowdsourcing model training to drivers.
- OpenAI’s launch of ChatGPT Atlas, its own web browser, is framed as a logical evolution of its search features and a way to integrate browsing history for a more seamless, personalized internet experience while navigating antitrust pressures on Chrome.
- The discussion highlights Andrej Karpathy’s forward‑looking projections on AI agents, signaling that agent‑centric architectures may become a dominant paradigm in the next wave of AI development.
- The panel also raises a provocative question about “brain rot” in large language models, prompting debate on long‑term model degradation and the need for continual updating and maintenance.
Sections
- [00:00:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=0s) Predicting Atlas Adoption & AI Trends - In the opening of the Mixture of Experts podcast, host Tim Hwang and panelists discuss whether OpenAI’s Atlas will achieve large‑scale adoption, preview upcoming segments on agents, DeepSeek-OCR, and LLM decay, and share AI‑industry news including Goldman Sachs financing AI data centers and an IBM‑Groq partnership.
- [00:03:15](https://www.youtube.com/watch?v=xawn4C43TWo&t=195s) OpenAI's AI Browser Adoption Debate - The speakers evaluate OpenAI's new AI-powered browser, weighing its strategic benefits against high‑friction user transition challenges and comparing it with rivals such as Perplexity's Comet.
- [00:08:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=480s) Future of Browsers in an AI‑Driven Web - The panel debates whether browsers will become obsolete as AI agents and conversational platforms evolve into the primary interface for accessing and orchestrating internet content.
- [00:11:14](https://www.youtube.com/watch?v=xawn4C43TWo&t=674s) OpenAI's Vision as OS - The speakers speculate that OpenAI aims to evolve beyond a browser into an operating system for AI apps that can integrate with any desktop application, positioning its AI as a universal, fix‑all tool for everyday users.
- [00:14:22](https://www.youtube.com/watch?v=xawn4C43TWo&t=862s) Karpathy Critiques AI Agent Promise - The hosts debate Andrej Karpathy’s warning that today’s AI agents lack sufficient intelligence and multimodality, questioning whether this signals a slowdown in the technology’s advancement and widespread adoption.
- [00:19:17](https://www.youtube.com/watch?v=xawn4C43TWo&t=1157s) Navigating AI Hype and Benchmarks - The speaker reflects on past AI winters and stresses the importance of realistic benchmarks and human oversight to temper over‑optimistic expectations in emerging AI systems.
- [00:23:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=1380s) Production-Grade Agents and Generative Computing - The speaker stresses that only agents with near‑perfect reliability can be production‑grade, viewing generative computing as the key to achieving such deterministic outcomes before moving on to recent research like the DeepSeek-OCR paper.
- [00:28:25](https://www.youtube.com/watch?v=xawn4C43TWo&t=1705s) Vision Encoder for Efficient Document Understanding - The speaker describes a two‑stage system where a vision encoder (DeepEncoder) converts scanned PDFs into compressed vision tokens, mitigating LLM context‑length constraints and enabling flexible decoding for various multimodal language models.
- [00:33:26](https://www.youtube.com/watch?v=xawn4C43TWo&t=2006s) Visualizing LLM Context Windows - The discussion speculates on turning LLM context windows into AI‑generated visualizations, explores multimodal bridges between language and perception, and references a tongue‑in‑cheek paper titled “LLMs can get brain rot!” that illustrates these ideas.
- [00:36:33](https://www.youtube.com/watch?v=xawn4C43TWo&t=2193s) Model Degradation and Inertia Analogy - The speaker warns that continual deployment with increasingly shallow training data can cause “brain rot” in LLMs, likening it to adult cognitive inertia and proposing interventions such as virtual lesioning or pruning of super‑weights to restore plasticity.
- [00:40:46](https://www.youtube.com/watch?v=xawn4C43TWo&t=2446s) Short-Form Social Media Data Confounds Findings - The speaker argues that observed narcissistic or adversarial traits in a study may stem more from the brief, platform‑specific nature of X/Twitter posts than from the content itself, highlighting how short‑form format and platform culture act as confounding variables and exemplify a garbage‑in‑garbage‑out issue for models.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=xawn4C43TWo](https://www.youtube.com/watch?v=xawn4C43TWo)
**Duration:** 00:44:37
What's your predictions? Do you think people are going to
adopt Atlas at scale? This is going to be a
big winner for them. That's a good question because, I
mean, I can see the benefit to OpenAI. What's the
benefit to us as the user? All that and more
on today's Mixture of Experts. I'm Tim Hwang and welcome
to Mixture of Experts. Each week, MoE brings together a
panel of brilliant, funny, thoughtful panelists to debate, discuss and
think through the latest news in artificial intelligence. Joining us
today are three incredible panelists. So a very warm welcome
to Martin Keen, who is Master Inventor, Aaron Baughman, IBM
Fellow and Master Inventor, and Abraham Daniels, who's a Senior
Technical Product Manager for Granite. There's lots to talk about
today. We're going to talk about ChatGPT Atlas, Andrej Karpathy's
projections about the future of agents. We'll talk about an
interesting paper out of DeepSeek on DeepSeek-OCR. And
then finally, we'll ask the question of whether or not
LLMs can get brain rot. But first, here's Aili with
the news. Hey everyone, I'm Aili McConnon. I'm a tech
news writer with IBM Think. I'm here with a few
AI headlines you might have missed this busy week. The
AI race is long underway, and now Wall Street wants
in. Banking giant Goldman Sachs has created a new team
that is focusing on financing deals to build data centers
and other AI projects. IBM and Groq have teamed up
to combine Groq's high speed inferencing with IBM's AI agent
tools so enterprises can deploy AI agents more quickly and
cost effectively. It's no longer just companies experimenting with AI.
Now the military wants in too. Even top generals look
to AI chatbots for answers as they practice making decisions
quickly, a critical skill on the battlefield. Uber drivers can
now earn a little extra cash between rides by doing
small digital tasks that help train Uber's AI models. So
it's a side hustle within a side hustle. Want to
dive deeper into some of these topics? Subscribe to the
Think newsletter, linked in the show notes. And now back
to the episode. First off, I really wanted to talk
about the big product announcement of the week, which is
ChatGPT Atlas. So if you missed this news, OpenAI is
now out with its own browser. And we've talked about
this in the past, but I guess, Abraham, do you
want to give us some intuition for, like, why is
OpenAI in the browser game at all? Like, why are
they. Why are they doing what they're doing? A couple answers
to that. I think there's been kind of breadcrumbs in
terms of ChatGPT or OpenAI entering this space with search
functionality dropping last year, as well as them really being
the entry point to a lot of users in terms
of how they navigate the Internet. So I think, one,
it was a natural kind of pivot for them. But
two, with a lot of the antitrust with Chrome, as
well as the idea that you could have your history
cached as part of your Internet experience so that you
have a better navigation experience when using the Internet, I
think it just makes perfect sense for them. Model development
is not necessarily as hot as it used to be.
So I think OpenAI has been really diligent in terms
of finding new avenues to capitalize on their user base,
you know, which is, you know, over 350 million people.
So I, I think it's. Personally, I think it's a
really smart move. And I think, you know, with, with,
with LLMs kind of being, they're already, you know, an
entry point for most people in terms of how you
use the Internet. It just makes sense for them to
actually, you know, you know, create a browser. I think
these transitions are really hard. I remember like when I
moved from like basic Chrome to Brave, it was like
moving house. I like felt like it was like, took
a long time. They're both Chromium browsers and so they're
like actually share a very similar kind of DNA. But
like this like transition from like one browser to another
is like really feels quite high friction. And I guess,
Martin, if. Do you have any thoughts on kind of
like adoption here? Right, because the other one that we've
talked about in the past is Perplexity's Comet browser, which
is like their bid in the space and there it
almost seems like, oh, well, if you have a company
that sells AI as search, it'd become very obvious for
you to do AI browser. Right. Because of the kind
of history of Google and Chrome, I guess. What's your
predictions? Do you think people are going to adopt Atlas
at scale? This is going to be a big winner
for them. Yeah, that's a good question because I mean,
I can see the benefit to OpenAI. What's the benefit
to us as the user? Yeah, good question. Yeah. So
can I share two stories of how I've been using
Atlas this week? For sure. When I tested it out,
so I installed this on my Mac and then the
first thing I wanted to do is I had a
scientific article that I just kind of open up in
Atlas. And of course, now you can bring up the
tab along the side which gives you access to ChatGPT
and you can ask questions. And the questions use the
Open Web pages context. So I could ask questions about,
tell me what the method and the purpose and the
findings were of this experiment. And it gave me all
that information. I mean, that stuff I could have done
easily by just going into a regular ChatGPT window, right,
and just pasting in the URL. But it was kind
of handy that it was there. But also in this
scientific article, there were a bunch of pictures. So I
wondered, could I start asking questions about the pictures and
not reference any particular pictures, just see if it could
figure out which one was appropriate? So I should say
the article was about a scientific study of a beer
brewing method. And I wanted to ask, did one of
the beers look more oxidized than the other? Which you
can sort of tell because they get a little bit
darker in color. So all I said is, did one
look more oxidized than the other? And it found the
one image that was actually related to that, and then
it analyzed the image and it told me, actually, no,
I can't see any difference. So it worked. It worked
right there. I didn't have to have two windows open.
Now, the second thing that I tried was it has
built in agent mode, where it's supposed to be able
to basically control the browser for you, fire up a
bunch of tabs, do a load of stuff. So I'm
a bit of an amateur book collector, and there's one
Michael Connelly book that I don't have yet. So I
was like, I wonder if the agent can find me
the book. So I asked it, I said, look, I
want to find this particular book. It's called Nine Dragons
by Michael Connelly. I want it to be in hardcover
binding. I want a used copy, and the used condition
needs to be very good. So it goes off and
it's searching a bunch of websites and you see it
kind of working. And it's got some fun animations in
Atlas. And it came back with an answer and it's
popped up the window with the one that it thinks
is best fit, and it found the right book. But
I looked at it and the front cover of the book didn't look right
to me. It didn't look like any of the other
Michael Connelly books that I'd collected. So all I said
was, this cover doesn't look right. And what it did
is it went and fired off a bunch more windows
as part of the agent. And this time it looked
up the ISBN number and then it confirmed that that
is the correct picture for that ISBN number. But then
it pointed out this is the UK version of the
book, not the USA version of the book. And you
would actually need to use this other ISBN number if
you want me to search for that. So again, this
was like a really good example, right? Yeah. But the
agent did all of the work for me. And again,
I probably could have done that in my chrome browser
using chatgpt.com, but I would have been kind of flipping
between multiple tabs to do that. So it was beneficial
to me just to kind of have it all there
in one place. Uh huh. So you're pro. You actually
are. Like, you feel like a month from now you'll
still be using Atlas? We'll see. I don't know. We'll
see. Okay, Day one, I liked it. All right, great.
Aaron, I've saved maybe the craziest question for you for
last. I'm sort of interested in like whether or not
in five to 10 years there will even be browsers.
Right. Like, you know, I think one way of kind
of reading the rise of chatbots is, well, if they
get good enough or these agents get good enough, you'll
never need to go directly to a website anymore. Right.
All of the information will be curated, assembled, you know,
maybe everything will be working on MCP. And so like,
you really will have an Internet that is for agents,
by agents. And so the notion of like you having
to browse the web is, is maybe like this artifact
of the past. And so I guess one question for
you is like, over the long run, do browsers even
make sense as like a category of product? Yeah, you
know, you know, so, so taking out the crystal ball,
you know, and just thinking about the projection of tech
is going. Yeah, I mean, it's, it's fascinating, you know,
because, you know, the paradigm's changing. You know, I think
that OpenAI, they're looking at turning our computer, our computing
devices into a playground, but it doesn't yet have control
over the structure and function of that playground, at least
yet. Right. So we're trying to preserve some privacy pieces.
And it looks like ChatGPT is trying to become more
like an operating system where you can come and use
these different applications. It's like an operating system for AI
apps, where in this case, the OS role here is
more about orchestrating AI tools, workflows, plugins and such. So
it's not going to replace macOS or Windows or Linux
or so on. You know, it's not aiming to act
like a low level OS that controls this kind of
hardware, but it's abstracted up a level, you know, where
it handles apps, SDKs, third party apps, agents. Right. And
the line between apps and platform, you know, it's beginning
to blur a bit, you know, and we have to
think too, that our computers really aren't built for AI
per se. We have to farm out lots of these
models, powerful big models and even the agent wrappers up
into the cloud or these big compute clusters. Right. And
so we need something new. Right? And so this is
where I think, you know, generative AI and generative computing
combined together will help us achieve sort of the future
of what's going to happen. I think some of the
risks that we all just need to be aware of
is data and privacy, you know, that, you know, just
making sure that we still can control and decide, you
know, what this new OS, right, is going to do
and what it can do. Right. There could also be
these hidden prompts or what we call CometJacking, where
there's a lot of these agent risks, right. That could
happen and it just sort of does it by itself,
you know, where it hijacks, you know, Comet or hijacks,
you know, Atlas. Right. There's also less transparency and control
that we have as we go further into the future.
AI can make mistakes, as Martin was mentioning before, or
at least it seemed like a cognitive mistake, but it
actually went to the UK, found a book rather than
trying to find a book, maybe where we currently live.
So it's like that information graph that it didn't associate
correctly to the user. But in essence that's where I
think it's going. And it's going to be fascinating to
watch the field as it sort of begins to change
and turn and it's going to change very quickly, I
guess. Abraham, any thoughts on ultimately where OpenAI goes with
all this? I mean, Aaron kind of name checked, sort
of the idea of like, well, ultimately their ambition is
world domination. Ultimately the ambition is not just a browser.
They want to create a thing that can use any
app on your computer and it starts to look like
something which is maybe more akin to like what we
identify with, like a lower level operating system. Is that
where they're headed with this? Ultimately, they've already added features
in which they can start to plug into apps on
your desktop laptop and as part of just the ChatGPT
feature. So in terms of plugging into your apps, I
think that they've already gone down that route. When I
think about how people use the Internet today, when I
say people, I don't mean researchers or individuals that are maybe
a little bit more acutely aware of generative AI. I'm
talking about your everyday user. They see it more as
a fix all tool. So they don't have the same,
in my opinion, you know, guardrails or you know, specific
issues with some of the security around using it. They
see it as kind of how my generation would have seen
Google where this is, you know, this is a truth
search engine. Whatever comes up is typically going to be
is real. Yeah, yeah, exactly. So for the average user,
I think having something like Atlas simplifies their Internet experience. They
would gladly take it. To be honest, from OpenAI's perspective,
obviously monetization is, you know, a big aspect of their
business. So I think this opens up a huge world
for them in terms of being able to monetize it,
whether it's via ads or, you know, what have you.
But yeah, I personally, and I may be biased here,
but I think this was a really smart move by
them. I think it was. I think everything that's happening
in the search industry right now I think is only
going to benefit them. In terms of people adopting Atlas,
I think they've done a great job of gaining mind
share and gaining a market before throwing this out where
it's a really easy switch. And to Martin's story where
he could have done it in GPT, but why not
just do it on the browser where you have all
the context right in front of you. You can ask
whatever question you want, have the memory cache. Yeah, I
think it just makes perfect sense to be honest. Yeah.
Martin, it looks like you might want to jump in
or. One thing I will say though is as soon
as you launch that browser, of course now the decision
is do you want to switch over to another browser?
And it is not shy about asking. Within about two
minutes it was like, can I be your default browser?
Now, I haven't even put in two search queries. We're
just getting to know each other. It also asked for
Bluetooth which was, I was like, why do you need
Bluetooth? My device? You're like for reasons. Exactly for reasons.
Well, this is a Nice segue to the next topic
I wanted to cover. So, Andrej Karpathy, who we've talked
about before on this show, famously OpenAI co founder, influencer
in the generative AI movement, was on a very prominent
AI podcast, the Dwarkesh Podcast, fairly recently, and he had
this kind of much discussed set of comments that he
made there, which I'll just kind of quote here. He's
talking about agents. So he said, quote, they just don't
work, they don't have enough intelligence, they're not multimodal enough, and
it will take about a decade to work through all those issues. And I think
maybe this is actually a really nice thing to build
off of Abraham, what you just said, which is, is
that going to be a barrier to agent adoption? I
think, like, Andrej definitely is kind of like looking at
this from the perspective of a researcher who's aware of
the technological limitations of what's being built. But it sure
seems like people have enough confidence in these systems that
they're more than willing to adopt agents, use agents, even
in the presence of these kinds of problems. And so
I guess maybe, Martin, to throw it to you, how
big of a deal do you think are the issues
that Karpathy is kind of pointing out here? Should the
space be worried that it's not going to be as
advanced as we thought? As quickly as we thought? Yeah,
I think when somebody who's worked in prominent positions in
two frontier AI labs, Tesla and then at OpenAI, comes
out and says, agents are terrible, they're oversold and are
10 years away from being useful, some ears are going
to prick up at that sort of thing. It was
very interesting to hear him sort of give some of
the reasons why he thinks that that is the case.
I mean, in my experience with this book buying Agent,
it already made a mistake that a human probably wouldn't
make. If I asked Aaron, like, hey, Aaron, my personal
assistant, could you go out and find me that Michael
Connelly book? He's probably not going to come back with
the UK edition. That would be part of the processing.
So, yeah, you could see that. But he mentioned a
couple of other things that can be sort of the
cause of this, why agents sometimes just don't do what
would seem like the intuitive thing from the human perspective.
And one of the things he mentioned was training data.
And he said that if you took the training data
set of any large language model and you just picked
out a random single document from that Training data set.
He said, chances are that is either going to be
just irrelevant, like it will be a stock ticker figure
on average, most of the content that it's kind of
scraped off the Internet is just kind of nonsensical, or
it's full of errors, but if you have enough of
it, then you can see the signal through the noise.
So the training data could be a big part of
that. And the second sort of controversial reason he gave
were his opinions on reinforcement learning, where he also declared
reinforcement learning as, well, pretty bad. He gave the example
of a math problem that the reinforcement learning works by
rewarding answers that are correct and punishing answers that are
not correct, but not necessarily caring too much about how
they got there. So did you do the right calculations,
or did you kind of stumble on it by accident,
or did you add in a whole lot of extra
steps that you really didn't need to do? And I
think those limitations of reinforcement learning appear quite prominently in
the agent's chain of thought. So when you do set
an agent out to do something and you see it
processing its chain of thought as it tries to work
through steps, it will often get stuck in these loops
where it's doing things where you think, okay, let's just
move on past that, get to the next thing. And
I would suspect that reinforcement learning is a large part
of that, that it's not always finding the most optimal
ways to do things. But, yeah, I think when somebody
like that comes out and says agents are currently being
oversold, it is going to affect the industry. People are
going to listen to that. Yeah, definitely. Yeah. The downstream
effect of this is going to be big because obviously
there's been so much excitement about, say, what agents can
do and the promise of it. And, you know, I
think there have been kind of rumblings from certainly the
business space. Right. I think a couple of banks have
come out, and there's this report from, I feel a
few months ago that was like, oh, a lot of
these pilots are not quite working out, but it seems
to be like the first case of a really kind
of like, strong technical, influential technical voice being like, guys,
this is. This current research plan is not going to
work. Aaron, do you buy it? Should we really be
tapping the brakes on our optimism around this stuff? I
think history always repeats itself. Right. We just need to
learn from history so it doesn't. The bad parts of
history don't repeat. A surprisingly hard thing to do. That's right.
If I look back in the 90s and even 80s,
we have these knowledge based systems and there's a lot
of promise around them and it sort of, we entered
in, you know, into a winter of AI, right. And,
and a lot of that was fueled by the early
neural networks. It couldn't even solve the XOR problem, you
know, and so therefore we had to go to multilayer
perceptrons, you know, to help solve that. But then we
didn't have the computational ability. Right. So there's always stumbling
blocks and problems that have to be solved by science
and engineering. Right. And this is no different. I think
what he's doing is very smart. Right. And I think
that we need to rein in, you know, lots of
and they may not even know how to spell AI.
Right. And so we need to be careful about that.
And I think Andrej, he's taken a long haul position,
which I think is what a lot of us should
do. Right. And he's trying to mitigate some of this
overhype. It's very risky because we've seen it time and
time again where, you know, a system doesn't live up
to its, to its hype, but it does do what
it's built to do, you know, and, and so by
reining in and creating those benchmarks and guideposts, you know,
it really, really helps us out and, and AI agents,
we are in the early stages, you know, it's, and
it's very exciting, you know, it's, it's fun to play
with. But I will say when I'm building a production
system, whether it's for sports or for, for entertainment, I
always have a human in the loop, right. To make
sure that what I'm producing is consumer ready. Right. We
go to scale, right? A 1 to 2% error rate,
that's huge. I mean that's 1 out of 100 requests.
If I'm getting billions of requests, a lot of
people are seeing incorrect responses. And that's not even to
say that these systems are going out to use external
tools with, say, for example, MCP to activate something outside
of the ecosystem, which we need to be very careful about.
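As a rough illustration of the scale math above, here is a back-of-envelope sketch. The figures are hypothetical (a billion requests is an assumed round number, not a statistic from the episode); the point is only that a "small" per-request error rate becomes a very large absolute count at scale.

```python
# Back-of-envelope sketch: a seemingly small per-request error rate
# turns into a huge absolute number of bad responses at scale.
# All figures are hypothetical and purely illustrative.

def expected_failures(requests: int, error_rate: float) -> int:
    """Expected count of failed requests for a given error rate."""
    return round(requests * error_rate)

daily_requests = 1_000_000_000  # a hypothetical billion requests per day
for rate in (0.01, 0.02):       # the 1-2% range mentioned in the discussion
    print(f"{rate:.0%} error rate -> "
          f"{expected_failures(daily_requests, rate):,} bad responses per day")
```

At 1% that is ten million bad responses a day, which is why a human in the loop still matters before consumer-facing scale.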
There's a lot of non-determinism around these systems. And
I'm studying, actually looking at when we should use, you know,
machine learning versus generative AI, because there's a place for
both and when should you combine them together to get
the best of both worlds. But, but yeah, I, you
know, I do think that he's taken a long haul
position and, and it's a very smart thing to do.
Abraham, I'll give you the last word on this topic.
One of my favorite images from the moment that we're
in an AI, we talked about it, I think on
a previous episode was the chart of like how the
money is flowing in AI and it's just like, you
know, Nvidia gives money to OpenAI. OpenAI gives money to
Nvidia. It's just like it's a circle, basically, is where
the money's flowing. Is, is this going to, are Andrej's
comments going to pop the bubble? Is there a bubble?
I don't know what your views on this are. I
don't know if the news is going to pop the
bubble. Whether there is a bubble or not. I think,
you know, I'll let everybody decide for themselves. I definitely
think there's overhype, that's without question. And I think there's
overhype for a specific reason that, you know, probably predicated
on a financial reason. But Aaron said something that kind
of really resonated: the current patterns for agents are non-deterministic. Whether it's a planner, or just calling the models many times for inference scaling, they don't offer guardrails around outputs, in the sense of "I need a specific output every single time, and if it's not operating within this scope, then go back." So I think from an agent's perspective, there's "good enough," where you're doing a search function and the stakes are low; if you don't get it, you can redo it. And then there's production-grade agents, where the stakes are high enough that if it's not 99-plus percent, we can't move it to production.
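The "if it's not operating within this scope, go back" pattern can be sketched as a validate-and-retry loop. Everything here is a hypothetical stand-in, not any specific framework's API: `call_model` fakes a non-deterministic LLM, and the JSON schema and allowed actions are invented for illustration.

```python
import json

def call_model(prompt: str, attempt: int) -> str:
    """Hypothetical stand-in for a non-deterministic LLM call.
    It fails on the first attempt, then returns valid JSON,
    purely to exercise the retry loop below."""
    if attempt == 0:
        return "Sure! Here is some JSON: {oops"
    return json.dumps({"action": "search", "query": "ocr papers"})

def guarded_call(prompt: str, max_retries: int = 3) -> dict:
    """Retry until the output parses and matches the expected scope."""
    for attempt in range(max_retries):
        raw = call_model(prompt, attempt)
        try:
            out = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: go back and try again
        if isinstance(out, dict) and out.get("action") in {"search", "answer"}:
            return out  # within scope: safe to pass downstream
    raise RuntimeError("no in-scope output after retries")

result = guarded_call("find recent OCR papers")
print(result["action"])
```

This is the low-stakes version: retries are cheap for a search function, whereas production-grade agents would need stricter policies than "try again a few times."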
So I think there's different worlds in terms of, you
know, whether agents are going to make it or not.
Personally, I think there's a need for more of a deterministic outcome for these agents. I think software is
going to be kind of that. So generative computing specifically
is kind of going to be that key piece in
terms of making sure that agents are production grade. Whether
that's through policy management, whether that's through requirements or IVR
patterns or what have you. I don't know if it's 10 years or not. But I mean, I definitely do
agree with the statement in terms of the overhype around
agents, but I also think that there's still a place
for them today. It's just a matter of being able to define where your use case fits. Yeah, it makes a lot of sense. I'm going to move us on
to our next topic, the next two segments. We're going
to talk a little bit about sort of interesting papers
that have kind of come across our radar in the
last few weeks. And I guess, Martin, I'm going to
pick on you. A few weeks ago I picked on
Chris Hay. I was like, could you explain manifolds and
exactly how they work in the context of machine learning?
I'm not going to do anything so mean to you
today, but a super interesting paper out of DeepSeek called DeepSeek-OCR. The paper is DeepSeek-OCR: Contexts Optical Compression. And I guess, Martin, the first question
to just toss to you is that the paper is
trying to deal with the problem of models having trouble
dealing with long contexts. And can you tell us a
little bit about that problem? Why does that happen? What
are the kind of practical implications of that? If we look at the trend in large language models now,
we're seeing bigger and bigger context windows to fit more
and more stuff in. So the more and more stuff
that it can keep in mind is going to be
prioritized when it comes up with a response. So how
do you get as much information as you can in
a certain context window, given how computationally expensive it is
to expand the context window? So, yeah, this was kind
of an interesting idea of actually basically turning these tokens
into visual tokens and you could actually cram a lot
more information into that, but depending upon the algorithm that
you used, there was a certain loss when you convert
it back again into text, but quite a small loss
for some. So I think the best model there was
something like a 97% rate of being able to take
text, basically do this conversion, convert it back again, so the encode-decode cycle, and then 97% of the text
was about right. Not too bad. Not too bad, right.
But you could have, if you have an image with
basically less information in it that that can still go
through this encoder decoder loop and still bring back text,
but the text has lost a little bit more information.
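The trade-off being described, fewer tokens in exchange for some reconstruction loss, matters because self-attention cost grows with the square of sequence length. A toy sketch makes that concrete; the 10x compression ratio below is an illustrative assumption, not a figure from this discussion:

```python
def attention_cost(n_tokens: int) -> int:
    """Toy proxy for self-attention compute: n^2 pairwise interactions."""
    return n_tokens * n_tokens

text_tokens = 10_000        # a long document as ordinary text tokens
compression = 10            # assumed: ~10 text tokens per vision token
vision_tokens = text_tokens // compression

savings = attention_cost(text_tokens) / attention_cost(vision_tokens)
print(f"{text_tokens:,} text tokens -> {vision_tokens:,} vision tokens")
print(f"pairwise attention cost shrinks by ~{savings:.0f}x")
```

Because the cost is quadratic, a 10x reduction in tokens buys roughly a 100x reduction in pairwise attention work, which is why a small loss in decode fidelity can be worth it.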
And what's really interesting is that they wrapped this around the idea of a forgetting mechanism, which mimics human memory much more. So the big thing in the paper is how closely this mimics human memory. So, for
example, if you run this through the best model, it
basically remembers almost everything. Just like I remember on this
podcast the question you just asked me now, Tim. But
I was on this podcast a month ago and I
remember who the guests were and what the topics were,
but my memory is not fully there now. I don't
remember the specific questions that you asked me or the
specific talking points that the other guests made. So it's
a bit more fuzzy. Well, they model that in this paper and they say, well, actually, that sounds like the base model or the small model that we have, because the small model uses a blurrier image and will be about the equivalent of one month of human memory. So something that happened
to me one month ago, if you use their blurry
model, then that will create an image that will be
about the same. So about the same amount of stuff
will be forgotten there. So it sort of brings up
an interesting point is, is there actually some utility in
that, in that having large language model context windows mimic
human memory a bit more? Is there a reason that
we evolved to remember things in the present very well
and then just to remember things more in abstract as
time goes on? And the fact that this could model
that as well? Yeah. And the answer there, in the biology case, being that in the same way it's computationally intensive for machines, it's also computationally intensive for us to have increasingly large context windows, in effect. So one practical import of this
paper seems to be, look, we could feed in language
tokens or we can feed in picture tokens, I guess
to make it a very simplistic kind of distinction. The
end result is that we can, we can do a
lot more compression, right. Which I guess gets us to
longer and longer contexts. Is that kind of where this
is all headed is like you can start to put
even more in the window if this is actually something
that becomes more production ready. I think about this as document distillation, much like model distillation: you have a big model make a smaller model. Here we have a document, but we want to distill it down into its principal components.
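The principal-component framing can be made concrete with a tiny sketch; the data here is a toy matrix, not real document embeddings, and the code is plain NumPy rather than anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "document" matrix: 100 rows that mostly vary along 2 directions,
# plus a little noise (illustrative data, not real embeddings).
basis = rng.normal(size=(2, 50))
data = rng.normal(size=(100, 2)) @ basis + 0.01 * rng.normal(size=(100, 50))

# PCA via SVD: keep the top-k components, i.e. the directions
# with the largest singular values and hence the most variance.
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2
compressed = centered @ Vt[:k].T      # 100 x 2 instead of 100 x 50
reconstructed = compressed @ Vt[:k]   # back to 100 x 50, small loss

error = np.linalg.norm(centered - reconstructed) / np.linalg.norm(centered)
print(f"kept {k}/{len(S)} components, relative error {error:.3f}")
```

The analogy to the OCR setup is loose but useful: keep the few directions that explain most of the content, accept a small reconstruction error in return for a much smaller representation.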
And it's similar, from a mathematical perspective, to principal component analysis, where you want to keep the largest eigenvectors so you have the most variance that
can explain your data. This to me seems a bit
similar, except we're using these different vision encoders. So they have a two-stage system with a vision encoder called DeepEncoder. And it's pretty cool, because they take in a PDF file, say a scanned piece, and it's not just OCR, it's not just extracting text, but it's liberating the text, right, so we can actually see what it is. And it turns this messy
human written world into something that AI can understand. And
it helps to create these smaller tokens so we can
have this new way of document understanding. And it helps
with the core problem that LLMs struggle with these long
contexts due to the quadratic scaling: the longer the context gets, the harder and harder it becomes for these systems to process it. And therefore we leverage this efficient compression technique so that we can condense this information into a compact representation. And what I think is neat in this
two stage system, when you get down to the decoder,
what if you could just take that vision encoder and train a new decoder, so it translates into any other kind of language, such that it can be used by another large language model, or any sort of multimodal model. And it just produces something very different, you
know. And so I think it'll become more of like
an art form, you know, where you're putting together these different layers of encoders and decoders and finding what works best. In fact, you could think of it like a search problem, you know: find the best set of encoder and decoder models to solve a certain problem in the most efficient way. But yeah, I'm excited. I liked
it, what they showed here, looking forward towards their next
paper where I think they're going to have an expanded
piece where they're going to add some more experiments to see how it works with multimodal and text fused together, and so on. Abraham, maybe a final question for
you is just to zoom out a little bit. DeepSeek
obviously kind of got on the map in a big
way through the release of its kind of open source
models. It is a lab that's doing research and is
publishing papers. Do you have any speculation on why DeepSeek is interested in these kinds of research questions?
OCR is one of those really old problems, so it's
a great question. Personally, I think this may be just
one of those innovations that they found in the lab
that demonstrated a step forward that was worth releasing. Also,
I think that the idea of having this infinite or
longer context window is really applicable to particular RAG use
cases that are really important and something that hasn't been
solved across the board. So I think both those answers
help them. One, make sure that their name is still
in the news. But two, demonstrate that the research lab
is still doing some pretty cool things. In terms of
this particular paper, what I thought was really neat was obviously the shift to a different approach in terms of how you actually encode and embed a document. What I would
love to have seen personally is the semantic representation still
kind of kept in terms of moving from an image
to text. And what does that actually mean for downstream
applications? Because that's where you're really going to see a
lot of the implementation here. Or is this really just OCR text extraction? A quick plug for IBM here: Docling does this extremely well and extremely efficiently, and you don't need an LLM to do that, to be honest. So it's a little bit of overkill in that
regard. But yeah, I'm excited to see the next version
of this paper and the next version of this effort
from DeepSeek. Yeah, it doesn't seem like a very flashy
milestone, but it is definitely a critical piece of the
AI stack. If you take the OCR part out of
this, it's a compression bridge to help other models handle large-scale problems with a very small number of tokens. Right. So it's pretty neat, you know, what
they're doing here. It would be really cool. I think
Aaron already kind of commented on this, but in terms of the output, like, most models are text-token oriented. So it would be really cool if they could
release some type of plugin that you can swap out
different decoder models in place of their decoder model. So
this is a little bit more from an adoption standpoint: you can use whatever LLM makes sense for your environment. Do you think the
next stage of this is that we're going to be seeing, kind of, well, we already have AI art, right? Are we going to see AI art of context windows? Like,
Aaron, are you going to have that picture behind you?
Is that going to show some kind of visualization of
your context window that we'll all be able to pick up on? Now that would be a scary proposition. You
don't want to see my context window. You do not
want to see it. You know, the whole notion of, like, affective computing, you know, where these systems can understand how you're feeling, what you're thinking, I think that this
might play into some of it because it's like another
bridge into understanding language from different areas. And that bridge
could be between modalities or between people and models, in
a sense. And it creates this language such that we
can have other interpretations and other agents, you know, go
ahead and run and change that image maybe behind me.
All right, I'm going to move us on to our
final topic of the day. So the last paper we talked about, DeepSeek-OCR, was a little bit in the guts of the system. This other paper was just fun.
It got talked about a lot online, and I figured we'd kind of bring it up here. So the title is very striking. It just says, LLMs can get brain rot, exclamation point. And the
intuition of the paper is kind of a fun idea.
It basically says, look, there's a lot of hand wringing
and concern that if we consume lots of junk media
on social media, that we will, as humans literally get
brain rot. Right? Like that we will think less. Well,
we will do reasoning poorly, have all these cognitive defects
from exposure to this content. And so the researchers just
simply say, like, well, what if we can LLMs get
brain rot too? And so what they did is they
kind of curated a couple of data sets of social
media content that they considered short and popular or sensationalist.
And they said, well, we can do kind of like
a little post-training mix to the model, where we slowly increase the amount of "junk web text," as they framed it, fed to the model, and then we see how it performs against certain benchmarks. And what they claim is
that these models do experience a form of cognitive decline.
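The setup being described, post-training on a growing fraction of junk web text and then re-benchmarking, can be sketched in outline. The corpora below are placeholders and the actual training and evaluation steps are omitted; this is not the paper's pipeline, just the shape of the experiment:

```python
import random

def build_mix(clean, junk, junk_ratio, size, seed=0):
    """Sample a post-training mix where `junk_ratio` of the documents
    come from the junk pool and the rest from the clean pool."""
    rng = random.Random(seed)
    n_junk = round(size * junk_ratio)
    return rng.choices(junk, k=n_junk) + rng.choices(clean, k=size - n_junk)

clean_docs = ["a careful long-form explanation ..."]   # placeholder corpus
junk_docs = ["you WON'T BELIEVE this one trick !!!"]   # placeholder junk

# Sweep the junk fraction; in the real study you would continue
# training on each mix, then re-run reasoning / long-context /
# safety benchmarks to look for decline.
for junk_ratio in (0.0, 0.5, 1.0):
    mix = build_mix(clean_docs, junk_docs, junk_ratio, size=1000)
    print(junk_ratio, sum("!!!" in doc for doc in mix))
```

Varying the junk ratio is what lets the paper talk about a dose-response relationship rather than a single before-and-after comparison.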
And so they say that there's declines in reasoning, long-context understanding, and safety, and then they also even claim that there's the emergence of these dark traits. Right? These models become
more narcissistic because they've seen this content. So I guess,
Aaron, maybe I'll throw it to you. What does this
paper show? Does it show that, like, if we consume
lots of bad material online, then our brains are literally
going to rot? Or what is this? Yeah, what do
we take from this paper? I mean, you know, the
headline for me here was garbage in, garbage out. But
I think the big flashing star was that the decline
of these LLMs, it was persistent and systematic. Right. It
wasn't like, you know, you could just quickly fix it.
Right. And, you know, the risk is that, you know,
as we put these systems in the wild and the training data, you know, becomes shallower and shallower, for example, then we need to continually evaluate these models
because this brain rot can happen. And then I sort
of ask myself, why does this happen? And I was
thinking in terms of momentum and inertia. So during backpropagation, when you're training, I was thinking maybe there's this extra momentum, and the gradients start becoming much more marginal as you learn over time. But then once you stop training, it's like you have this inertia where you can't unlearn fast enough. And it's like humans: as kids, their brains are very plastic, they can learn very quickly, they can change very quickly. But then as adults, as you get older and older, you have such large amounts of knowledge embedded, which is fantastic. But then on the other hand,
right, because our brains are already wired very densely as
opposed to kids, it seems as though these LLMs are
becoming wired much more densely so quickly, where they're sort
of moving out of their childhood in essence. And it's
harder to change that systematic perspective. And there might be
other techniques that are needed, such as you could do
some virtual lesioning, right? You could do some neural damaging
or find what the superweights are within the model, remove
them, and then train to sort of remove this rot
that is happening. But it's definitely a, you know, a
real problem here. And it does parallel, you know, human
cognition, you know, in that we humans need to be careful what we learn and what we really focus our attention on as we're out in the wild too. So I guess, Martin, you're
not really surprised by this result, right? I guess insofar as Aaron's interpretation is, yeah, obviously if you feed in some bad content, model behavior will get worse, because it's just mirroring that. So I
guess how shocked should we be about these results? Is
there anything surprising here? I suppose I feel like this
is catnip for every parent of a teenager who can
say, look what happened to that chat model when we
gave it all this junk. The thing that I found
most surprising, so you mentioned, Tim, that they kind of
categorized this BrainRock content into really two types of engagement
and semantic quality. So engagement were just kind of short
pieces of information, like a tweet or something. It's just
a sentence or two. It's giving some kind of factual
information, but really briefly, no room for nuance. It's just
like, here's this fact. And then semantic quality was the sensational stuff that, you know, wow, look at this. That sort of thing. And there was actually a significant difference when the model was fed M1 versus M2 junk data, the engagement versus the semantic stuff, as to the outcome in terms of personality. The M1 data, the engagement stuff, the things like tweets, affected personality a lot more than just the sensational stuff. It seems that that
didn't really affect personality much at all. But the engagement stuff, the short-tweet stuff: when it was pushed to 100%, so all the model received was just a bunch of tweets, it increased narcissism, it made
the model less agreeable, and it made the model more
extroverted. And I'm like, that sounds an awful lot like
an outspoken TV pundit or something like that. So if
I was training to be a talking head, like a
shock jock kind of thing, this is where I need
to get my training from. I need to just be
looking at short form stuff. And I'll get those traits
too. I see. Just to add to that, when trained on the M1 data, you also saw a sharp increase in abrupt stops in thinking. So it didn't go through the full thinking process, or would cut it short or not do it at all. So it's just one of those weird things, where it's to your point in terms of the personality of the individual who typically wants to shock versus actually have confidence.
Well, and I think that's kind of the interesting thing
here. I mean, that's actually, that's like one thing I
do want to get to on this paper is like,
how much there are these like interesting kind of confounding
variables here. Right. Because it may not be that it's short content, right, but it may just be that it's short content drawn from Twitter, now X. Right. And so I wonder if the presence of narcissistic, you know, adversarial traits comes less from the fact that the content is short form and more from the culture of the place where the data is being drawn. Now, I guess, Abraham, the question for you is,
I guess maybe in the case of reasoning, because it
is short, it actually limits how much reasoning it can
do. And so maybe that's actually like there's these really
interesting kind of effects here. Some of them related to
it being short, some of it being related to, I
think like the source of it. Right. Certainly my feeling
about X is that it's a platform where you feel
that there is a lot of aggressive antisocial behavior. No, that's fair. You only get 140 characters or 280 characters.
So yeah, there's not going to be a lot of
thinking kind of allowed. What I think this paper really
shows is obviously garbage in, garbage out, but quality is always much more important than quantity in terms of actually getting performance out of your model. One thing that I thought was
really interesting was the finding that the quality of the data degraded as it got newer. So when they went
through the actual corpus of training data, the more recent
the data was, the lower quality it was, which was
just very interesting to see. Does that mean that our quality of content is just gradually getting to the lowest common denominator, or is this the product of just lazy literature? Yeah, the paper was
interesting. I think there's a lot of parallels or philosophical questions you can ask yourself based on this paper, but I'll leave that to other people. I've always thought
that it's always best to, you know, train these large
models on factually correct, dense, deep data. Right. And that's
your foundation model. Right. And then if you want to
change the personality, the tone, the pitch, prosody of speech
even, that's where you use context engineering and then you
add in the traits with which you want it to behave. But it's not necessarily always best, you know, to try to train in the traits you want it to act like, embedded into the knowledge structure, because then you're sort of watering it down, and it's kind of like brain loss, amnesia. Right. Because the models are really forgetting what they know, and it's more about how should I act. Right. And so therefore it becomes very watered down. And so because of that,
the decline in its behavior or the way it reasons is not easily fixed by later instruction tuning or even cleansing the data, and it couldn't recover the baseline capability because of that. And so therefore I do think that
this is a lesson learned that when you do training
and you fine-tune, be really, really careful about the kind of data that you use, and make sure that you're marching towards the objectives that you want to have with the model. Yeah. And I think if we
then extrapolate that to us humans, if there is a
parallel here, maybe we should be spending less time on
Twitter and more time consuming high-quality long-form content like the Mixture of Experts podcast. Yes, exactly. We highly recommend this. Yeah. Well, that's a great note to end
on. Martin, Aaron, Abraham, always great to have you on
the show and hope to see you guys soon. And
that's all the time that we have for today. Thanks
for joining us, listeners. If you enjoyed what you heard,
you can get us on Apple podcasts, Spotify and podcast
platforms everywhere. And we'll see you next week on Mixture
of Experts.