Historic $300B Oracle‑OpenAI Cloud Deal
Key Points
- Oracle announced a massive $300 billion, five‑year cloud contract with OpenAI starting in 2027, positioning Oracle as a primary multicloud partner alongside Microsoft’s Azure.
- The deal fuels the prevailing “picks‑and‑shovels” narrative for AI profits—owning data‑center and GPU infrastructure—while prompting a sharp, though potentially unsustainable, 40% surge in Oracle’s stock.
- AI‑focused valuation models (using both Claude and ChatGPT agents) suggest Oracle remains severely overvalued even after accounting for the deal’s net‑present‑value, highlighting a disconnect between market hype and fundamentals.
- For OpenAI, the agreement signals a strategic “soft divorce” from Microsoft, giving it leverage and visibility as a market leader while setting the stage for future model generations (e.g., GPT‑7/8) rather than immediate impact on current releases.
Sections
- Oracle’s $300B OpenAI Cloud Deal - The segment highlights Oracle’s announced $300 billion, five‑year multicloud agreement with OpenAI slated for 2027, discussing its strategic shift away from Microsoft‑centric hosting and questioning Oracle’s soaring valuation despite the stock surge.
- OpenAI’s Road to 2030 Profitability - The speaker outlines OpenAI’s projected $90 billion cash burn, its reliance on massive demand to achieve profitability by 2030, and highlights the uncertainty around which unit‑economics model (per model, per data center, or conventional) will ultimately prove viable.
- Avoid Competing on Core Primitives - Builders should focus on specialized tools and orchestration rather than trying to replicate basic work primitives dominated by well‑funded AI platforms.
- Claude’s Tool‑Based Workflow Efficiency - The speaker explains how teams are leveraging Claude’s off‑hour reliability and on‑demand inference model to handle work tasks via tool calls—producing high‑quality documents efficiently, though not flawlessly, and emphasizing practical utility over binary judgments.
- AI Commerce & Self‑Regulation Outlook - The speaker expects industry self‑regulation for minor protection while noting Google’s multilingual AI search expansion with built‑in shopping tools, signaling a move toward chat‑driven commerce and checkout integration.
- Future Agent Collaboration & Hallucination Research - The speaker highlights emerging AI trends like Genesis’s autonomous agent‑to‑agent workflow composition slated for 2026, alongside OpenAI’s new study tying hallucinations to word‑prediction‑focused pre‑training.
- Organizational Approach to LLM Hallucinations - The speaker argues that hallucinations must be treated as an organizational problem, cautions against simplistic fixes, and stresses that the blunt reward signals in LLM training fundamentally limit nuanced, accurate responses.
Full Transcript
# Historic $300B Oracle‑OpenAI Cloud Deal

**Source:** [https://www.youtube.com/watch?v=_KneeDIbSa0](https://www.youtube.com/watch?v=_KneeDIbSa0)
**Duration:** 00:24:48

## Sections

- [00:00:00](https://www.youtube.com/watch?v=_KneeDIbSa0&t=0s) Oracle’s $300B OpenAI Cloud Deal
- [00:03:52](https://www.youtube.com/watch?v=_KneeDIbSa0&t=232s) OpenAI’s Road to 2030 Profitability
- [00:08:35](https://www.youtube.com/watch?v=_KneeDIbSa0&t=515s) Avoid Competing on Core Primitives
- [00:12:06](https://www.youtube.com/watch?v=_KneeDIbSa0&t=726s) Claude’s Tool‑Based Workflow Efficiency
- [00:15:51](https://www.youtube.com/watch?v=_KneeDIbSa0&t=951s) AI Commerce & Self‑Regulation Outlook
- [00:19:41](https://www.youtube.com/watch?v=_KneeDIbSa0&t=1181s) Future Agent Collaboration & Hallucination Research
- [00:22:47](https://www.youtube.com/watch?v=_KneeDIbSa0&t=1367s) Organizational Approach to LLM Hallucinations

## Full Transcript
What was most important that happened in AI this week? I want to go through it; we're going to keep it pretty casual.
We're going to go through the news
stories I think mattered the most. I'm
going to give you some commentary on why
I think they mattered and where they're
going strategically. We're going to do
one news story at a time. Number one,
Oracle and OpenAI's historic $300
billion cloud deal. Fundamentally, what
happened is that Oracle announced, during their double earnings miss, that they had signed a $300 billion, five-year cloud computing deal with OpenAI that will begin not this year, not next year, but in 2027,
marking one of the largest contracts in
tech history. This would position Oracle
as OpenAI's primary cloud provider
alongside Azure, which further shifts
OpenAI's partnership dynamic away from a
Microsoft first stance into a multicloud
stance. Oracle has got to be happy about
this. Larry was whistling all the way to the bank, because I think Oracle's stock popped 40% at one point. So he's the richest man in the world, for what it's worth. But if you actually look at
the unit economics of the Oracle
business, the valuation after the pop is
tough to sustain. I used this as a chance to check out a story we'll get to later, with Claude making models and writing them to Excel. And I tested that with agent and operator as well. Both models, both ChatGPT's agent model and Claude's model, unanimously concluded that Oracle is severely overvalued even given the $300 billion cloud deal, in net-present-
value terms. Now, we all know that one,
this is not investment advice, and two,
the market doesn't react rationally to
things. So, I don't know where the
market is going with this. The takeaways
that I have from the deal are one, the
market is so hungry for a continuation
of the picks-and-shovels line. This goes back to the Aschenbrenner memo in 2024: the idea that the way to make money in AI is to have data-center stakes, GPU stakes, the picks and shovels of the new gold rush. That's the play Oracle is making to the market. That's the narrative the market has bought. Mary Meeker, I did a big video summary on Mary Meeker a couple months ago, and her deck was heavy on picks and shovels. This is Wall Street's narrative for how to make money on AI. Larry is smart enough to know it, and Larry is
playing into that. Meanwhile, for
OpenAI, like I was saying, they're in a
soft divorce with Microsoft and for them
having a multicloud option is really
helpful. Being able to announce a big
deal like this helps them move the ball
forward with the narrative for this
month, this week. Sam loves to be in the news. He loves to have OpenAI positioned as a market leader. Certainly inking the largest cloud deal in history counts as being a market leader. All of this is actually going to play out in reality by the time we get to GPT-7 or GPT-8. It is not something that we are
going to feel with any of the current
generation of models because the
beginning of the compute deal isn't for
another year and a half. And so I think
the thing I want to caution is that when
you see these deals, look enough at the
timelines to understand what matters and
why. The last takeaway I have before we
go to the next story is that the fact
that both sides felt good inking a deal
this big for a start date this far out
argues to me that the prophets of doom who claim that we are at the peak of the AI hype cycle are probably wrong. If you're willing to ink deals that far out, you are committed to a compute budget that requires you to be prepared for that. Because part of the reason they have to push the start date out isn't that they wanted to; it's that you have to get everything ready so you can actually operate the compute at that
scale. Like this is a big contract. It
ties into the Stargate plans which
Oracle is also involved in with OpenAI.
And so OpenAI is planning on massive
demand. And this actually comes back to
the cash flow burn rates that they
updated this week as well, which is sort
of a part B of this story. I think they
updated and they have close to $90
billion in new burn rates that they're
expecting. Interestingly enough, they
are projecting or at least on paper, at
least for some investors, a path to
profitability in 2030. And so, at least
on paper, the idea that OpenAI is
selling here is that they see massive
demand. They see that demand massively
scaling for the next 5 years. And their
expectation is that they will hit
profitability off of the unit economics
associated with that scale. We shall
see. This is one that we will probably
come back to in future weeks, future
news stories. My suspicion, my concern, is that the unit economics of AI still have to be worked out, and there are two or three permutations, and it's not clear which one works. I'll leave that as a question, but for example: it's not clear if it's correct to measure profitability on a per-model basis, so a subsequent model would have a different profitability number. It's not clear if it's actually most accurate to measure it on a per-data-center basis, where you look at each data center's unit economics. And maybe it's GAAP, right? That's the third one, where it's just conventional: you take the revenue, you take the costs, you look at what you bring in and what it costs you, you look at your revenue per customer and the burn per customer, and you see. But regardless of which it is, it's not stopping people from investing, even though it is burning up investor money at this point. Like, when you update your burn rate and say, oh, by the way, we're going to add $90 billion in burn, that's a pretty significant update to your burn rate, right? It's
non-trivial. So, that's where we're at.
Demand is spiking. Biggest cloud deal in
history. Unit economics still uncertain.
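To make the valuation point concrete, here is a minimal sketch of the kind of net-present-value check those agent runs would perform. Every number in it (the even revenue split, the 30% margin, the 10% discount rate) is an illustrative assumption of mine, not an Oracle or OpenAI figure:

```python
# Illustrative NPV sketch for a $300B, five-year cloud contract
# starting ~2 years out (2027). Margin, discount rate, and the
# even revenue split are assumptions, not Oracle disclosures.

def contract_npv(total_revenue, years, start_in, margin, rate):
    annual_profit = (total_revenue / years) * margin
    return sum(
        annual_profit / (1 + rate) ** (start_in + t)
        for t in range(years)
    )

npv = contract_npv(
    total_revenue=300e9,  # $300B over the contract
    years=5,
    start_in=2,           # revenue begins roughly two years out
    margin=0.30,          # assumed operating margin on cloud revenue
    rate=0.10,            # assumed discount rate
)
print(f"NPV of assumed profits: ${npv / 1e9:.0f}B")
```

Even with generous assumptions, the discounted profit stream of the deal comes out in the low tens of billions, which is the shape of the disconnect the models flagged against a 40% jump in market cap.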
Story number two. This one did not get
reported on as much as I think it should
have. Claude's enterprise memory
revolution. So, Anthropic launched team memory for Claude, September 9 to 11, roughly. And this is for enterprises, for Teams accounts. But don't just think of it as ChatGPT's memory for enterprises. It's actually a different
philosophy around AI collaboration and I
want to kind of lay it out for you. So
what makes Claude's approach unique is
that Claude has project isolated memory.
So every Claude project on an enterprise
account would have separate memory
contexts and context windows which would
enable you to have confidential client
work and not mix it with general ops
work or with the work of other clients.
It also has much more transparent tool
calling, which you and I have probably already seen if you've worked with Claude. It's very open about what it calls, and so Claude's memory works through very visible function calls like conversation search or recent chats,
so you can see and understand what's
going on which improves the auditability
and transparency for the enterprise.
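For builders, that transparent tool-calling pattern can be sketched roughly like this. The tool name mirrors the conversation-search call mentioned above, but the schema, audit log, and project store here are hypothetical, for illustration only, not Anthropic's actual API:

```python
# Hypothetical sketch of transparent, auditable, project-isolated
# memory tool calls. Tool names echo the ones described above;
# the schema and logging are invented for illustration.
import json
from datetime import datetime, timezone

AUDIT_LOG = []

def conversation_search(query: str, project_id: str) -> list[str]:
    """Search past conversations within a single project's memory."""
    # Project-isolated: only this project's store is consulted,
    # so one client's work never leaks into another's context.
    store = {"acme-audit": ["Q3 scope agreed: inventory module only"]}
    return [m for m in store.get(project_id, []) if query.lower() in m.lower()]

def call_tool(name: str, **kwargs):
    # Every call is recorded, which is what makes the behavior auditable.
    AUDIT_LOG.append({
        "tool": name,
        "args": kwargs,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {"conversation_search": conversation_search}[name](**kwargs)

hits = call_tool("conversation_search", query="scope", project_id="acme-audit")
print(hits)                                # memory from this project only
print(json.dumps(AUDIT_LOG[0]["tool"]))    # the visible call trail
```

The point of the sketch is the audit trail: because retrieval happens through named, logged function calls rather than opaque context stuffing, an enterprise can see exactly when and where memory was consulted.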
Finally, there's something called work
focused context here that I want to talk
about that's really interesting. It
automatically builds persistent profiles
of team workflows, client requirements,
and project specs. And that means that
it is going to start to get to know your
work better over time. So the practical
implications for builders are: one, if you were building a Claude wrapper, or any kind of AI wrapper for the enterprise, and your breakthrough was easy memory, I would be sweating tonight. It is
reminding me again that one of the
things that we see coming through in the
overall pace of AI builds is this focus
on primitives for work. And what I mean
by that is that if you look at the pace
and trend of recent AI adoptions, what
you see is that we are leaning in on
anything that counts as time in the
stack for the workday. So you see this
with the Claude projects and the sort of memory for Claude that's keeping you in the Claude ecosystem as you work as a team. You see it for Excel, you see it for Word, you see it for PDFs, you see it for PowerPoint. These are all connectors that Claude added. You see it for Claude becoming a personal assistant on mobile this week, where you can actually connect Calendar and Gmail in the mobile app for Claude, and Claude will effectively act like a personal assistant. If you're in Claude, it can search your calendar and come back with recommendations for times, like it can for fairly sophisticated things you would previously use a human for. And so
they're trying to keep you in the work
stack by building primitives. And ChatGPT is doing the same thing. That's why they're leaning heavily on Codex now as a competitor to Claude Code. Which, by the way, guys, the motion that Anthropic is doing here to go beyond Claude Code tells me they are trying to diversify as Codex starts to eat Claude's market share. That's my guess. But anyway, leaning in on primitives for code, Claude Code, Codex, that's all part of the same motion. And frankly, I may not be happy with the quality of implementation, but ChatGPT has been leaning in on the connectors as well, right? Leaning in on Excel, leaning in on PowerPoint with agent mode, etc. Everyone wants you in
the work stack. And so, if you are a
builder, what this means is that you
should not be trying to compete on
primitives. You should be trying to
compete on tools that are more
specialized. Don't try and build Excel
for the office; bet on somebody grabbing that primitive, unless you are very, very well funded, you've crossed your Series B, and you have traction. Well, then it's a different story, right? It's not that there are any impossible bets in business; it's that there are bets that are harder. And right now, competing for work primitives is competing with some very, very well-funded model makers.
Practical implications that go beyond
sort of where you position. It is going
to be easier to build agent
orchestration workflows in the
enterprise because of features like
this. It is going to be easier to
maintain context across coding sessions because of features like this. It is
going to be easier for sales teams to
maintain context across deals, product
teams to maintain specs. This is
something where Linear, for example, is going to feel a little bit of heat, and Jira too, right? Because they are used to being the place where you record work being done. We're not at the point yet where any model maker has rolled out a ticketing system. But we're also at a point where I wouldn't be too surprised if they got close to that, because it's such a primitive for engineering work, and because the things that make a ticketing system work are also things that these model makers are going after: context, being able to formulate text and break it out across specs, being able to handle technical requirement development, etc.
The last thing I want to call out is
that Anthropic is maintaining a
consistent perspective on transparency
and privacy that is going to serve them
well with the enterprise. They've been
really insistent on that from day one.
It is a brand. I'm not even talking
about terms of service. It is a brand
they are maintaining in the marketplace
and the way they chose to roll this out
reinforces that brand. So I am curious
to see how this plays out. There seem to
be competing AI visions here. ChatGPT seems to be leaning heavily into the current user base with consumers. They're
also leaning on the code side. They're
also leaning on enterprise deals with
their brand as like the big heavyweight
in the room. Claude is pushing tool
calling really hard and talking a lot
about being a collaborative colleague.
And that tool calling line makes sense.
By the way, there's a GPU implication
here that no one is talking about. So
people don't know this, but technically speaking, Opus and Sonnet, Claude's models, are not heavy inference models. And part of why, I think, is that Anthropic doesn't have the GPUs to serve heavy inference models right now.
They've been more GPU constrained than
OpenAI over the course of their history.
Fine, they're using a large model
instead. If you look at the
parameterization of Opus, it is a big
big big model. And what they're focusing
on is intelligence driven by a big model
for rational tool calling. And it turns
out that is a relatively good bet. Like
they're making the tool calling
transparent. They're letting Opus be the
planner and they're just driving the
ability to solve hard problems through
tools versus through inference, which is
a somewhat more GPU efficient way to do
it if you're constrained. And we all
know, like, Claude even so still suffers from GPU brownouts, GPU constraints. People that work in Tokyo and Stockholm say that Claude works better in the off-American hours, etc. They have
had troubles with that and so I think
that that's part of why they're leaning
into tools and I think that we will have
to see when they feel comfortable enough
with the compute budget to roll out an
inference model. But what's interesting is, if they do that, they may choose to make the inference model on-demand: Claude Opus calls the inference model when needed, almost like a tool kind of approach. Because most of what they're doing here, it's a lot of the workday that they're picking up and handling through tool calls rather than through inference, and that's pretty efficient, right? That makes a lot of sense. We've already talked about, in this Claude story, the file-creation capabilities. I did a whole post on that this week. It is a big, big deal: getting quality Excel, quality PowerPoint, quality PDF, quality Word docs, non-trivial pieces of the workday, hand it over. I don't want you to take
this and think this means that it does
it perfectly. So often when I have these
AI conversations, I feel like we get
trapped in binaries. It's like it's off
or it's on. It was terrible and now it's
great. The right question is, is the
work that Claude is doing useful enough
that I can move much faster as a result?
And Claude is the first model that has
produced work artifacts that meet that
bar and easily meet that bar. I would
not say it is perfect. And I'm not going
to pretend to say it's perfect. And
what's interesting is people say, "Oh,
so that means there's hallucinations." I
actually haven't found that to be the
issue. The issue was more the fit and
the finish and the polish that typically
come with extremely high-level Fortune 100
presentations. Claude's not quite there
on the polish and design sort of side of
things. I've actually wondered why Figma
hasn't leaned in on like an AI powered
design thing because I feel like it
would be really easy for Figma to say,
"Do you want design chops? Here's an MCP
server. We're going to bill you." Or
whatever it is. And, like, you can get Figma with MCPs for a certain amount a month and just call design polish into your stuff. But that's not the
world we live in, right? Like that's
that's a different world. And Figma
hasn't moved in that direction. And in
the meantime, we do have a real design
gap with AI. The other thing I will call
out is that we don't really know how to
make this transition at work. And that's
very much TBD. You can create documents
really easily, but teams tomorrow, teams Monday, have to figure out and triage whether they want to adopt this. What docs do I put in and edit? What Excels do I put in, edit, and start to move through Claude, versus what do I build new in Claude, and why? Um,
and I've already had those kinds of
conversations with teams. It's happening already. All right, let's get to the next story. Let's move past Claude. The FTC launched an AI safety
crackdown. So, the Federal Trade
Commission is launching an AI chatbot
inquiry targeting seven major AI
companies, all the big names, and
they're trying to figure out how to
regulate the industry, particularly
around safety. And so, just to like dig
into that, the seven companies, OpenAI,
Meta, Google, Snap, which is
interesting, Character.ai, and xAI. And
so the companies are going to be
required to provide detailed safety
metrics and monitoring protocols. They
want to focus on protecting children
from potentially harmful AI
interactions. And there could be a new FTC rollout of compliance requirements
and safety standards across the
industry. This follows on from recent
lawsuits involving chatbots and teen mental health issues. And basically the FTC is signaling, like, the red line is
making sure that kids are safe and they
will go after companies that they
perceive as potentially not doing enough
in that area or at least that they want
to regulate in that area or that have an
exposure to that kind of experience for
children. We will see where this goes.
For now, my expectation, my base-case expectation, is the industry is going to want to cooperate. The industry is going to want to self-regulate, and there will probably be some sort of self-regulatory, FTC-overseen regime that says these are the standards we have for protecting minors, etc. Which I think would be good: it's a step forward toward actually normalizing this as a real business, a real vertical, a real industry that needs to have proper safety procedures that everybody agrees on. And there are no real ground rules that everybody agrees on right now. Next story: Google AI Mode. So, Google expanded AI Mode, which is its sort of fancy AI search, beyond English. It now supports other major markets for Google, including Hindi, Indonesian, Japanese, Korean, and Portuguese. And so this is a ChatGPT-like search experience. It has enhanced shopping
capabilities. You'd better believe
they're looking at Q4 this year for
that. It has in chat checkout. It has
visual try-on features, probably powered by the new Nano Banana. And so I look
at this as a real step in the direction
of chat-powered commerce. With Fidji coming on at OpenAI, I have been really eyeing the idea that, like, we're going to have more work done by ChatGPT for Q4 this year on ad-powered checkout experiences, or checkout experiences that are more ubiquitous in ChatGPT. Right now you can browse products but you don't, sort of, complete the checkout; but there are some signals in the code that they're thinking about that. We are
going to start to see commerce move off
of platforms like Amazon into the chat
experience, and my base-case expectation is that the first big season when that's going to be tried out is Q4 of this year. And so I would be a little bit surprised if we didn't see multiple major model makers going for that. I'm guessing, given their branding, that Anthropic is not going to do it for now. But Google and ChatGPT, I would sort of expect both of them to do that. We will see.
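As a sketch of what chat-driven checkout could look like under the hood, here's one hypothetical shape for an assistant-invoked checkout tool. The tool name, payload fields, and confirmation flow are all invented for illustration; neither Google nor OpenAI has published this interface:

```python
# Hypothetical sketch of a chat-driven checkout tool call.
# The tool name, fields, and confirmation step are invented;
# no real commerce API is referenced.
from dataclasses import dataclass

@dataclass
class CartItem:
    sku: str
    name: str
    price_cents: int
    qty: int = 1

def begin_checkout(items: list[CartItem], currency: str = "USD") -> dict:
    """What an assistant-invoked checkout tool might return: a
    summary the model can read back for explicit user confirmation."""
    total = sum(i.price_cents * i.qty for i in items)
    return {
        "status": "awaiting_confirmation",  # user must approve in chat
        "total_cents": total,
        "currency": currency,
        "line_items": [i.name for i in items],
    }

order = begin_checkout([
    CartItem(sku="sku-123", name="Trail runners", price_cents=8999),
    CartItem(sku="sku-456", name="Wool socks", price_cents=1499, qty=2),
])
print(order["total_cents"])  # 11997
```

The design point is the "awaiting_confirmation" status: in-chat commerce only works if the purchase stays a two-step, user-approved action rather than something the model completes on its own.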
Time will tell. The good news is it's
already September, so we're going to
find out in the next month or two how
that goes. So the AI agent market is the
next one. And really like part of the
story here is that people are realizing
how big this market is. So the AI agent
market is now projected to surge roughly
10x in 4 and 1/2 years. So let's call it
5-ish billion this year. It's expected to get, who knows at this growth rate, but between $40 and $50 billion by 2030. They'll probably revise it again in the next few months. And what's notable to me is that success rates for AI
agent deployments are going up in 2025
relative to 2 years ago, which if you
work as a builder should not surprise
you. I see many more successful AI agent
projects now than I did last year or the
year before. But if you just read the
headlines, if you read the MIT 95% AI
fail study, you'd think, oh no, I mean
that's useless. Like that's terrible.
I'm sure they all fail. It's not true.
It's not how builders actually see it. I
did a whole piece on this on Friday
talking about this idea that the
builders know what's really going on in
AI. And part of how we see that play out
is is this reality on the ground where
AI agent deployments are actually
working better than they were. Along
with that, you have a whole host of new
agent launches. One of the ones that's
interesting is we never talk about
Amazon in the AI space, but they just
keep chipping away. Um and they have
deep pockets and we'll see where they
end up. Amazon introduced Quick Suite this week. It merges AWS products with pre-built workflows for natural-language automation. Basically, they're trying to tack some agent mode onto existing AWS products, and we'll just sort of see how that goes. Another one that's
interesting (there are a few of these; I never get them all, they're the new sort of announcements): DeepL launched DeepL Agent. It's an autonomous AI system for knowledge-worker tasks across finance,
sales, and marketing. I always take
these with a big block of salt till I
can start to see them in reality. We
shall see. But they announced it, right?
And you should expect to see more of
these aggressive announcements as we go
forward. In the same vein, a company called Genesis announced A2A, agent-to-agent collaboration, which is a sort of
a system for enabling agents to work
together without human intervention.
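A toy sketch of what agent-to-agent workflow composition means in practice: agents advertise what they consume and produce, and a chain is assembled by matching types rather than by a human-scripted pipeline. All names here are invented; this is not Genesis's actual system:

```python
# Hypothetical sketch of agent-to-agent workflow composition:
# agents advertise capabilities, and a requesting agent chains
# them by matching output types to input types, with no
# human-scripted pipeline. All names are invented.

REGISTRY = []  # (agent_name, consumes, produces, fn)

def register(name, consumes, produces):
    def deco(fn):
        REGISTRY.append((name, consumes, produces, fn))
        return fn
    return deco

@register("scraper", consumes="url", produces="text")
def scrape(url):
    return f"raw text from {url}"

@register("summarizer", consumes="text", produces="summary")
def summarize(text):
    return text.upper()[:20]  # stand-in for a real summarization step

def compose(start_type, goal_type, value):
    """Greedy chaining: keep handing the artifact to any agent that
    consumes the current type, until an agent produces the goal."""
    current = start_type
    while current != goal_type:
        step = next((a for a in REGISTRY if a[1] == current), None)
        if step is None:
            raise LookupError(f"no agent consumes {current!r}")
        name, _, produces, fn = step
        value, current = fn(value), produces
    return value

result = compose("url", "summary", "https://example.com")
print(result)
```

The "cutting edge" part the speaker points at is exactly the `compose` step: today a human scripts that chain; the 2026 bet is that agents negotiate it among themselves.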
This is going to be one of the really
hot areas in 2026 where we're going to
start to see people say, I have agents
and I want them to self-compose
workflows. I don't want to have to
script the workflows for them. For now,
that remains something that's very
cutting edge, and I think that we'll start to see that move in the new
year. What else happened? So, there was
a big set of headlines. Again, OpenAI
loves headlines. OpenAI publishes
research identifying the core causes of
AI hallucinations. OpenAI attributes
them to pre-training processes that
prioritize word prediction over
truthfulness. And this was presented in
the headlines as novel and it was
presented as groundbreaking and OpenAI
thought leadership with OpenAI saying
they could see a path to closing out
hallucinations. I guess I'm aging myself here. One of the things that I've been writing about, and I don't want to pretend I'm the only one, lots of people have been talking about it for a long time, has been that when you prioritize in training a single-turn response, where the model must generate text, must be shown as helpful, must generate detailed information, and must be proactive, you get exactly what you see today. You see models that are
optimized for a single turn. You see
models that are optimized to give you a
response whether they know the answer or
not. And this leads to hallucinations.
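The incentive is simple expected-value arithmetic: under a binary, exact-match reward, "I don't know" always scores zero, while a confident guess scores its probability of being right, so guessing dominates. A toy version (the reward scheme is a deliberate simplification of real training setups):

```python
# Toy expected-reward calculation under a binary, exact-match
# reward: 1 for a correct answer, 0 for anything else, including
# "I don't know". A simplification of real training pipelines.

def expected_reward(policy: str, p_correct_guess: float) -> float:
    if policy == "abstain":
        return 0.0              # "I don't know" is never marked correct
    if policy == "guess":
        return p_correct_guess  # right with probability p, else 0
    raise ValueError(policy)

for p in (0.01, 0.2, 0.5):
    guess = expected_reward("guess", p)
    abstain = expected_reward("abstain", p)
    print(f"p={p:.2f}  guess={guess:.2f}  abstain={abstain:.2f}")
# Even a 1% chance of being right makes guessing the better policy,
# which is the incentive the research attributes to pre-training.
```
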
Big surprise. I don't know why this was considered novel. Like, if a model has the choice between telling you the truth, which is "I don't know," and telling you a really nicely crafted, good-PR-value, professional-sounding email with lots of numbers and details, and the model is rewarded in training for the latter, not the former, are you really shocked that it likes to hallucinate numbers? That's exactly what's going on. And OpenAI is presenting this as if it's, like, news. It's not news, guys. Like, this is how
we've trained models for a long time.
And part of why is that if you're
building a model for a billion people,
you have to think real seriously about
your engagement rates. If the model
starts saying "I don't know," or "no," or "this isn't correct," models like that, that don't keep you chatting, it becomes material for OpenAI's business at a certain point. And I don't want to say that OpenAI is unwilling to fix a hallucination problem because of the engagement rates on the business that they're doing. I have no evidence to say that that's exactly what is happening. But I
do want to say that the effect is real.
And I want to say that hallucination
root causes are not all that mysterious, the way it's been portrayed, and that it is actually more useful to think about hallucinations as a series of different classes of unwanted behaviors that can be addressed at both a technical tool level and at a system level. But we're not really talking about it like that. And ironically, that is how we should be talking about it if we are trying to address it at the rollout level for organizations and
leadership. I'm going to be writing more
on that topic I think later this
weekend. There's something
around addressing hallucinations as an
organizational problem that isn't being
said and thought about enough. For now,
don't believe everything you hear when
you see a headline like that. The
hallucination cause is well known and I
would be somewhat surprised if I saw a
substantial change in training regimes
because of the benefits of the current
training regimes: the benefits in engagement, and, frankly, the benefits in some of the things we do want. We want models that will give us a full, detailed response. Like, what if curtailing hallucinations comes at the cost of giving you full and detailed answers when you give the model the information it needs? Would we take that trade-off?
This gets at what Andrej Karpathy has
called out as a fundamental weakness in
LLM training. And maybe we'll end on the
philosophical note here, but one of the
things Andrej has called out that I think
is correct is that training is a blunt
reward signal. So if you say yes or no,
the only thing you can do is reward one
of those responses. It's a really blunt
reward signal. So if the model comes
back and says I don't know, you have to
either say that is a good answer or a
bad answer. There's no in between.
There's no nuance. You can't say why.
Similarly, if it comes back with a fully
hallucinated answer with lots of details
that's formatted well, that looks perfect, you can either say it's good or bad. You can't give it any nuance
there. And that is part of why I say
eliminating hallucinations may have
negative downstream consequences. Not
just for the engagement case, but also
for situations where you want the model
to be proactive, detailed, fill out full
pieces of information. And because we're
working with effectively blunt
instruments for training, Andrej's
pointed out, we have limited flexibility
to help models learn. And models
learning is actually one of the big
unanswered questions in AI. How do we
help models learn? How do they learn
after their release? But also, how do
they learn with much more nuance in
training? I'll leave you with that
question. I hope you've enjoyed the breakdown of the news of the week.