OpenAI's o3: Near‑AGI, Expensive, Mini
Key Points
- The ARC AGI prize, meant for the first practical artificial general intelligence, wasn't awarded to OpenAI's new o3 model despite its 87% score (above the 85% human baseline) because its roughly $2,000-per-run cost makes it impractical today.
- A distilled "o3 mini" is expected in early 2025, offering much lower latency and price while retaining most of the full model's capabilities, illustrating an emerging cycle of breakthrough followed by rapid, cheaper distillation.
- The o3 architecture no longer behaves like a single LLM; it appears to run massive Monte Carlo-style simulations spanning thousands of language-model calls, exploring many solution paths and selecting the most probable outcome, similar to AlphaGo's approach of layering search over multiple engines.
- This shift in inference engines signals a new era in which AI systems solve complex reasoning tasks by orchestrating many coordinated model calls rather than relying on a single monolithic LLM.
**Source:** [https://www.youtube.com/watch?v=6GQ9AZdJ-LU](https://www.youtube.com/watch?v=6GQ9AZdJ-LU)
**Duration:** 00:05:51

Sections

- [00:00:00](https://www.youtube.com/watch?v=6GQ9AZdJ-LU&t=0s) **OpenAI o3 Prize Controversy & Mini Model** - The speaker outlines how the new o3 model exceeded the ARC AGI prize benchmark yet was denied the award due to prohibitive cost, and previews the upcoming, more affordable o3 mini version.

Full Transcript
Today is the day after OpenAI announced their new model, o3. I want to call out five things that I don't think are widely understood about the model, and I want to talk them through here.

The first one is the ARC AGI prize, a prize established for the first model to reach a practical artificial-general-intelligence state. This model is so good, o3 is so good, that they had to issue a special statement explaining why they are not going to award the prize to o3, and spoiler alert: it's not because it's not smart enough. The human baseline on the current ARC AGI testing suite is 85%, and o3 hit 87, human equivalency. The reason they're not awarding the prize is that they feel o3 is too expensive to really be practical to deploy, which I think is fair; it's running $2,000 a pop, and that feels kind of pricey. But it should call out how we've entered this weird moment, right, where we are now in the blurry beginnings of artificial general intelligence.
Number two: o3 mini is coming. What people don't realize is that when they launch a full model, it becomes easier to distill its inference down and get a faster, quicker model that has most of the same capabilities. Based on early benchmarking, it looks like o3 mini, which will probably come out in January or February of this year, is going to be vastly cheaper than o1 is right now, better than o1, faster than o1, but of course not nearly as good as full o3. That's still plenty of intelligence for most applications, and you may well see o3 mini in Cursor or Windsurf or your development environment of choice in the first quarter. So we need to start expecting this tick-tock motion, where a full cycle of intelligence breakthrough is followed by a distillation cycle that distills that inference down into something not quite as good, but very, very fast and much cheaper.
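To make the distillation idea concrete: in its classic form, a small "student" model is trained to match a big "teacher" model's output distribution rather than only the hard labels. The sketch below is a toy illustration with made-up logits and plain gradient descent, not OpenAI's actual recipe; `teacher`, `student`, and the learning-rate/temperature values are all invented for the example.

```python
import numpy as np

def softmax(z, T=1.0):
    # Softened probabilities; a higher temperature T exposes more of the
    # teacher's "dark knowledge" about near-miss answers.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_step(student_logits, teacher_logits, lr=0.5, T=1.0):
    # Gradient of cross-entropy(teacher probs, student probs) with respect
    # to the student's logits is (p_student - p_teacher) / T.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return student_logits - lr * (p_s - p_t) / T

# Toy example: a 3-way next-token distribution.
teacher = np.array([2.0, 0.0, -1.0])   # big, slow "full model" logits
student = np.zeros(3)                  # small, fast model, untrained
for _ in range(1000):
    student = distill_step(student, teacher)
```

After enough steps the student's distribution matches the teacher's almost exactly, while a forward pass through a genuinely smaller student stays cheap; that asymmetry is what makes the breakthrough-then-distillation tick-tock possible.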
This brings me to number three: I don't think people really understand how these inference-time compute engines work, and calling them an LLM is a little deceptive at this point. At the end of the day, they appear to be solving for deep thinking and deep reasoning with o3 the way the architecture of AlphaGo worked. If you don't know that story, very briefly: AlphaGo was the computer program that beat the best humans at Go, an ancient game that is very, very difficult to play well, was long considered harder than chess, and has now effectively been conquered by machine learning. The way they did it was to stick a Monte Carlo simulation on top of multiple different engines that could play Go.

Closing that example and moving over to o3 and how it connects: o3 is running Monte Carlo-style simulations across thousands of calls to large language models, we think. That means it can imagine multiple possible paths to the solution, run them through these calls to the LLMs, and then come back and pick the one it feels has the highest probability. That's how it's doing insanely difficult mathematics problems and other things; it's also why it takes a while, and why it's so expensive. I think people tend to treat an LLM as a monolithic entity, like it's one thing, right, it's a large language model. But that's like calling Amazon an e-commerce store: an easy interface stretched over a gigantic warehouse network and insanely complex tech. In the same way, o3 is a simple interface stretched across this insane patchwork of simulations, thousands of calls to LLMs, and so on.
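Stripped of detail, the search pattern being described (sample many candidate solution paths, score each, keep the best) looks roughly like the sketch below. Everything here is a stand-in: `propose_solution` fakes an LLM call with a random toy, and the scorer peeks at the target purely for illustration. In a real system each sample would be one of those thousands of model calls, and the score might come from the model's own log-probabilities or a learned verifier.

```python
import random

def propose_solution(rng):
    # Stand-in for one LLM call: returns a candidate answer plus a score.
    # The toy "task" is to land near the value 42; a real scorer would not
    # get to peek at the answer like this.
    guess = rng.gauss(42.0, 10.0)
    score = -abs(guess - 42.0)  # closer candidates score higher
    return guess, score

def best_of_n(n, seed=0):
    """Explore n independent solution paths and return the highest-scoring one.

    More paths cost more compute (hence the price tag) but raise the odds
    that at least one path is very good.
    """
    rng = random.Random(seed)
    candidates = [propose_solution(rng) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])
```

With n=1 this is an ordinary single-shot answer; with n in the thousands, the best candidate is almost always far better than the typical one, which is exactly the quality-for-cost trade the transcript describes.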
And I think knowing how it works helps.

Number four: o3 is really, really good at coding. It's currently benchmarked as roughly the 175th-best competitive programmer in the world. So is it better than every single programmer? Not yet. Is it better than 99.99% of us? Yes. I would put myself at the bottom there, right, like I'm not good, but is it better than almost everybody who picks up code? Yes. And by the next generation, is it going to be number one? Quite possibly.

This brings me to number five: I do not think everybody is going to lose their job, for a very simple reason: 98% of the world doesn't know about this, and cultural change takes time. When the steam engine was invented, it took 150 years to fully realize its impact across society. We may move much, much faster with AI, but even much, much faster is way, way slower than you're probably thinking right now. I walked through an airport yesterday, and I guarantee you I was the one in that airport thinking the most about AI. Everybody else around me was acting like nothing had changed. I knew that o3 had been released, I knew what the ARC AGI scores were, and nobody was paying any attention. That is going to be the way it is, and the people who understand AI and what just happened are going to feel like a fish out of water for a while. It's going to be a very weird year, so stick with me and we'll figure it out together. I'm working on some scaling laws and questions around AI that I think we need to think about in light of o3, and I'm going to put those into a longer Substack post. So there you go. Cheers, and good luck with the weird future we all live in.