OpenAI's o3: Near‑AGI, Expensive, Mini
Key Points
- The ARC AGI prize, meant for the first practical artificial general intelligence, wasn't awarded to OpenAI's new o3 model despite its 87% score (above the 85% human baseline) because its roughly $2,000-per-run cost makes it impractical today.
- A distilled "o3 mini" is expected in early 2025, offering much lower latency and price while retaining most of the full model's capabilities, illustrating an emerging cycle of breakthrough followed by rapid, cheaper distillation.
- The o3 architecture no longer behaves like a single LLM; it appears to run massive Monte Carlo-style simulations spanning thousands of language-model calls, exploring many solution paths and selecting the most probable outcome, similar to AlphaGo's approach of layering search over multiple engines.
- This shift in inference engines signals a new era in which AI systems solve complex reasoning tasks by orchestrating many coordinated model calls rather than relying on a single monolithic LLM.
**Source:** [https://www.youtube.com/watch?v=6GQ9AZdJ-LU](https://www.youtube.com/watch?v=6GQ9AZdJ-LU)
**Duration:** 00:05:51

Sections

- [00:00:00](https://www.youtube.com/watch?v=6GQ9AZdJ-LU&t=0s) **OpenAI o3 Prize Controversy & Mini Model** - The speaker outlines how the new o3 model exceeded the ARC AGI prize benchmark yet was denied the award due to prohibitive cost, and previews the upcoming, more affordable o3 mini version.

Full Transcript
Today is the day after OpenAI announced their new model, o3. I want to call out five things that I don't think are widely understood about the model, and I want to talk them through here.

The first one is the ARC AGI prize, a prize established for the first model to reach a practical artificial-general-intelligence state. This model is so good, o3 is so good, that they had to issue a special statement explaining why they are not going to award the prize to o3, and spoiler alert: it's not because it's not smart enough. The human baseline on the current ARC AGI testing suite is 85%, and o3 hit 87, human equivalency. The reason they're not awarding the prize is that they feel o3 is too expensive to really be practical to deploy, which I think is fair; it's running $2,000 a pop, and that feels kind of pricey. But it should call out how we've entered this weird moment, right, where we are now in the blurry beginnings of artificial general intelligence.
Number two: o3 mini is coming. What people don't realize is that when they launch a full model, it becomes easier to distill its inference down and get a faster, quicker model that has most of the same capabilities. Based on early benchmarking, it looks like o3 mini, which will probably come out in January or February of this year, is going to be vastly cheaper than o1 is right now, better than o1, faster than o1, but of course not nearly as good as full o3. That's still plenty of intelligence for most applications, and you may well see o3 mini in Cursor or Windsurf or your development environment of choice in the first quarter. So we need to start expecting this tick-tock motion, where a full cycle of intelligence breakthrough is followed by a distillation cycle that distills that inference down into something not quite as good, but very, very fast and much cheaper.
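To make the distillation idea concrete: in its classic form, a small "student" model is trained to match a big "teacher" model's output distribution rather than only the hard labels. The sketch below is a toy illustration with made-up logits and plain gradient descent, not OpenAI's actual recipe; `teacher`, `student`, and the learning-rate/temperature values are all invented for the example.

```python
import numpy as np

def softmax(z, T=1.0):
    # Softened probabilities; a higher temperature T exposes more of the
    # teacher's "dark knowledge" about near-miss answers.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_step(student_logits, teacher_logits, lr=0.5, T=1.0):
    # Gradient of cross-entropy(teacher probs, student probs) with respect
    # to the student's logits is (p_student - p_teacher) / T.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return student_logits - lr * (p_s - p_t) / T

# Toy example: a 3-way next-token distribution.
teacher = np.array([2.0, 0.0, -1.0])   # big, slow "full model" logits
student = np.zeros(3)                  # small, fast model, untrained
for _ in range(1000):
    student = distill_step(student, teacher)
```

After enough steps the student's distribution matches the teacher's almost exactly, while a forward pass through a genuinely smaller student stays cheap; that asymmetry is what makes the breakthrough-then-distillation tick-tock possible.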
This brings me to number three: I don't think people really understand how these inference-time compute engines work, and calling them an LLM is a little deceptive at this point. At the end of the day, they appear to be solving for deep thinking and deep reasoning with o3 the way the architecture of AlphaGo worked. If you don't know that story, very briefly: AlphaGo was the computer program that beat the best humans at Go, an ancient game that is very, very difficult to play well, was long considered harder than chess, and has now effectively been conquered by machine learning. The way they did it was to stick a Monte Carlo simulation on top of multiple different engines that could play Go.

Closing that example and moving over to o3 and how it connects: o3 is running Monte Carlo-style simulations across thousands of calls to large language models, we think. That means it can imagine multiple possible paths to the solution, run them through these calls to the LLMs, and then come back and pick the one it feels has the highest probability. That's how it's doing insanely difficult mathematics problems and other things; it's also why it takes a while, and why it's so expensive. I think people tend to treat an LLM as a monolithic entity, like it's one thing, right, it's a large language model. But that's like calling Amazon an e-commerce store: an easy interface stretched over a gigantic warehouse network and insanely complex tech. In the same way, o3 is a simple interface stretched across this insane patchwork of simulations, thousands of calls to LLMs, and so on.
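Stripped of detail, the search pattern being described (sample many candidate solution paths, score each, keep the best) looks roughly like the sketch below. Everything here is a stand-in: `propose_solution` fakes an LLM call with a random toy, and the scorer peeks at the target purely for illustration. In a real system each sample would be one of those thousands of model calls, and the score might come from the model's own log-probabilities or a learned verifier.

```python
import random

def propose_solution(rng):
    # Stand-in for one LLM call: returns a candidate answer plus a score.
    # The toy "task" is to land near the value 42; a real scorer would not
    # get to peek at the answer like this.
    guess = rng.gauss(42.0, 10.0)
    score = -abs(guess - 42.0)  # closer candidates score higher
    return guess, score

def best_of_n(n, seed=0):
    """Explore n independent solution paths and return the highest-scoring one.

    More paths cost more compute (hence the price tag) but raise the odds
    that at least one path is very good.
    """
    rng = random.Random(seed)
    candidates = [propose_solution(rng) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])
```

With n=1 this is an ordinary single-shot answer; with n in the thousands, the best candidate is almost always far better than the typical one, which is exactly the quality-for-cost trade the transcript describes.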
And I think knowing how it works helps.

Number four: o3 is really, really good at coding. It's currently benchmarked as roughly the 175th-best competitive programmer in the world. So is it better than every single programmer? Not yet. Is it better than 99.99% of us? Yes. I would put myself at the bottom there, right, like I'm not good, but is it better than almost everybody who picks up code? Yes. And by the next generation, is it going to be number one? Quite possibly.

This brings me to number five: I do not think everybody is going to lose their job, for a very simple reason: 98% of the world doesn't know about this, and cultural change takes time. When the steam engine was invented, it took 150 years to fully realize its impact across society. We may move much, much faster with AI, but even much, much faster is way, way slower than you're probably thinking right now. I walked through an airport yesterday, and I guarantee you I was the one in that airport thinking the most about AI. Everybody else around me was acting like nothing had changed. I knew that o3 had been released, I knew what the ARC AGI scores were, and nobody was paying any attention. That is going to be the way it is, and the people who understand AI and what just happened are going to feel like a fish out of water for a while. It's going to be a very weird year, so stick with me and we'll figure it out together. I'm working on some scaling laws and questions around AI that I think we need to think about in light of o3, and I'm going to put those into a longer Substack post. So there you go. Cheers, and good luck with the weird future we all live in.