
Strawberry o1: Automatic Reasoning Model

Key Points

  • OpenAI unveiled a preview of its new “strawberry” model (named o1, with a faster “mini” variant) less than 24 hours ago, available as a Mac app and a web‑app preview.
  • The o1 model is heavily optimized for reasoning, reportedly solving 83% of International Math Olympiad qualifying problems versus roughly 13% for GPT‑4o, the previous ChatGPT model.
  • Under the hood, OpenAI added a temporal dimension to token generation, allowing the model to perform automatic chain‑of‑thought reasoning without needing explicit “think step‑by‑step” prompts.
  • This new architecture introduces hidden “reasoning tokens” that consume part of the token budget while the model takes extra, elastic time to infer; they are discarded before the final answer is returned.
  • The ability to allocate extra inference time gives the o1 model a long‑term advantage in accuracy and depth of thought over earlier LLMs.

Full Transcript

# Strawberry o1: Automatic Reasoning Model

**Source:** [https://www.youtube.com/watch?v=oQrSoXg5Q4I](https://www.youtube.com/watch?v=oQrSoXg5Q4I)
**Duration:** 00:09:02

## Sections

- [00:00:00](https://www.youtube.com/watch?v=oQrSoXg5Q4I&t=0s) **ChatGPT o1 Strawberry Model Launch** - The speaker reviews the newly released ChatGPT o1 “strawberry” model and its faster mini variant, highlighting their heightened reasoning focus, superior math‑olympiad performance, and a novel time‑augmented token‑completion architecture that automates chain‑of‑thought processing.

## Full Transcript
It's been less than 24 hours since ChatGPT dropped their new strawberry model, or more precisely, a preview of what will be their new strawberry model. They had been rumoring this for October, and lo and behold, on September 12th they dropped the o1 model. They really have to get better at naming these things. It's available in two forms: in the app if you're on the Mac, or in the web app as well. There's o1-preview, and there's also o1-mini, which is supposed to be faster at reasoning. Both of them are focused heavily on reasoning.

They are touting the benefits of o1 for things like really hard math problems, which, I've got to tell you, I do not do a lot of at work. Apparently it can solve International Math Olympiad qualifying problems, which I bet I couldn't solve, 83% of the time, whereas ChatGPT-4o only gets a 13% score. Maybe that's more like me, I don't know; I haven't taken a math olympiad problem in a little bit. So that's the basic idea: it does a lot of reasoning.

I think the architecture underneath is part of what makes this model feel so different. If you've worked with it at all, you can feel how different it is when you talk to it. What's happening under the hood is that they have taken the idea of token completion and added a time element to it, and they actually include charts that talk about this as they release the model. Fundamentally, what happens is that the model goes back and automatically does the chain-of-thought work that you were taught to tell the LLM to do if you were studying prompting. Remember how, up to yesterday, you had to tell large language models, "explain yourself, think thoughtfully"? I think the last time I did that with Claude was yesterday. It's just what you do if you want a very careful, precise, thoughtful answer. You no longer have to do that with Strawberry.
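The "old" habit described above can be shown in miniature. This is an illustrative sketch, not an official prompt template; the wording of the scaffolding line is my own, and the question is a made-up example:

```python
# Before o1, a careful answer usually meant adding explicit
# chain-of-thought scaffolding to the prompt yourself.
question = "A train leaves at 3pm averaging 60 mph. When has it gone 150 miles?"

# Pre-o1 style: tack reasoning instructions onto every hard question.
pre_o1_prompt = (
    f"{question}\n"
    "Think step by step and explain your reasoning before answering."
)

# o1 style: just the question -- the step-by-step work happens internally.
o1_prompt = question

print(len(pre_o1_prompt) > len(o1_prompt))  # True: the scaffolding is gone
```

The point is only that the scaffolding sentence moves from your prompt into the model's hidden reasoning pass.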
With the o1 model, it just does it automatically, and it does it because it takes time to answer. It's not just time to go back and retrieve something that matches out of a vectorized training set and come back with a feature that aligns. I'm assuming you know what a feature is: a feature inside a neural network. A traditional LLM goes back, matches against its features, and then comes back out with the output. That's not how this works. Fundamentally, there's an extra step in there that takes time, and they're giving it elastic time to work with. That's a huge advantage long term, because it can choose to take more time to answer. It's not trying to go as fast as it can; it can choose to take time for inferring, and what OpenAI says is that it's actually doing reasoning during that time.

So there's this new concept called reasoning tokens, which are hidden from the user most of the time. You have to account for them in your max token limit, and the model will use the reasoning tokens to solve problems and then discard them before giving you back your output. OpenAI has chosen not to make that reasoning visible to the user. I suspect that's because it's a closed model and they want to keep it closed and proprietary; they don't want to explain exactly how they're doing the reasoning, and this is a significant breakthrough. Regardless, they keep the reasoning out of view, discard the tokens, and then put a response in front of you. But in my experience playing with this, it still explains its work. The model still tells you what it did; it just doesn't give you the granular detail of the reasoning that it technically could if it actually threw the reasoning output at you.
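The token accounting described above can be sketched as follows. The field names (`max_completion_tokens`, the `o1-preview` model id) follow OpenAI's published o1 API at launch, but treat them as assumptions here and check the current API reference before relying on them; the budgeting helper is a toy model of the behavior, not real library code:

```python
def build_o1_request(prompt: str, max_completion_tokens: int) -> dict:
    """Build a request payload for an o1-style model.

    o1 models use max_completion_tokens (rather than the older max_tokens)
    because the budget must cover both the hidden reasoning tokens and
    the visible answer tokens.
    """
    return {
        "model": "o1-preview",
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
    }


def visible_budget(max_completion_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible answer after hidden reasoning.

    If reasoning consumes the whole budget, the visible answer can come
    back empty even though every token still counts against the limit.
    """
    return max(0, max_completion_tokens - reasoning_tokens)


request = build_o1_request("Estimate how many tech workers like Legos.", 4096)
print(request["max_completion_tokens"])  # 4096
print(visible_budget(4096, 3500))        # 596 tokens left for the answer
print(visible_budget(1000, 1200))        # 0 -- reasoning ate the whole budget
```

The practical consequence is that you size the limit for reasoning plus answer, not for the answer alone.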
It will give you a summary. Let me give you an example. I asked the model to calculate the number of people who work in tech who also like Legos, just for fun, just to see what it would do with it. That requires thinking through, in order, what assumptions to use: what population estimates, what percentage estimates for people who like playing with Legos as adults. It came back and generated an entire response. Apparently there are a few million of us, at least according to that estimate: two and a half million people. And when it came back and revealed the estimate, it did describe in plain English the way it had thought things through; it just did so at a very high level. It said things like "this is a rounded estimate" and "this is what I think based on census data." It talked a little bit about how it was inferring.

I could see that again when I gave it a different challenge: looking at 50 assorted job titles and, based only on those job titles, determining the average salary of the jobs in the sample. So if you have 50 assorted job titles, and that's the only information you get, what is the average salary? That would take a human hours, perhaps days; at least hours. It did it in, I want to say, 20 seconds, and it showed its reasoning. It went through and built a table, fetched a reputable source of job-title salary information out of its training set (I don't think it's connected to the internet yet), and listed the average salary it gets for each of those job titles. It showed its work so I could see and check it. I put CEO in there, and the CEO salary came back very low, so it's not perfect, but it gave me an estimate and then it walked me back through.
Then it did an average, and the average is actually the average. That is something that, when you're just doing next-token prediction, you are not doing: math. One of the notorious problems with mathematics in traditional LLMs is that they're just predicting the next token, so they can predict the next token incorrectly, and there's no way for them to catch that internally. Now it's reasoning, now it's thinking with those reasoning tokens, and it is checking its work. I don't know how they did that either. It took a team of something like 60 people at OpenAI over the course of a year to figure this out. It's a massive achievement; this is the first model that has really broken through the GPT-4-class barrier.

Now, here's the thing: it's not perfect. It is still slower than the other models, and it's intended to be slower. In fact, it's kind of a badge of honor to make it think for a while, because it means you gave it a hard problem. It has a rate limit for asking questions, something like 20 or 30 questions a week, so this is not going to be your everyday model. It's very much in preview mode because it's computationally expensive.

But to be honest with you, how often am I facing problems that are this complex? For example, I fed it a pricing-structure question that I've been having, and I got a novel response back. I don't know that I entirely like the response, but it laid out a pricing structure in great detail in response to a detailed prompt from me about a tough B2B SaaS pricing problem, and it gave me a really reasonable answer, something new to think about. I don't really need to go back and ask it again about that, because that's a problem where I need to think it over and go have some conversations. It's not a rush answer.
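Returning to the salary exercise above: the averaging step, the part next-token prediction gets wrong, is trivially checkable once the work is shown as a table. The titles and dollar figures below are made-up placeholders, not the video's actual data; the point is that the mean is computed, not guessed token by token:

```python
# A hand-checkable version of the model's salary table.
salary_estimates = {
    "Software Engineer": 120_000,
    "Data Analyst": 75_000,
    "Product Manager": 130_000,
    "Technical Writer": 70_000,
    "CEO": 150_000,  # the video notes o1's CEO estimate also came in low
}

# The average of the table really is the average -- each line can be audited.
average = sum(salary_estimates.values()) / len(salary_estimates)
print(f"${average:,.0f}")  # $109,000
```

Showing the table first is what makes the final number verifiable, which is exactly what the speaker means by "the average is actually the average."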
I think there are going to be more of those kinds of questions, so I'm not sure how much the question limit matters in practice to a user in the wild. Now, if you're testing it, it's going to drive you nuts, because you're going to run into it right away.

And that brings me to my last point, which is that we are already at a point where we have these sort of raggedy-edged models that are super, super good at certain things, and we have to figure out what we're using them for. That's sort of on us. It comes back to this idea of intelligence allocation. If I want to do text work, I don't know that I'm going to this model; I might stick with Claude Sonnet. If I want to do multi-step reasoning, I'm absolutely using o1. And if I want to describe basic concepts very quickly, ChatGPT-4o is going to be really useful. You have to get a feel for these models by playing with them to figure out what they're actually going to be good at.

And that's on us. It's on us to try; it's on us to figure it out. I'm really curious: what are you using the o1 model for? I haven't even touched on the coding side. I did get it to write a little bit of code for me; it was extremely fast, and it was clear. What is really bothering me, after Replit, after Cursor, is not having this thing working in a development environment inside the app. I know you can call it there; obviously you can get the API for o1-preview, I think that's in beta now, so it's going to show up in development environments soon. But if you're just trying to code in the browser and it throws a code snippet at you, it's just not the best experience. I'm sure we'll get that solved; the API will mature, and it'll get wired into those tools. What are you using it for? What do you think? What did I miss about Strawberry? Strawberry is here. All right,
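The "intelligence allocation" idea above can be sketched as a tiny router. The task categories and routing rules are my illustration; the model choices reflect the speaker's stated preferences, and the model-id strings are assumed identifiers, not guaranteed API names:

```python
def pick_model(task_type: str) -> str:
    """Route a task to the model the speaker says he'd reach for."""
    routes = {
        "text_editing": "claude-3.5-sonnet",   # stick with Claude for text work
        "multi_step_reasoning": "o1-preview",  # absolutely use o1 here
        "quick_explanations": "gpt-4o",        # fast for basic concepts
    }
    # Assumed default: fall back to the fast generalist for anything else.
    return routes.get(task_type, "gpt-4o")


print(pick_model("multi_step_reasoning"))  # o1-preview
print(pick_model("text_editing"))          # claude-3.5-sonnet
```

In practice the hard part is classifying the task, not the lookup; the sketch just makes the allocation explicit instead of habitual.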
cheers!