Learning Library


OpenAI’s Delayed Multimodal Release Strategy

Key Points

  • OpenAI is reverting to an old product‑release playbook, deliberately delaying launches of ready‑to‑ship features to position itself as a “second mover” for PR impact rather than serving customers immediately.
  • Google’s recently upgraded Gemini model (Gemini Flash Experimental) is now truly multimodal, delivering a distinct image‑generation engine that leans toward photorealism and interprets localized edit prompts more accurately than OpenAI’s counterpart.
  • OpenAI’s newly multimodal 4o model, while more creative and artistic, misread a color‑edit instruction and applied the change to the entire background rather than the indicated area, highlighting complementary strengths and weaknesses between the two systems.
  • The narrator stresses that developers should run their own side‑by‑side testing of both models to determine which fits their specific use case, since neither solution is universally superior.
  • Overall, the speaker argues that a genuinely consumer‑focused company should release polished functionality as soon as it’s ready, rather than staging releases around competitor moves.
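
The recommendation above to run your own side‑by‑side tests, since neither model is universally better, is easy to operationalize. Below is a minimal harness sketch in Python. The two generators are stubs standing in for real API calls (in practice you would plug in the OpenAI and Google image‑generation SDKs, which are not shown here), so only the comparison loop itself runs as written.

```python
# Minimal side-by-side harness for comparing two image models on the
# same edit prompts. The generator callables below are stubs; swap in
# real API calls (e.g. via the OpenAI or Google Gen AI SDKs) to do an
# actual comparison. Results are keyed by (model, prompt) so outputs
# can be reviewed pairwise.

def compare_models(prompts, generators):
    """Run every prompt through every model and collect the outputs
    keyed by (model_name, prompt)."""
    results = {}
    for prompt in prompts:
        for name, generate in generators.items():
            results[(name, prompt)] = generate(prompt)
    return results

if __name__ == "__main__":
    # Localized-edit prompts like the ones discussed in the video.
    edit_prompts = [
        "Change only the wall behind the subject to sage green",
        "Move the soda can from the left hand to the table",
    ]
    # Stub generators; each returns a placeholder instead of an image.
    generators = {
        "gemini-flash": lambda p: f"[gemini output for: {p}]",
        "gpt-4o": lambda p: f"[4o output for: {p}]",
    }
    for (model, prompt), output in sorted(
        compare_models(edit_prompts, generators).items()
    ):
        print(f"{model}: {prompt[:45]}")
```

The point of keying results by (model, prompt) is that every model sees the exact same prompt text, which is what makes the quality differences the speaker describes (photorealism vs. artfulness, local vs. global edits) attributable to the model rather than the phrasing.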

Full Transcript

# OpenAI’s Delayed Multimodal Release Strategy

**Source:** [https://www.youtube.com/watch?v=msHq7IpMh1o](https://www.youtube.com/watch?v=msHq7IpMh1o)
**Duration:** 00:07:29

## Sections

- [00:00:00](https://www.youtube.com/watch?v=msHq7IpMh1o&t=0s) **OpenAI vs Google Multimodal Rollouts** - The speaker critiques OpenAI’s repetitive product strategy while highlighting Gemini Flash Experimental’s capabilities and comparing it to OpenAI’s newly upgraded, truly multimodal 4o model, noting distinct quality differences.
- [00:03:38](https://www.youtube.com/watch?v=msHq7IpMh1o&t=218s) **OpenAI's Competitor-Driven Release Strategy** - The speaker argues OpenAI times new models like ChatGPT 5 to match rivals rather than prioritizing consumer value, urging a shift toward user‑focused product releases.
- [00:07:08](https://www.youtube.com/watch?v=msHq7IpMh1o&t=428s) **Balancing Praise with Product Critique** - The speaker lauds the teams’ impressive models, cautiously critiques OpenAI’s product strategy, promises a deeper write‑up, and solicits opinions comparing the 4o model with Gemini.

## Full Transcript
[0:00] OpenAI is back to their old ways. They are using their old product‑strategy playbook to drive releases, which means that they're releasing second, even when they have the feature almost certainly in the can already. So, if you remember back, Gemini released just last week a new multimodal art model called Gemini Flash Experimental. They've labeled it "images" in AI Studio there for Gemini users. It's very, very good. People have been using it to do product photo shoots where they pull the product out of someone's hand and they mood‑light it. People have been using it to do outfit try‑ons. People have been using it to literally edit an image with text. Like, you can change the background of a wall. You can change an object in the image, and it can be photorealistic. Fantastic model.

[0:48] Well, OpenAI actually talked about true multimodal, where the model will take text as a first‑class input and images as a first‑class input and output. And they did that months ago. And so to have Google beat them is not something they wanted the public to think about. And so a week later they drop their multimodal model. They call it 4o, which you already have. 4o is 4o. But they've upgraded 4o under the surface with a really, really different image model. It is natively multimodal now. You can feel the difference.

[1:24] And what's interesting is these are not equivalent models. So I asked the exact same prompt of both of them and I got really interesting quality differences. I got a better lean toward photorealism in Google Gemini. I got a lean toward creativity and artfulness in OpenAI. And OpenAI was worse at interpreting a color edit suggestion than Gemini. Gemini correctly understood I was only referring to an area of the image, whereas OpenAI assumed I was referring to the entire background. But Google Gemini critically misunderstood the actual composition of what I was asking for. And OpenAI did not. OpenAI understood what I was actually asking it to create. Neither of them is perfect. I'm using that example to show that you have to do your own testing, and you'll probably have to try both to see what you want.

[2:20] I want to take us back to that strategy piece, though. At the end of the day, it is frustrating to me that OpenAI, a consumer company, a company that Sam Altman gave an entire interview on last week to Stratechery emphasizing how much of a consumer company they are, and they're not releasing the great stuff they have when it's ready for customers. They're sitting there staring at competitors and going second so that they can try and claim a PR victory. That's not customer‑obsessed. I don't think they should be doing that. I think if you have it ready, you should release it. And I think that great customer‑facing companies historically have done that. When they have the right product ready, they release it on their time.

[3:06] And this is a real established pattern with OpenAI where it's almost like they're playing inside baseball with the other model makers. They talk about being a consumer product, and they are. They have 400 million active users per month. But in a lot of ways, they don't act like a grown‑up consumer‑facing company yet. They don't have the consumer obsession that defines companies like Apple, that defines companies like Amazon, even Netflix. Instead, they tend to look at the other model makers and they can get into a bit of a standoff. In fact, I strongly suspect the exact timing of ChatGPT 5 and Claude 3 will be interrelated.

[3:47] Or at the very least, ChatGPT 5's release timing will be tied into another model released from Google, from Meta, from Anthropic, maybe even from DeepSeek. And if that's the case, it's yet again going to be confirmation that OpenAI really is looking not to the consumer to drive their release cadence, but to other model makers. It's not a mature product‑company motion. It's something that probably is going to need to shift, because ultimately the average person doesn't care who released the image model first. My grandmother doesn't care whether Google released it last week or OpenAI released it this week. She's going to care if it makes the photo on her phone. Is it good or not good?

[4:34] And OpenAI has a huge advantage there. They have the product surface that most people familiar with AI understand to use for AI. Everybody knows ChatGPT, and the 4o model is the baseline model. So far so good. That makes sense. So why worry about exactly when Google releases stuff? Why worry about exactly when Anthropic releases stuff? Why not just release what you got to your consumers?

[5:03] That's my hot take. I think OpenAI is cueing their product strategy around other competitors incorrectly, and I think they should be focused more on consumers. I think that's to their interest as a company long term. And I think the things they're worried about, sort of losing the PR battle for a cycle or two, are not that big a deal. Like, if you can imagine the reverse, where OpenAI released this when it was ready, maybe a couple of weeks back, maybe a month back, and then, you know, Google comes along and they release theirs later on. It's not really a PR difference. It doesn't really make a difference.

[5:42] And so I think that we really need to see some grown‑up, consumer‑focused behavior from some of these model makers now that we are seeing a much larger consumer footprint. If we expect AI to be in people's houses, if we expect it to be on people's phones, if we expect it to be a daily, hourly touch for people, we have to act like that when we build and release products. And right now it's feeling much more like a Y Combinator, who‑releases‑first, very Silicon Valley insider kind of thing. It's not super helpful to customers.

[6:16] So that's my hot take. But that shouldn't detract from the fact that this is a great model. Great people worked on it. The teams at both of these companies, at Gemini and at OpenAI, are fantastic. They're shipping great stuff. And they should be proud. Like, these are really hard challenges that they're solving with multimodal. And I think that we've taken a massive leap forward on image generation. Again, like in my video yesterday, it's hard to describe that unless you're really clear and specific. Like, if you can say: before, you could put a Coke can in someone's hand in a drawing, but then you couldn't move it around. The Coke can was like frozen to the hand. And now you can just edit it and move the Coke can over here with just your written text. That makes the light bulbs go off. Now people understand.

[7:01] So, I do think we need to get better at talking about this stuff. I don't want my critique of the product strategy to come off as critiquing the individuals who did the hard work, because this is an amazing model. Like, it's incredible work that they've done. So, I'll probably write up more on it later on, but I wanted to throw it out there. What do you think? Have you tried the 4o model? How does it compare to Gemini?