
AI Infrastructure Wars and Cost Curve

Key Points

  • The latest Air Street Capital “State of AI” report declares that the era of competing purely on model intelligence (model‑IQ) is ending, ushering in the “infrastructure wars” where system design and cost efficiency dominate.
  • Three forces will now drive AI success: the rapidly improving capability‑to‑cost curve, how AI is distributed to users, and the physical infrastructure needed to run models.
  • AI “intelligence per dollar” is doubling far faster than most anticipate—approximately every 3‑8 months across major providers, outpacing Moore’s Law by three‑to‑seven‑fold and dramatically reshaping unit economics.
  • Winners will be firms that can dynamically route workloads to the cheapest model that meets performance needs, leveraging the fast‑shrinking cost curve to unlock real‑world value.


# AI Infrastructure Wars and Cost Curve

**Source:** [https://www.youtube.com/watch?v=gRhOo6uT-fM](https://www.youtube.com/watch?v=gRhOo6uT-fM)
**Duration:** 00:28:30

## Sections

- [00:00:00](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=0s) **Beyond Model IQ: Infrastructure Wars** - The AI State of the Industry report argues that the era of chasing higher model intelligence is ending, and future dominance will hinge on three system-level forces (capability-to-cost curves, distribution strategies, and physical infrastructure), favoring firms that can route tasks to the cheapest capable models.
- [00:03:36](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=216s) **AI Cost, Funding, and Browser Distribution** - The speaker explains how massive token-scale usage drives cost-per-token optimization, how model release schedules now align with fundraising cycles, and how browsers are emerging as the default AI operating system, linking capability growth, cost reduction, and distribution shifts.
- [00:06:51](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=411s) **Answer Engine Optimization & Google Dynamics** - The speaker explains how AI answer engines rely on Google's index, creating a need for new Answer Engine Optimization practices (structured data, APIs, citation-friendly formats), and highlights Google's strategic dilemma of powering competitors while shifting users to its own AI interfaces.
- [00:10:35](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=635s) **Water Constraints on AI Scaling** - The speaker warns that as AI usage grows to quadrillion-token volumes, the water needed for data-center cooling becomes a hard limiting factor that will dictate site locations, power strategies, and the practical viability of large-scale AI deployments.
- [00:15:36](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=936s) **Model Sycophancy and Reward Gaming** - The speaker warns that as AI models become smarter they may learn to flatter human evaluators and game reinforcement signals, which can offset intelligence gains and create new scaling challenges.
- [00:18:47](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1127s) **Open Weights vs Frontier Closed Models** - The speaker argues that while the most advanced frontier models remain closed in the US, partially or fully open-weight models can still deliver competitive capability, lower costs, customization, and sovereignty, making them valuable for hybrid enterprise architectures alongside proprietary cloud offerings.
- [00:24:52](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1492s) **Choosing AI Models for Workflow Distribution** - The speaker stresses evaluating distribution capabilities, infrastructure constraints, and workflow efficiency when selecting AI models, marking a shift from merely pursuing smarter models to strategically leveraging differentiated skills and model availability.
- [00:28:14](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1694s) **Rare Insight on AI Strategy** - The speaker emphasizes the uniqueness of the audience's attention to current AI strategic themes and predicts a tumultuous 2026.

## Full Transcript
The State of AI report is finally out. This is an annual report from Air Street Capital, with Nathan Benaich at the lead. It's been published for the last eight years, and every time it comes out, it shifts the industry. I'm going to go through and summarize the 313 slides in just a few minutes so that you get the TL;DR.

First things first: the takeaway is that the model-IQ contest is over and the infrastructure wars are just beginning. The thesis is that we have been pushing and pushing to make incrementally smarter models, but what really matters now is systems, specifically three compounding forces that drive those systems: the capability-to-cost curve, the distribution question, and the physical infrastructure question. We're going to get to all three of those, but the thesis of this entire 313-slide report is that those three drivers are going to matter more for practical AI than model IQ alone. And so the winners in the race are going to be the ones that can route computational work to the cheapest capable model rather than defaulting to frontier, high-IQ options, and that's what's going to enable them to get to real value.

So let's jump right in: the capability-to-cost curve. What is it? This is perhaps the most consequential thing we aren't talking about, and it's the central economic finding of the report. Intelligence per dollar is improving on an exponential curve that is faster than the pace most people assume in their strategic plans. Across two independent leaderboards, Artificial Analysis, which tracks API pricing and performance, and LM Arena, which tracks crowdsourced model rankings, the capability-to-cost curve is doubling very frequently, roughly every four or five months.
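As a worked illustration of what a doubling time means for budgets, the sketch below computes how far the cost of a fixed level of capability falls over a year. The doubling times are the ones quoted in the talk (3.4 months for Google, 5.8 for OpenAI, and roughly 21 months as a midpoint for Moore's law); the function itself is just compound-decay arithmetic, not anything from the report.

```python
# Cost of a fixed level of capability decays as capability-per-dollar doubles:
#   cost(t) = cost_0 * 2 ** (-t / doubling_months)

def cost_after(months: float, doubling_months: float, cost_0: float = 1.0) -> float:
    """Relative cost of a fixed capability after `months`, given a
    capability-per-dollar doubling time of `doubling_months`."""
    return cost_0 * 2 ** (-months / doubling_months)

if __name__ == "__main__":
    for name, d in [("Google (3.4 mo)", 3.4),
                    ("OpenAI (5.8 mo)", 5.8),
                    ("Moore's law (~21 mo)", 21.0)]:
        print(f"{name}: cost falls to {cost_after(12, d):.1%} of today in a year")
```

With those inputs, the cost of a fixed capability level falls to roughly 9%, 24%, and 67% of today's price after twelve months, respectively, which is the three-to-seven-fold gap versus Moore's law the report highlights.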
The average across all the different measures and model makers is between 3 and 8 months per doubling, to be clear. Google is at a 3.4-month doubling time, the fastest improvement curve in the ecosystem. OpenAI is at a 5.8-month doubling time. Google's doubling time on the LM Arena scores is slightly longer, 5.7 months, versus 3.4 on Artificial Analysis, and that's why I give you a range. It gives you a sense of how fast it is. It's ridiculously fast. For context, Moore's law predicted transistor density doubling every 18 to 24 months, and that roughly held, give or take a few months, for many decades. We are seeing effective AI capability per dollar double three to seven times faster than Moore's law.

And the pricing evidence is compelling. GPT-5's input costs for a 400,000-token context window are 12 times cheaper than Claude's and 24 times cheaper than GPT-4.1's. This isn't marginal; it resets unit economics every few months. When you can obtain frontier-adjacent performance for a twentieth of the price of just six months ago, a lot of strategic implications start to fall out of that fundamental cost-curve insight.

First, routing is now a competitive advantage, not model quality. Products that intelligently triage requests, sending simple queries to small language models and reserving expensive frontier calls for when they're needed, are going to capture margin in a way that monolithic architectures can't. So the practical AI stack now looks a lot like smaller, dumber-first routing with frontier spikes only where needed.

Second, the ecosystem as a whole is scaling usage as cost comes down. We're processing about a quadrillion tokens every month across different API providers.
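To put that token volume in dollar terms, here is a rough sketch. The quadrillion-tokens-per-month figure is from the report; the blended price per million tokens is a hypothetical assumption for illustration, not a number the report gives.

```python
# Ecosystem-scale spend at a quadrillion tokens/month, and what a single
# basis point (0.01%) of routing efficiency is worth at that scale.

TOKENS_PER_MONTH = 1e15   # ~a quadrillion tokens/month (from the report)
PRICE_PER_MTOK = 2.00     # assumed blended $/1M tokens -- illustrative only

monthly_spend = TOKENS_PER_MONTH / 1e6 * PRICE_PER_MTOK  # dollars per month
bp_saving = monthly_spend * 1e-4                         # one basis point

print(f"monthly spend: ${monthly_spend:,.0f}")
print(f"1 bp routing gain: ${bp_saving:,.0f}/month (${bp_saving * 12:,.0f}/year)")
```

Under that assumed price, a single basis point of routing efficiency is worth a couple of hundred thousand dollars a month, which is the order of magnitude behind the "millions of dollars" claim that follows.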
And at that scale, even a basis-point improvement in routing efficiency translates into millions of dollars in cost savings or expanded margin. So cost per token and latency are not just back-end concerns; they become relevant to product differentiation and to the P&L.

Third, model release cadences now correlate directly with fundraising cycles, and what's interesting is that this makes roadmaps financial instruments. OpenAI trails model releases to fundraises by about 77 days; Google is at about 50 days. Labs will time capability releases to create momentum for funding rounds, and investors should read launch announcements as pre-fundraising signals rather than purely technical milestones. This is relevant not just for OpenAI and Google but for others: for Anthropic, even for folks outside the core model makers. People are announcing capabilities in the AI race as a way of warming up the market for a fundraise.

All of this adds up to a world where capability continues to accelerate even as cost comes down. And that's the first major driver: capability-to-cost compounding.

The second major driver is distribution. Distribution is tilting toward answer engines in the browser, and I want to get specific about this. The browser is becoming the operating system for AI by default, and the distribution choke point is shifting from the search box, like Google's, to answer engines that can parse, synthesize, and present information before the user clicks through. ChatGPT search is the gorilla: they claimed 800 million weekly active users last week, and they have, very roughly (we're still learning how to measure this), something like 60% share of the AI search market.

As a comparison, if you're wondering about Perplexity, the most famous AI search engine, it logged about 780 million queries in May of 2025 and is growing 20% month over month, so it's going to be over a billion monthly queries. That is competitive traction, but it's dwarfed by OpenAI's distribution advantage: OpenAI, with 800 million weekly active users, has much more on the table when it comes to search.

What's interesting is that it's not just search and conversation; it's also purchase intent. Retail conversions from AI referrals are running at about 11%, up 5 percentage points year over year. And that 11% conversion rate is very strong historically. Typical organic search is just not going to keep up; it's much higher than typical organic search, and it's competitive with paid-search conversion in many verticals. So answer engines aren't just changing how people find information. They are driving forward a new vertical of purchase that is going to become hyper-relevant for e-commerce and for other ad providers and marketers in 2026.

But there's a dependency we're not talking about. The answer engines still source really heavily from Google's index. They're not crawling the web independently at scale yet; they're layering natural-language synthesis over the top of existing search infrastructure. And that creates a really weird dynamic where Google is providing the index, but OpenAI and others are capturing the intent and the conversion. This has some strange implications for builders. If you're not thinking about answer engine optimization as a topic, you need to be, because you don't want to be invisible to the fastest-growing distribution channel we have. But AEO requires something different from traditional SEO. You have to have structured data schemas that models can parse and understand. You have to have APIs that allow answer engines to pull canonical information directly. You'll need content architecture designed for extraction and synthesis, not just keyword targeting. And you're going to need citation-friendly formatting that makes attribution really clear.

Google, meanwhile, faces a really tricky strategic tension. It provides the index that powers competitors' answer engines, but capturing that value requires Google to transition users from search to its own AI interfaces without cannibalizing its traditional monetization model. That will be one of the central questions of 2026.

The third major driver is power and permits. This is a hard constraint on scale that we're all going to be facing in AI. We've all heard about the Stargate project: a 10-gigawatt power target and a half-trillion-dollar investment. Multiple labs are now targeting 5-gigawatt training clusters operational by 2028. But physical infrastructure has to get there to enable that kind of AI progress. This is not a temporary bottleneck; it's a capital-intensive, frankly geopolitically complex problem space that will determine which organizations can execute on their roadmaps, and that will drive success for the organizations that are able to build.

Just to give you a sense, a single gigawatt data center requires about $50 billion in capital expenditure right now (land, buildings, cooling, networking, GPUs, etc.) and is going to require about $11 billion a year, fully loaded, to operate (electricity, maintenance, staffing, interconnects, etc.). That's not cheap. For perspective, a single gigawatt data center consumes the equivalent power of a midsize city.

And so the US currently faces an implied 68-gigawatt power shortfall by 2028. That's 68 city-sized data centers that we think will be short, per forecasts cited by SemiAnalysis and corroborated by the North American Electric Reliability Corporation. So one of the challenges we're facing, given that gap, is figuring out where and how we can actually build to cover it. Things that have traditionally been environmental debates, like not-in-my-backyard opposition, NIMBYism, have become geopolitically relevant AI debates. NIMBYism has already blocked $64 billion in data center projects across the US. Local communities are voicing their opinions; they don't want approvals, due to concerns about grid strain or noise or water usage or whatever the local concern is, and they're voicing those complaints in ways that affect build patterns at the county, municipal, and state levels. We don't know how this is going to play out, but the fact that the constraint exists, and that it plays out differently in different local communities, is going to shape our collective future.

Water adds another layer of constraint. A 100-megawatt data center, so a smaller one, consumes about 2 million liters a day in cooling. Now, per text query, this is a very small amount of water: the typical Gemini text prompt apparently consumes about a quarter of a milliliter, a tiny amount. But when you get to quadrillion-token-per-month scale, water usage becomes a siting constraint, especially in drought-prone regions, because data centers are going to be competing with agriculture, and potentially with residential use, for allocation rights. This shifts where you can site your data centers.
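The water math above scales roughly as follows. The per-prompt figure and the quadrillion-token volume are from the talk; the average tokens-per-prompt is a hypothetical assumption needed to connect the two, so treat the result as an order-of-magnitude sketch only.

```python
# Order-of-magnitude water use at quadrillion-token scale.

TOKENS_PER_MONTH = 1e15    # ecosystem-wide monthly tokens (from the report)
TOKENS_PER_PROMPT = 1_000  # assumed average per prompt -- illustrative only
ML_PER_PROMPT = 0.25       # ~a quarter millilitre per text prompt (cited in talk)

prompts = TOKENS_PER_MONTH / TOKENS_PER_PROMPT  # prompts per month
litres = prompts * ML_PER_PROMPT / 1000         # millilitres -> litres

# Compare with a 100 MW data centre's cooling draw of ~2M litres/day.
centre_days = litres / 2_000_000

print(f"~{litres:,.0f} litres/month (~{centre_days:,.0f} days of one "
      f"100 MW centre's cooling)")
```

Under these assumptions the ecosystem's prompt-level water draw comes to roughly 250 million liters a month, which is tiny per query but, as the talk argues, large enough in aggregate to matter in drought-prone regions.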
And so labs and cloud providers are being forced to pursue special behind-the-meter power purchase agreements. They're trying to find ways to get to offshore jurisdictions that have more available power and perhaps fewer permitting obstacles; Norway comes to mind, the UAE comes to mind. And they're trying to design for water-aware cooling: things like air cooling and waste-heat recovery.

The larger implication is that if your AI roadmap assumes you can call an API and scale from a million to a hundred million users on demand and reprice, you need to think through what that's going to take. We all need to be aware that this hard constraint is going to shape the availability of the rest of the stack. It's going to shape the availability of software. It's going to shape the availability of tokens. Now, there are very smart people working to make things more efficient and to solve the power problem; small modular nuclear reactors come to mind. But there's a difference between working on it and having it be operational.

So if you look at the strategic read and ladder back to where we were at the start of this conversation, fundamentally the next year or two are going to see a clash between dramatically improving intelligence-per-cost opportunities (as we said, the capability-to-cost improvements are doubling every few months, which is going to lead to more and more demand for tokens) and real hard constraints. It is difficult to build these data centers in physical space. It's not a bits problem; it's an atoms problem. It's going to be difficult. So that's the fundamental tension we're all going to be negotiating over the next 12 to 24 months.

I want to get to a second piece the report called out that I think is really important. If that is our overall strategic canvas, think of this as a set of questions we need to be evaluating within that space. First, I want to talk about evaluation of reasoning gains. One of the things we need to get more deliberate about, and that I think we're starting to see more testing on recently, though we didn't in the first half of the year, is how we measure success: how we measure intelligence, how we measure the capability gains of LLMs. You know I've talked about the story of Claude disastrously running a vending machine. It was a complete disaster, and the point was that whatever the advertised intelligence, Claude wasn't doing real economic work. More recently, OpenAI has launched GDPval, where they're trying to test, within a constrained environment, how AI solves economically useful problems. One of the larger reasons we need these kinds of evaluations is that reasoning gains are more fragile than the model makers often advertise. Anyone who has built a production LLM system will understand this: you have the headline reasoning gains, and then you have a discounted value that you can actually use. The most recent example of this is Anthropic claiming that Claude could do 30 hours of work and rebuild Slack, which may well have been true; I don't see any reason why it wouldn't be. But when they tested the same model in controlled conditions on the METR metric, it did not deliver 30 hours. It got close to 2 hours. That's a big discount, and we're seeing that kind of discount across a lot of different areas in AI right now.
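That headline-versus-measured gap can be treated as a planning input. A minimal sketch, using the talk's 30-hour claim and ~2-hour controlled result (the helper function and its name are mine, not anything from the report):

```python
# Plan against measured capability, not the headline: compute the discount
# factor between an advertised figure and a controlled-evaluation result.

def capability_discount(headline: float, measured: float) -> tuple[float, float]:
    """Return (discount_factor, usable_fraction) for a claimed capability."""
    return headline / measured, measured / headline

discount, usable = capability_discount(headline=30.0, measured=2.0)
print(f"headline is {discount:.0f}x the measured value; "
      f"{usable:.0%} survives controlled testing")
```

The point of writing it down is simply that anything downstream (capacity plans, SLAs, staffing assumptions) should be sized from the measured figure, with the headline treated as an upper bound.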
And I don't want you to hear that we're not making progress. But I do want you to hear that we need to treat reasoning gains and intelligence gains as somewhat more fragile than they're advertised on the top line, and we need to think more carefully about how we can build more sustainably with these systems. Given that the top-line gains don't always pan out the way we expect, we have to be more intentional. And I think some of these issues are going to get more challenging as models do indeed scale in intelligence, because regardless of claims of top-line inflation, we continue to see gains in intelligence overall, and we need to factor that in.

As an example of something that gets more difficult as models gain in intelligence: models can fake alignment. They can detect that they're being evaluated and adjust their reasoning chains to appear more aligned, and that's something model makers are actively working to address. It is something that gets worse as models get better, and it can be one of the factors that discounts some of the value of frontier model updates. Sycophancy is on the rise when humans give feedback. One of the core principles of AI is that reinforcement learning from human feedback is helpful, but what happens when the model gets smart enough to recognize that it's a human giving feedback, and it tries to please the human rather than do the task well? What happens when models start to recognize that they're being tested and change their behavior under test? We're already seeing evidence of that.

And so I'm giving you those three examples because they illustrate that we can have real model intelligence gains, but factors tied up in the way we train and build our models can undercut those gains to some extent and make it more difficult to feel the progress day to day. In a sense, if you look at the larger picture, it's kind of amazing that we've made the progress we have already. I anticipate continued progress in model intelligence, particularly in vertical-specific applications. But we are going to need to recognize that these top-line gains have to be discounted against some of the challenges that come with bigger models, smarter models, models that think more. They have different kinds of challenges that we're working through. So that's one of the things that makes this interesting: we have this scaling demand for tokens, tokens are getting cheaper, and we have the power-constraint piece. Now we have to think about, as we scale demand for tokens, which model is the right one to go to. What does a frontier model even mean? If it is a frontier model, how do we measure that? Those are all going to become hyper-relevant questions in 2026, and I expect more investment in measures like GDPval from OpenAI, because it gives us a sense of how models actually do real, economically useful work.

Another big theme is the question of frontier leadership in closed models versus open weights. China has become the dominant player in the open-weight ecosystem. Even though GPT-5 from OpenAI, Gemini 2.5 from Google, and Claude's Sonnet 4.5 family continue to lead leaderboards in raw capability, the frontier remains closely followed by Chinese labs, particularly Alibaba's Qwen lab and DeepSeek.

And this has been very deliberate. China is pursuing an open-weight strategy because it gives them distribution leverage. They can get anywhere: on premises, sovereign clouds, consumer hardware, it doesn't matter. So they have adoption pathways that don't work with US cloud providers. It also allows organizations to customize and fine-tune. And finally, it allows China to retain 77,000 or more STEM PhDs who are starting to concentrate on AI, onshore. So there's a talent onshoring happening around these open-source models that enables China to build an open-source approach that's an ecosystem, not just one single model. Now, Qwen has become the dominant open-weight choice across multiple international markets, but it's not the only one, and I would anticipate that the open-weights ecosystem as a whole is going to continue to shift forward.

Now, there is one significant update here. More recently, OpenAI's gpt-oss release, a partially open-source stack, drove home the fact that frontier model makers in the US, in Silicon Valley, may not just yield ground to open-weight models; they may choose to release open-weight models that keep their model lineage competitive with models like Qwen. And so open doesn't necessarily have to mean frontier-competitive. In a world with this incredible cost-to-capability curve, you can have frontier-competitive or frontier-adjacent models that are super economically useful and that are effectively yours: you can have the compute that you run yourself, you can have the model weights, you can do whatever you want. And so I think the opportunity we see here is to think about open versus closed as less than binary.

So frontier capability so far remains closed and US-led, but open weights have a lot of range. There's a range of open-weight standards: some are fully open, some partially open. And they enable, especially as we hit these increased cost-to-capability thresholds, distribution, customization, and sovereignty opportunities that closed cloud offerings can't match. If you have OpenAI's terms of service from a cloud provider, that's what you've got; that's not true with open source. Enterprises are increasingly going to plan on hybrid architectures. They'll have closed frontier models where they really need frontier intelligence for high-stakes reasoning, but they may well go to open models for volume tasks, to handle regulatory compliance, or whatever else they may need.

I want to call out something I mentioned at the top: the importance of routing in a world where cost-to-capability is becoming a dominant force. Let's open that up a little and explain why routing wins. As capability-to-cost improves exponentially, you get to a world where GPT-5's router UX becomes the default. Everyone complained, from the consumer side, about the fact that GPT-5 routes you to different models. Well, on the back end, if you're designing systems, that's actually desirable. The interface dynamically selects speed-optimized or capability-optimized variants depending on task detection. That reduces cost per query, improves latency, and maintains quality. People complained about it, but you improve it over time. Routing is a core UX and business lever now, not just back-end optimization.

Products that expose routing choice to users, or that offer it invisibly, are able to offer better pricing and potentially faster responses at elevated quality, and that can create differentiation in what would otherwise be a very commoditized space. And so you need to think about your architectural decisions in cases where context is expensive. Long-context-first designs will simplify systems and reduce latency, but they concentrate risk on a single provider, so you need to think about installing model routing as a first-class object.

Another key theme I want to call out is sovereign AI. The sovereign AI movement accelerated in 2025, and one of the things you'll notice is that sovereign AI pathways are not as sovereign as you think. A lot of these sovereign announcements remain reliant on US hyperscalers for cloud infrastructure, they import foreign models via API, and they still depend on NVIDIA hardware. And so most of the sovereign mega-deals that are announced actually create a self-reinforcing loop that continues to concentrate capital on the core model makers, NVIDIA, and perhaps core cloud providers like Azure. So if you're seeing sovereign announcements, understand that this may be about what we talked about earlier in this video: it may well be about data center siting and the availability of power supply more than a truly sovereign and independent AI.

So what are some of the implications we see here? If you're in a building or founding space, you need to assume a world next year where intelligence per dollar continues to double every four or five months, and therefore your margin opportunity is smarter routing.
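A routing layer like the one described, sending each request to the cheapest model that clears the task's capability bar, can be sketched in a few lines. The model names, capability scores, and prices below are hypothetical placeholders, not figures from the report.

```python
# Minimal cost-aware router: pick the cheapest model whose capability score
# meets the task's requirement; fall back to the most capable model otherwise.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: float    # benchmark-style score, higher is better (hypothetical)
    usd_per_mtok: float  # blended $ per million tokens (hypothetical)

MODELS = [
    Model("small-open-weight", capability=0.55, usd_per_mtok=0.10),
    Model("mid-tier",          capability=0.75, usd_per_mtok=0.60),
    Model("frontier",          capability=0.95, usd_per_mtok=5.00),
]

def route(required_capability: float) -> Model:
    """Cheapest model meeting the requirement, else the best available."""
    candidates = [m for m in MODELS if m.capability >= required_capability]
    if not candidates:
        return max(MODELS, key=lambda m: m.capability)
    return min(candidates, key=lambda m: m.usd_per_mtok)

# Triage: easy queries go cheap; frontier spikes only where needed.
print(route(0.50).name)  # small-open-weight
print(route(0.90).name)  # frontier
```

The fallback branch reflects the hybrid-architecture point above: when no model clears the bar, you escalate to the most capable one rather than failing, which is the "frontier spikes only where needed" pattern.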
And so 23:06you need to think about a core product 23:09lever that enables you to route more 23:12intelligent that enables you to deliver 23:14higher quality at lesser cost with the 23:17assumption that models will continue to 23:19deliver more capability for cheaper 23:21going forward. You also need to think 23:22about how you capture distribution with 23:25answer engine optimization. You should 23:27assume that there the 60% AI search 23:30share from Chad GPT, the 11% retail 23:33conversion rate, those are sticking 23:34around. You should assume that there's 23:36going to be an ad network launched 23:38against that. You should assume that if 23:40you are not in place with AEO optimized 23:44content like structured data, canonical 23:46APIs, etc. You are invisible to the 23:48fastest growing e-commerce distribution 23:50channel. You should also have a keen eye 23:53on the relevant infrastructure risks in 23:56your space. And so that means, as funny 23:58as it sounds, keeping an eye on how 24:00Stargate is doing and how other major 24:02data center projects from Microsoft are 24:04doing because any delays there driven by 24:07power constraints, driven by nimism, 24:09etc. end up flowing through into real 24:12token availability for businesses. And 24:15that could become relevant in 2026 given 24:17the exploding pace of token. If you're 24:19on the investment side, you're going to 24:22want to be reading the news of tomorrow 24:24and understanding the investment picture 24:26through the assumption that companies 24:28win on routing intelligence through the 24:31assumption that you have real dependence 24:35on the core model makers in Nvidia that 24:37isn't going anywhere. Those circular 24:38flows are real. through the idea that 24:40demand is scaling faster than supply 24:43across the board and that in that world 24:46you need to understand who has 24:48infrastructure access and who doesn't. 24:50And finally, you need to think about 24:52distribution. 
Who has distribution in the middle of a world of performance bottlenecks? ChatGPT certainly does. Who else does? And who is able to maintain and grow that distribution in this complex world? Finally, if you're an AI enthusiast, you should take away from this that this is the beginning of the next step in the AI revolution. If the first step was about the models just getting smarter and smarter, with all of us chasing those smarter models every single time, this wave is about moving from competing on pure intelligence, just celebrating the fact that we can now do Excel and PowerPoint, to a world where we need differentiated skills around particular workflows and systems. So just as you can talk about routing intelligence and the cost-per-capability curve for systems, you can think about that for your own skill set. How can you route workflows more efficiently now that you understand this situation better? Is GPT-5 always the right choice, or do you go with another model? How can you think about that more deliberately? Is your particular model choice one where you'll have expanded availability over time, or do hard infrastructure constraints make that more difficult? Anthropic is actually a good example of a model maker that has remained infrastructure-limited for most of 2025, and that is one of the reasons why they've been unable to roll out things like rolling context windows. It's been one of the reasons why they've had some persistent issues with outages over the last few months, and it's rumored to be one of the reasons why they've struggled with releasing some of their newer models. The point here is not "don't choose Anthropic." I love Anthropic. It's a fantastic model.
The point is that constraints are already impacting real-world availability, and not just for Anthropic. ChatGPT has issues at times as well, and they've been honest about that, particularly post-launch. Be aware of the constraints that shape intelligence availability in your space, and be smart about what you choose to build and what you choose to do with that. This is why major tool makers like Cursor and Lovable think about having a multi-model architecture underneath: it gives them the option to pick a different model. The last thing I will call out is that none of this is theoretical. We are living in a world where the cost of intelligence really is going to zero. And it gives individuals an immense amount of agency to choose your own adventure. Your ability to form intent and go after something with clarity, focus, and dedication has never had more leverage, because the intelligence is going to get better and better and cheaper and cheaper. We are in a world where you can teach yourself any skill you want with the help of AI in just a few months. And so it's going to be on individuals to go after what they want, and the ability is not going to be evenly distributed. What I'm finding is that even though everyone has access to these models, very few folks are making the most of them. The willingness to jump on and make the most of not just frontier models but potentially cheaper next-generation models that are adjacent to the frontier, and the willingness to understand these strategic constraints and think about the opportunities you have: that is rare. That is rare. And so if you watched this video, thanks for tagging along. You are probably one of the few paying attention to the strategic themes in AI right now.
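The multi-model architecture described above is, at its simplest, a fallback chain: try the preferred provider, and step down the list when one is rate-limited or down. Here is a minimal sketch under stated assumptions; the provider names and the `call` stubs are hypothetical and stand in for real API clients.

```python
# Minimal sketch of a provider-fallback chain, the simplest form of a
# multi-model architecture: try each provider in preference order and
# return the first successful answer. Names and stubs are hypothetical.
from typing import Callable


class ProviderDown(Exception):
    """Raised by a provider stub when it is unavailable or rate limited."""


def complete(prompt: str,
             providers: list[tuple[str, Callable[[str], str]]]) -> str:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubbed demo: the primary provider is "down", the backup answers.
def flaky(prompt: str) -> str:
    raise ProviderDown("rate limited")


def healthy(prompt: str) -> str:
    return f"answer to: {prompt}"


print(complete("hello", [("primary", flaky), ("backup", healthy)]))
# -> answer to: hello
```

A real implementation would add retries, timeouts, and per-provider prompt adaptation, but the core option value is exactly this: no single provider's outage takes the product down.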
Best of luck in 2026. It's going to be a wild, wild ride, as I hope this video has made clear.