
Infrastructure First, Tools Later

Key Points

  • Coding assistants act like a “rocket engine” for development, so they magnify both the strengths and weaknesses of a team’s existing engineering infrastructure.
  • Adding a new tool (e.g., Codex) to a weak or poorly defined workflow will likely produce a net negative impact despite the tool’s hype.
  • The critical decisions lie in the engineering‑infrastructure layer; only after solid foundations are in place should you evaluate specific coding‑assistant tools.
  • Technical leaders must first articulate a precise problem or goal (e.g., speeding boilerplate, onboarding juniors, reducing bugs) rather than a vague “boost productivity” ambition.
  • Larger organizations often struggle to define such concrete objectives, making indiscriminate tool adoption especially risky for them.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=cVZCfpkHNBg](https://www.youtube.com/watch?v=cVZCfpkHNBg)
**Duration:** 00:19:00

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=0s) **Tool Hype vs Infrastructure Basics** - The speaker cautions that while coding assistants can speed development, they become detrimental when underlying engineering practices are weak, urging teams to prioritize solid infrastructure decisions before chasing popular tools.
- [00:03:20](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=200s) **Ensuring Foundations Before AI Adoption** - The speaker stresses that without solid review processes, design documentation, and tooling aligned to the team’s workflow, AI assistants will degrade rather than improve software development, especially as teams grow.
- [00:07:39](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=459s) **Beware LLM Code Drift** - Relying on AI-generated code without continual, multi-person review leads to hidden architectural decay, wasted managerial time, and ineffective productivity metrics.
- [00:11:22](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=682s) **Budgeting Junior AI Learning Rollout** - The speaker outlines how to plan, fund, and pilot AI-assisted coding tools for junior developers, emphasizing hands-on understanding, hidden costs beyond licensing, and iterative testing with small “two-pizza” teams before scaling.
- [00:14:58](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=898s) **Evaluating Tool Limits & Team Practices** - The speaker discusses how tool and model constraints affect coding work, the need for adaptable setups, and how team habits, such as code review timing, repeat mistakes, and feedback quality, impact overall engineering effectiveness.

## Full Transcript
I have a very simple thesis which may not be popular but is nonetheless true. Coding assistants accelerate your development practices whether they are good or bad. In other words, you are tying a giant rocket engine to whatever engineering infrastructure practices you have and you're saying go, just go faster, go do more. You know what? If you have any kind of weakness in your engineering infrastructure layer, your best practices layer, that choice to add Claude Code, to add Codex, which just updated this week, that's going to end up being net negative. Yeah, I said it. It's going to end up being net negative. I don't want that for you, because there are teams that are getting real gains. There was a viral post recently on Reddit called "this is how we vibe coded a FAANG." You know what it was about? It wasn't about a vibe coding tool set that would magically fix everything. It was about the engineering infrastructure decisions that matter.

And I want to focus on that today, because, you know, we could take this time and we could dive into why Codex is the best thing since sliced bread, because it's at the top of the news this week and that's all anyone in development can talk about: do we use Codex? Do we use Claude Code? You are asking the wrong question. In most cases, the right question is at the engineering infrastructure layer. And you only get to the tool choice if you've asked the right engineering infrastructure questions. So I want to give you in this conversation the specific questions you should be asking yourself as a technical leader, as a technical team member, as a builder, as a coder, as a vibe coder. Before you pick a tool, ask yourself these first, because then when you use the tool, you'll be able to go actually faster and not slower.
Question number one: what is the problem that we are solving, specifically? Almost no one can answer this, actually. Just try answering it. Is it speeding up boilerplate code? Is it onboarding juniors? Is it reducing bugs and repetitive tasks, or something else? If you have a vague goal like "we're going to boost the productivity of our engineering team," I'm sorry, you've been sitting in the C-suite too long. Like, I need some specifics here. I need you to say, specifically, this is the expectation that we have for what this tool will do for our engineers and why. Or, if I'm a builder individually, this is what it will do for me and why. Maybe it's as simple as, you know, I'm a builder, and using Devin or using Claude Code, I'm going to get time back. I can be in a meeting and the thing can be building anyway. Okay, that's fair. That's a specific goal. You can talk about optimizing for that goal and what the tools do and all of that, but if you don't have specific problems you're trying to solve, specific goals that you're setting, you are already off in the wrong direction. And I find that the bigger the company, the harder this is to do. Larger companies with larger teams often have real trouble saying what is the specific problem that they're driving at, and it takes a lot of work to peel the onion and get there. But you need to.

Question two: do we have strong engineering practices already that are worth amplifying? Look at the prereqs. Do you have consistent code patterns across your codebase? Do you have documentation that is up to date? Do you have an actual review culture and rigorous PR reviews? Do you have design docs that you're proud of and can stand behind?
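As a purely illustrative aside, prerequisite questions like these can at least be spot-checked mechanically. The sketch below is an assumption-laden example: the file and directory names are common conventions, not a standard, so substitute whatever your team actually uses.

```python
"""Spot-check a repo for common engineering-practice markers.

Illustrative only: the paths below are hypothetical conventions,
not a standard; adjust them to your team's actual layout.
"""
from pathlib import Path

# Hypothetical markers for the practices discussed above:
# design docs, review ownership, CI, and contributor conventions.
PREREQS = {
    "design docs": ["docs/design", "DESIGN.md"],
    "review ownership": ["CODEOWNERS", ".github/CODEOWNERS"],
    "CI pipeline": [".github/workflows", ".gitlab-ci.yml"],
    "contributor conventions": ["CONTRIBUTING.md", "STYLEGUIDE.md"],
}


def audit(repo_root: str) -> dict[str, bool]:
    """Return which prerequisite markers exist under repo_root."""
    root = Path(repo_root)
    return {
        name: any((root / p).exists() for p in paths)
        for name, paths in PREREQS.items()
    }


if __name__ == "__main__":
    for practice, present in audit(".").items():
        print(f"{'OK' if present else 'MISSING'}  {practice}")
```

A script like this only confirms that artifacts exist, not that they are any good; treat it as a conversation starter for the checklist, not a verdict.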
If you don't, it is likely that whatever agent you pick, whatever tool you pick, AI is going to make whatever you're doing worse. You need to take the time to get your house in order so that what you select has a foundation to build on. AI is surprisingly fragile in that regard. It's amazing at so many things, but it does need you to be disciplined. It needs you to have good engineering practices for it to ladder in as infrastructure in a supportive way. And so many houses don't. And again, this becomes something that is big-company challenging. If you're a small coder on your own, you can say, "Yeah, I keep all of my coding decisions in this markdown file, and I have Claude Code go and check it, and we're done." Or, you know, "I review all the pull requests myself. I know that I do a good job." The bigger your team is, the more complex this is and the more you have to actually think about it. Complexity scales nonlinearly, and that makes tool assessment much more complex once you get past even just a few developers into team or multi-team scale.

Number three: does the tool align with the workflow and the tech stack? This is complex, but you have to ask yourself: what is the team already using? Are they using Cursor? Are they using VS Code? Whatever it is, what is the code host? Are we on GitHub? Are we in terminals in some cases? And you have to think about what real workflow compatibility looks like. And I'm going to give you an extra challenge here. You need to think about workflow compatibility outside the engineering team, which circles back to my second question around engineering practices.
Assume you are living in a world, especially if you are sub-enterprise level, where people who are not traditional engineers will have code-related ideas and potentially code-related prototypes they want to push into the code stream in some fashion. Maybe not to production. Maybe an engineer has to review it. But there are companies above single-founder level, with teams, where non-coders are submitting pull requests thanks to their use of a coding agent. Do you have strong enough engineering practices to sustain in that world? Do you have tools that enable people who would not normally have production commit permissions to still do some degree of coding work and pass it to an engineering architect? As far as I know, there is no true plug-and-play in that world. You have to look at your unique fingerprint and decide what tool stack is going to be compatible.

I think one of the things I want to call out here, notable to me as I was reviewing Codex and Claude Code, is that Codex seems to implicitly presume a center of gravity around a larger team. So much of Codex is around "can I automatically review the PRs that are getting submitted for my code?" Right? Codex is already there: it can go in, it can look in GitHub, it can review the PRs, it can write up reviews, etc. It can even go and fix and address issues. Whereas Claude Code is more predicated on the idea that you are working in the terminal and building end to end, and you may be fixing issues and working on things besides code. It's not that one is good and one is bad; it's that their focus in the ecosystem is different. And you have to think about where the leverage lies, because it's absolutely true that if you wanted Claude Code to review your PRs, you can do it.
People have done it all the time. Single builders similarly use Codex all the time. I know some that swear by it. And so it's not that one tool is perfect for any use case. It's that you have to think about what works for you. Not just from a model power perspective, or a congruence-to-prompt perspective, or a degree of comfort with the model, or even a token burn perspective. You have to think about it from an ecosystem perspective. How does it fit?

Number four: do you know how you're going to measure success? Do you know how you're going to track changes that happen in the codebase? What metrics matter to you? Do you have metrics that are sort of vanity metrics, where you're like, "Oh yeah, we're going to have so many commits and that's going to be the way we do it"? Or it's lines of code. We're going to brag to the CTO about the number of lines of code that are AI-written, and the CTO is going to write this up into a summary and the CEO is going to tweet it out, which, by the way, totally happens. Is that really a metric, or is that a vanity metric? Right? Just having lines of code is something any engineer will tell you is a terrible metric for actual productivity. So think about how you want to measure value.

One of the horror stories, and I don't say this to scare you, but I say this to warn you: it is certainly possible to think you made these decisions well, but to not really factor in the ongoing impact of what I will call LLM cruft over time. And so what I mean by that is: the LLM is pretty good. The LLM understands your codebase. You think your engineering infrastructure is up to the challenge, but you don't have ongoing rhythms that have the whole team checking and reviewing LLM coding so that everybody knows what's going on.
Everybody is conforming to best practices. The LLM isn't drifting on its own. And what you end up finding is that, over time, you spend more and more of the engineering manager's time, or the founder's time, reviewing what the LLM submitted, and they get less and less time for leadership, for strategic thinking, because at the end of the day the codebase is more and more difficult to understand: the LLM has made effectively unintentional architectural decisions that someone else has to disentangle. And so my advice for you is: more eyes are better than not. Right? If you are in a position where you have multiple eyes and you're building with multiple people, put those eyes on it, and have everybody's expectation be that AI code doesn't go to prod unless someone looks at it and can say, "Yes, this is architecturally correct. Yes, this actually works." That's not always the case. There are lots of people who say, "You know what, we don't do that. We believe it works." That's fine. And maybe in a few cases you are so buttoned up, and everything is so well documented and so perfect on your small team, that you can get away with that. But I'm not here for those perfect one-percenters. I'm here for everybody else who lives in the reality of partial documentation, everybody doing their best, everybody trying to meet their deadlines, everybody trying to code according to the new best practices and sometimes forgetting. Okay, fine. You should be in a place where you can actually institute engineering practices that sustain the benefits of LLMs by having regular reviews of the codebase and regular reviews of LLM performance. That's what I mean by "can you measure success?" Can you actually track changes over time?
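One way to make that tracking concrete is to put a number on how much AI-assisted work actually gets human eyes. The sketch below is a minimal illustration under stated assumptions: the `Change` record, the idea of flagging AI assistance via a PR label or commit trailer, and the 90% threshold are all invented for the example; real data would come from your code host's API.

```python
"""Sketch of one 'ongoing rhythm': what share of AI-assisted
changes received a human review before merging?

The record shape is an assumption; in practice you would derive
it from PR labels, commit trailers, or review metadata.
"""
from dataclasses import dataclass


@dataclass
class Change:
    ai_assisted: bool     # e.g. inferred from a PR label or commit trailer
    human_reviewed: bool  # at least one non-author approval


def review_coverage(changes: list[Change]) -> float:
    """Fraction of AI-assisted changes a human signed off on."""
    ai = [c for c in changes if c.ai_assisted]
    if not ai:
        return 1.0  # nothing AI-assisted counts as full coverage
    return sum(c.human_reviewed for c in ai) / len(ai)


# Toy week of merges: 3 AI-assisted changes, 2 of them reviewed.
week = [
    Change(ai_assisted=True, human_reviewed=True),
    Change(ai_assisted=True, human_reviewed=False),
    Change(ai_assisted=True, human_reviewed=True),
    Change(ai_assisted=False, human_reviewed=True),
]
coverage = review_coverage(week)
if coverage < 0.9:  # the threshold is a team choice, not a standard
    print(f"warning: only {coverage:.0%} of AI changes were reviewed")
```

Watching that one number week over week is a cheap early signal that review rhythms are slipping while the LLM keeps merging code.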
Number five: is security and data privacy thought through carefully here? Do you feel comfortable with the terms of service your vendor, your model maker, is offering? Have you checked for IP leaks, vulnerabilities in generated code, compliance issues, liability generated by that code if it has a mistake in it? You will need a much higher bar on both QA and production code to successfully have agents in play. So yes, they can write code much faster. There are security researchers who will tell me that's just a way of manufacturing vulnerabilities much faster, right? And yes, some of them will also catch vulnerabilities. That is actually one of the things OpenAI called out about Codex: it's good at catching vulnerabilities in code, and OpenAI themselves use Codex as part of their QA process before going to production. So I'm not here to tell you that Codex and Claude Code don't add value. These two companies are dogfooding their own products and finding ways to get value out of them. But I am here to point out that they're not silver bullets, and that if we want to have a deep dive on Codex, we've got to talk about some of these engineering infrastructure practices first.

Number six: do you actually have buy-in? Again, bigger companies, nonlinear problem spaces, this is going to be harder. If you have junior engineers and senior engineers and principals, and maybe some non-technical people like I talked about, how are you planning for education on prompting? How are you planning for reviewing your AI outputs? How are you planning for understanding what learning use looks like for juniors?
So juniors understand how code actually works and how system components go together, and they don't end up over-deferring to AI. How do you budget for the resources, the money, and also the time to actually learn this, and not just give in to the temptation of set-it-and-forget-it? Because these tools are temptingly easy to set and forget. You can just tell them to do things, and maybe the cost doesn't come due today, right? Maybe the bill comes due in six months. You have to be disciplined to do it today.

Number seven: what is the total cost beyond pricing? You have to look at setup, maintenance, context engineering costs, and fixes for bad outputs. If you have a big team, it is worth doing a pilot for this, because you can actually see, over two or three months, for an individual two-pizza team, what the value looked like. And that is exactly the pattern we see in a lot of enterprises: they will roll this out for a small group, test it, gather learnings, and then figure out how the larger pathway will go. Again, if you're a small team, it's super easy to turn around. It's a two-way door. You try Codex today, you say, "Oh, it feels better," you dump Claude Code. You try Claude Code tomorrow, when they release something new, you say, "Oh, it feels better," you dump Codex. It is not as easy when you're on a bigger team. It doesn't work that way.

Okay, we've talked about some of the foundational questions to ask when you are getting set up. I'm also aware, looking at the pull requests, looking at the Reddits, that so many of you already use an AI coding assistant.
And so the second part of this is really going to be asking: what are the questions you need to address as a current user of a coding assistant to figure out whether AI is actually helping you or hurting you, and how you can troubleshoot that and make the most of your current AI coding assistant implementation? And just like the first seven, we're going to go through seven. You're going to start to see a mapping there. I'm deliberately creating a doubling effect here so that you can see how this maps from pre-implementation into implementation.

Number one: is the AI amplifying inconsistencies in the codebase? This maps right back to the idea of having a consistent infra layer, doesn't it? You need to check and see if there are persistent anti-patterns in the suggestions. You need to audit and ask: if you have outputs that go wrong, do they skew? Do they go wrong in a particular direction? Do you need to fine-tune your documentation standards in a particular way so that the anti-patterns disappear? That's on you. You need to check that.

Number two: are you reviewing and testing AI output? I talked about that as something you need to be ready to do. But are you actually doing it? Are you skipping the explanations? Are you skipping the edge-case tests it's recommending? Are you just saying "explain yourself" and then saying, "Well, that's documentation and that's good enough"? Do you feel like you can own the AI output from your AI coding assistant? That's really the standard. If you can stand behind it and say "this code is mine," okay, fair enough, but not everybody does that.

Number three: is prompting or context an issue for you as you start to drive coding assistants forward?
If you have vague prompts and you're getting vague code, does your team have clear specs? Does your team have design docs? Does your team have examples? You see how this goes right back to the infra layer? You can actually diagnose this by testing small incremental changes against small incremental changes in your codebase, right? You can change the documentation, you can change the prompt, and you can see if the output gets better, and you can start to figure out which test cases you need to fix in your infrastructure layer. So this is actually something where you can pinpoint a fix if you're deliberate.

Number four: are errors due to tool limitations that you have? Is your tool infrastructure actually thought through? Do you have model weaknesses? One of the things Codex emphasized is that they understand the nonlinearity of coding problems: some coding problems need very token-efficient, surgical changes, and some need very agentic, long-form changes, and they produced some metrics to say they're better at it. You know, your mileage may vary. You'll have to see if you agree with them. But the point is: does the tool match the setup? Do you have a setup that allows you to switch models if you need to? Do you have a setup that accounts for the size of codebase you actually have, or the particular niche domain or niche language you actually have? An example of that is Claude Code and COBOL. Try it out sometime if you're a COBOL person. See what you think.

Number five: how is team usage? Are your teams getting better at engineering? Are your non-engineers learning engineering practices?
How often do you catch each other's changes before production, so that you actually didn't break something, versus how often do you catch things after production and wonder what happened? How often do you see common newbie mistakes, and do they keep getting repeated? How often are people copy-pasting without understanding? How often are people not really giving thoughtful feedback to the tool? There's a team culture thing here that it's really up to leadership to reinforce.

Number six: are you measuring what matters, and can you track it and show that you're actually delivering value? I am here to suggest that there are two key pieces to this. One is tying what engineering is doing to real business use cases that matter, business projects that matter, revenue, cost efficiencies. I know engineers get nervous about that, but you have to have stakes in the game. The other is making sure that your leading-edge indicators are solid. Do you understand what LLM latency looks like? If you have something in production, do you understand how you are testing for edge cases and how those edge cases actually manifest in production from an LLM? Do you understand how to show that your documentation is clean enough, and how to run evals on your code performance? So you can say, "Yeah, the prompt-to-code quality is very high. We have human evaluators that say that, and we also have some automated evals that show the number of PR comments is going down, the quality of PRs is going up, the number of bugs we've seen in production is going down." Like, you have some concrete things you can point to.
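As a toy illustration of those leading indicators, you could compute simple per-sprint trends from whatever your PR and incident tooling exports. Everything below is invented for the example: the metric names, the numbers, and the idea of using average change per sprint as the trend measure.

```python
"""Check whether leading indicators are trending the right way.

The data is hypothetical; feed in whatever your own PR and
incident tooling actually exports.
"""


def trend(series: list[float]) -> float:
    """Average change per period (negative = falling)."""
    if len(series) < 2:
        return 0.0
    return (series[-1] - series[0]) / (len(series) - 1)


# Hypothetical per-sprint exports. For both metrics, falling is good.
pr_comments_per_pr = [6.0, 5.1, 4.4, 3.9]
production_bugs = [12.0, 9.0, 10.0, 7.0]

report = {
    "pr_comments_per_pr": trend(pr_comments_per_pr),
    "production_bugs": trend(production_bugs),
}
for metric, slope in report.items():
    direction = "improving" if slope < 0 else "regressing"
    print(f"{metric}: {slope:+.2f}/sprint ({direction})")
```

Even a crude slope like this beats a vanity count of AI-written lines: it points at outcomes (review friction, production bugs) rather than output volume.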
And number seven: if failures persist, is the issue that your preparation was inadequate, which means going back to the beginning of this video rather than jumping right in, or do you have something fundamental in your stack? Maybe you need to think about how you implement the codebase context. Maybe you need an agentic search approach where you're searching through the context. Maybe RAG is not the right approach for you. Whatever it is, run a disciplined audit before you blame AI. Undisciplined teams blame AI because it's cheaper and easier. Disciplined teams root-cause specific problems and actually get value.

These are the things that I have to say over and over again when people want to rush in and say, "The Codex news dropped, Nate. The Codex news dropped. What do I do with Codex?" Well, this is what I say: have you had these infrastructure conversations first? Please have these infrastructure conversations. They matter. They help you build what matters. These are the conversations you have to put in place so that when AI amplifies all of the practices you actually do on a daily basis, not the ones you dream about, well, now that you have these practices in place, maybe it will amplify good stuff, not bad stuff. Maybe it will actually help you go faster, not slower. Maybe it will help you deliver more quality code, not break things. You get the idea. The infrastructure matters. Take these questions seriously. Take them as the initial gate you have to get through before it makes sense to have complicated conversations about which tool to choose. We will do a deep dive on Codex another time.