Infrastructure First, Tools Later
Key Points
- Coding assistants act like a “rocket engine” for development, so they magnify both the strengths and weaknesses of a team’s existing engineering infrastructure.
- Adding a new tool (e.g., Codex) to a weak or poorly defined workflow will likely produce a net negative impact despite the tool’s hype.
- The critical decisions lie in the engineering‑infrastructure layer; only after solid foundations are in place should you evaluate specific coding‑assistant tools.
- Technical leaders must first articulate a precise problem or goal (e.g., speeding boilerplate, onboarding juniors, reducing bugs) rather than a vague “boost productivity” ambition.
- Larger organizations often struggle to define such concrete objectives, making indiscriminate tool adoption especially risky for them.
Sections
- [00:00:00](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=0s) Tool Hype vs Infrastructure Basics - The speaker cautions that while coding assistants can speed development, they become detrimental when underlying engineering practices are weak, urging teams to prioritize solid infrastructure decisions before chasing popular tools.
- [00:03:20](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=200s) Ensuring Foundations Before AI Adoption - The speaker stresses that without solid review processes, design documentation, and tooling aligned to the team’s workflow, AI assistants will degrade rather than improve software development, especially as teams grow.
- [00:07:39](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=459s) Beware LLM Code Drift - Relying on AI‑generated code without continual, multi‑person review leads to hidden architectural decay, wasted managerial time, and ineffective productivity metrics.
- [00:11:22](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=682s) Budgeting Junior AI Learning Rollout - The speaker outlines how to plan, fund, and pilot AI‑assisted coding tools for junior developers—emphasizing hands‑on understanding, hidden costs beyond licensing, and iterative testing with small “two‑pizza” teams before scaling.
- [00:14:58](https://www.youtube.com/watch?v=cVZCfpkHNBg&t=898s) Evaluating Tool Limits & Team Practices - The speaker discusses how tool and model constraints affect coding work, the need for adaptable setups, and how team habits—such as code review timing, repeat mistakes, and feedback quality—impact overall engineering effectiveness.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=cVZCfpkHNBg](https://www.youtube.com/watch?v=cVZCfpkHNBg) **Duration:** 00:19:00
I have a very simple thesis which may
not be popular but is nonetheless true.
Coding assistants accelerate your
development practices whether they are
good or bad. In other words, you are
tying a giant rocket engine to whatever
engineering infrastructure practices you
have and you're saying go just go faster
go do more. You know what? If you have
any kind of weakness in your engineering
infrastructure layer, your best
practices layer, that choice to add
Claude Code, to add Codex, which just
updated this week, is going to end
up being net negative. Yeah, I said it.
It's going to end up being net negative.
I don't want that for you because there
are teams that are getting real gains.
There was a viral post recently on
Reddit called "this is how we vibe coded
at a FAANG." You know what it was about? It
wasn't about a vibe coding tool set that
would magically fix everything. It was
about the engineering infrastructure
decisions that matter. And I want to
focus on that today, because you know we
could take this time and we could dive
into why Codex is the best thing since
sliced bread, because it's at the top of
the news this week and all anyone in
development can talk about is, do we use
Codex? Do we use Claude
Code? You are asking the wrong question.
In most cases, the right question is at
the engineering infrastructure layer.
And you only get to the tool choice if
you've asked the right engineering
infrastructure questions. So I want to
give you in this conversation the
specific questions you should be asking
yourself as a technical leader, as a
technical team member, as a builder, as
a coder, as a vibe coder. Before you
pick a tool, ask yourself these first
because then when you use the tool,
you'll be able to go actually faster and
not slower. Question number one, what is
the problem that we are solving
specifically? Almost no one can answer
this actually. Just try answering it. Is
it speeding up boilerplate code? Is it
onboarding juniors? Is it reducing bugs
and repetitive tasks or something else?
If you have a vague goal like we're
going to boost the productivity of our
engineering team, I'm sorry, you've been
sitting in the C-suite too long. Like I
need some specifics here. I need you to
say specifically this is the expectation
that we have for what this tool will do
for our engineers and why. Or if I'm a
builder individually, this is what it
will do for me and why. Maybe it's as
simple as, you know, I'm a builder and
using Devin or using Claude Code, I'm
going to get time back. I can be in a
meeting and the thing can be building
anyway. Okay, that's fair. That's a
specific goal. You can talk about
optimizing for that goal and what the
tools and all of that, but if you don't
have specific problems you're trying to
solve, specific goals that you're
setting, you are already off in the
wrong direction. And I find that the
bigger the company, the harder this is
to do. Larger companies with larger
teams often have real trouble saying
what is the specific problem that
they're driving at, and it takes a lot
of work to peel the onion and get there.
That brings us to question two. Do we have
strong engineering practices already
that are worth amplifying? Look at the
prereqs. Do you have consistent code
patterns across your codebase? Do you
have documentation that is up to date?
Do you have actual review culture and
rigorous PR reviews? Do you have design
docs that you're proud of and you can
stand behind? If you don't, it is likely
that whatever agent you pick, whatever
tool you pick, AI is going to make
whatever you're doing worse. You need to
take the time to try to get your house
in order so that what you select has a
foundation to build on. AI is
surprisingly fragile in that regard.
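[Editor's aside: the "house in order" prerequisites above can even be spot-checked mechanically. A minimal sketch, assuming hypothetical marker files like a CONVENTIONS.md and a docs/design directory — rename these for your repo's actual layout:]

```python
from pathlib import Path

# Hypothetical hygiene markers; adjust names to your repo's conventions.
CHECKS = {
    "conventions doc": "CONVENTIONS.md",
    "design docs dir": "docs/design",
    "CI pipeline":     ".github/workflows",
    "code owners":     "CODEOWNERS",
}

def readiness_report(repo_root: str) -> dict[str, bool]:
    """Return which engineering-hygiene markers exist in the repo."""
    root = Path(repo_root)
    return {name: (root / rel).exists() for name, rel in CHECKS.items()}

if __name__ == "__main__":
    for name, ok in readiness_report(".").items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

A script like this is no substitute for actual review culture, but a row of MISSING lines is a cheap early warning before handing an agent the keys.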
It's amazing at so many things, but it
does need you to be disciplined. It needs you
to have good engineering practices for
it to ladder in as infrastructure in a
supportive way. And so many houses
don't. And again, this becomes something
that is big company challenging. If
you're a small coder on your own, you
can say, "Yeah, I keep all of my coding
decisions in this markdown file and
I have Claude Code go and check it
and we're done." Or, you know, I review
all the pull requests myself. I know
that I do a good job. The bigger your
team is, the more complex this is and
the more you have to actually think
about this. Complexity scales
nonlinearly and that makes tool
assessment much more complex once you
get past even just a few developers into
the team, or to multi-team scale. Number
three, does the tool align with the
workflow and the tech stack? This is
complex but you have to ask yourself
what is the team already using? Are they
using cursor? Are they using VS Code?
Whatever it is, what is the code host?
Are we on GitHub? Are we in terminals in
some cases? And you have to think about
what real workflow compatibility looks
like. And I'm going to give you an extra
challenge here. You need to think about
workflow compatibility outside the
engineering team, which circles back to
my second question around engineering
practices. Assume you are living in a
world especially if you are
sub-enterprise level where people who are
not traditional engineers will have code
related ideas and potentially code
related prototypes they want to push
into the code stream in some fashion.
Maybe not to production. Maybe an
engineer has to review it. But there are
companies who are above single founder
level with teams where non-coders are
submitting pull requests thanks to their
use of a coding agent. Do you have
strong enough engineering practices to
sustain in that world? Do you have tools
that enable people who would not
normally have production commit
permissions to still be able to do some
degree of coding work and pass it to an
engineering architect? As far as I know,
there is no true plug-and-play in that
world. You have to look at your unique
fingerprint and you have to decide what
is the tool stack that is going to be
compatible. I think one of the things I
want to call out here that was notable
to me as I was reviewing Codex and
Claude Code is that Codex seems to
implicitly presume a center of gravity
around a larger team. So much of Codex
is about: can I automatically review the
PRs that are getting submitted for my
code? Codex is already there. It can go
in, it can look in GitHub, it can review
the PRs, it can write up reviews, and so
on. It can even go and fix and address
issues. Claude Code, by contrast, is more
predicated around the idea that you are
working in the terminal, you are building
end to end, and you may be fixing issues
and working on things besides code. It's
not that one is good and one is bad;
it's that their focus is different in
the ecosystem. And you have to think
about where the leverage lies, because
it's absolutely true that if you wanted
Claude Code to review your PRs, you can
do it. People have done it all the time.
Single builders similarly use Codex all
the time. I know some that swear by it. And
so it's not that one tool is perfect for
any use case. It's that you have to
think about what works for you. Not just
from a model power perspective or from a
congruence to prompt perspective or from
a degree of comfort with the model or
even from a token burn perspective. You
have to think about it from an ecosystem
perspective. How does it fit? Number
four, do you know how you're going to
measure success? Do you know how you're
going to track changes that happen in
the codebase? What metrics matter to
you? Do you have metrics that are sort
of vanity metrics where you're like, "Oh
yeah, we're going to have so many
commits and that's going to be the way
we do it." Or it's lines of code. We're
going to brag to the CTO about the
number of lines of code that are AI
written and the CTO is going to write
this up into a summary and the CEO is
going to tweet it out, which by the way,
that totally happens. Is that really a
metric or is that a vanity metric?
Right? Just having lines of code is
something that any engineer will tell
you is a terrible metric for actual
productivity. So think about how you
want to measure value. One of the horror
stories, and I don't say this to scare
you, but I say this to warn you, it is
certainly possible to think you made
these decisions well, but to not really
factor in the ongoing impact of what I
will call LLM cruft over time. And so
what I mean by that is the LLM is pretty
good. The LLM understands your codebase.
You think your engineering
infrastructure is up to the challenge,
but you don't have ongoing rhythms that
have the whole team checking and
reviewing LLM coding so that everybody
knows what's going on. Everybody is
conforming to best practices. The LLM
isn't drifting on its own. And what you
end up finding is that over time you
spend more and more and more and more of
the engineering manager's time or the
founder's time reviewing what the LLM
submitted. And they get less and less
time for leadership for strategic
thinking because at the end of the day
the codebase is more and more and more
difficult to understand because the LLM
has made effectively unintentional
architectural decisions that someone
else has to disentangle. And so my
advice for you is more eyes are better
than not. Right? If you are in a
position where you have multiple eyes
and you're building with multiple
people, put those eyes and have
everybody's expectation be that AI code
doesn't go to prod unless someone looks
at it and can say, yes, this is
architecturally correct. Yes, this
actually works. That's not always the
case. There are lots of people who say,
you know what, we don't do that. We
believe it works. That's fine. And maybe
in a few cases, you are so buttoned up
and you are so clean and everything is
so well documented and it's so perfect
on your small team, you can get away
with that. But I'm not here for those
perfect one-percenters. I'm here for everybody
else who lives in the reality of partial
documentation and everybody doing their
best and everybody trying to meet their
deadlines and everybody trying to code
according to the new best practices and
sometimes forgetting. Okay, fine. You
should be in a place where you can
actually institute engineering practices
that sustain the benefits of LLMs by
having regular reviews of the codebase
and regular reviews of LLM performance.
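[Editor's aside: tracking that over time doesn't require heavy tooling. A minimal sketch of the kind of trend check the speaker describes, assuming you can export simple per-PR records — the field names here are invented for illustration:]

```python
from statistics import mean

# Invented record shape: one dict per merged PR, exported from your tracker.
prs = [
    {"week": 1, "review_comments": 9, "post_merge_bugs": 2},
    {"week": 1, "review_comments": 7, "post_merge_bugs": 1},
    {"week": 4, "review_comments": 4, "post_merge_bugs": 1},
    {"week": 4, "review_comments": 3, "post_merge_bugs": 0},
]

def weekly_average(records, field):
    """Average a field per week, keyed in week order."""
    weeks = sorted({r["week"] for r in records})
    return {w: mean(r[field] for r in records if r["week"] == w)
            for w in weeks}

def improving(trend):
    """True if the latest weekly average is below the earliest."""
    values = list(trend.values())
    return values[-1] < values[0]

comments = weekly_average(prs, "review_comments")
print(improving(comments))  # True: review comments per PR are falling
```

These are outcome metrics (comments per PR, bugs escaping to production), not vanity counts like raw lines of AI-written code — which is exactly the distinction being drawn above.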
That's what I mean by can you measure
success? Can you actually track changes
over time? Number five is security and
data privacy thought through carefully
here. Do you feel comfortable with the
terms of service your vendor is
offering, the model maker is offering?
Are you okay with them? Have you checked for IP
leaks, vulnerabilities in generated
code, compliance issues, and the liability
created by that code if it has a
mistake in it? You will need a much
higher bar on both QA and production
code to successfully have agents in
play. So yes, they can write code much
faster. There are security researchers
who will tell me that's just a way of
manufacturing vulnerabilities much
faster, right? And yes, some of them
will also catch vulnerabilities. And
that is actually one of the things that
OpenAI called out about Codex is that
it's good at catching vulnerabilities in
code and that OpenAI themselves use
Codex as part of their QA process
before going to production. So I'm not
here to tell you that Codex and Claude
Code don't add value. These two
companies are dog fooding their own
product and they are finding ways to get
value out of it. But I am here to point
out that they're not silver bullets and
that if we want to have a deep dive on
Codex, we've got to talk about some of
these engineering infrastructure
practices first. Number six, do you
actually have buyin? Again, bigger
companies, nonlinear problem spaces,
this is going to be harder. If you have
junior engineers and senior engineers
and principals, and maybe you have some
non-technical people like I talked
about, how are you planning for
education on prompting? How are you
planning for reviewing your AI outputs?
How are you planning for understanding
what learning use looks like for
juniors? So juniors understand how code
actually works and how system components
go together and they don't end up
over-deferring to AI. How do you budget
for the resources, the money, also the
time to actually learn this and not just
get into the temptation of set it and
forget it because these tools are
temptingly easy to set and forget. You
can just tell them to do things and
maybe the cost doesn't come due today,
right? Maybe the bill comes due in 6
months. You have to be disciplined to do
it today. Number seven, what is the
total cost beyond pricing? So you have
to look at setup, you have to look at
maintenance, you have to look at context
engineering costs, you have to look at
fixes for bad outputs. It is worth it if
you have a big team to do a pilot for
this because you can actually see over
two or three months for this individual
two pizza team, for this small team,
what did the value look like? And that
is exactly the pattern we see in a lot
of enterprises is that they will roll
this out for a small group, test it,
gather learnings, and then figure out
how that larger pathway will go. Again,
if you're a small team, it's super easy
to turn around. It's a two-way door. You
try codeex today, you say, "Oh, it feels
better." You dump cloud code. You try
cloud code tomorrow, when they release
something new, you say, "Oh, it feels
better." You dump codeex. It is not as
easy when you're on a bigger team. It
doesn't work that way. Okay, we've
talked about some of the foundational
questions to ask when you are getting
set up. I also am aware looking at the
pull requests, looking at the Reddits,
so many of you already use an AI coding
assistant. And so the second part of
this is really going to be asking what
are the questions you need to address as
a current user of a coding assistant to
figure out if AI is actually helping you
or hurting you and how you can
troubleshoot that and make the most of
your current AI coding assistant
implementation. And just like the first
seven, we're going to go through seven.
And you're going to start to see a
mapping there. I'm deliberately
creating a doubling effect here so that
you can see how this maps from
pre-implementation into implementation.
Number one, is the AI amplifying
inconsistencies in the codebase? This
maps right back to the idea of having a
consistent infra layer, doesn't it? You
need to check and see if there are
anti-patterns in the suggestions that are
persistent. You need to audit and say if
you have outputs that go wrong, do they
skew? Do they go wrong in a particular
direction? Do you need to fine-tune your
document standards in a particular way
so that the anti patterns disappear?
That's on you. You need to check that.
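[Editor's aside: one lightweight way to run that audit is to scan assistant-authored diffs for the anti-patterns your reviewers keep flagging and tally which direction the misses cluster in. A minimal sketch — the pattern list here is illustrative, not a standard:]

```python
import re
from collections import Counter

# Illustrative anti-patterns; replace with the ones your reviewers flag.
ANTI_PATTERNS = {
    "bare except":     re.compile(r"except\s*:"),
    "print debugging": re.compile(r"^\s*print\(", re.MULTILINE),
    "mutable default": re.compile(r"def \w+\([^)]*=\s*(\[\]|\{\})"),
}

def audit(diffs: list[str]) -> Counter:
    """Count anti-pattern hits across a batch of AI-authored diffs."""
    hits = Counter()
    for diff in diffs:
        for name, pattern in ANTI_PATTERNS.items():
            hits[name] += len(pattern.findall(diff))
    return hits

sample = ["def load(cache={}):\n    try:\n        pass\n"
          "    except:\n        print('oops')"]
print(audit(sample).most_common())
```

If one pattern dominates the tally week after week, that's the skew the speaker is talking about — and the fix is usually in your documented standards, not the tool.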
Number two, are you reviewing and
testing AI output? I talked about that
as something you need to be ready to do.
But are you actually doing it? Are
you skipping the explanations? Are
you skipping the edge case tests that
it's recommending? Are you just saying
explain yourself and you're saying,
"Well, that's documentation and that's
good enough." Do you feel like you can
own the AI output from your AI coding
assistant? That's really the standard.
If you can stand behind it and say this
code is mine, okay, fair enough, but not
everybody does that. Number three, is
prompting or context an issue for you as
you start to drive coding assistants
forward? If you have vague prompts and
you're getting vague code, does your
team have clear specs? Does your team
have design docs? Does your team have
examples? You see how this goes right
back to the info layer? You can actually
diagnose this by testing small
incremental changes against small
incremental changes in your codebase,
right? You can change the documentation,
you can change the prompt, and you can
see if the codebase gets better, and you
can start to figure out where your test
cases are that you need to fix for your
infrastructure layer. So, this is
actually something where you can
pinpoint a fix if you're deliberate.
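[Editor's aside: that diagnosis loop can be as simple as holding a fixed task set and varying exactly one input at a time. A minimal sketch, with a hypothetical run_assistant stub standing in for whatever tool and test runner you actually use:]

```python
# Hypothetical stand-in for your real assistant call + test runner.
def run_assistant(prompt: str, context_doc: str, task: str) -> bool:
    """Return True if the generated code passed the task's tests.
    Replace this stub with a real tool invocation."""
    raise NotImplementedError

def pass_rate(prompt, context_doc, tasks, run=run_assistant):
    """Fraction of the fixed tasks this setup gets right."""
    results = [run(prompt, context_doc, t) for t in tasks]
    return sum(results) / len(results)

def compare(baseline, variant, tasks, run=run_assistant):
    """baseline/variant are (prompt, context_doc) pairs differing
    in exactly one element, so the delta is attributable."""
    a = pass_rate(*baseline, tasks, run)
    b = pass_rate(*variant, tasks, run)
    return {"baseline": a, "variant": b, "delta": b - a}
```

The discipline is in the "change one thing" rule: if you swap the prompt and the docs at the same time, a positive delta tells you nothing about which layer to fix.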
Number four, are errors due to tool
limitations that you have? Is your tool
infrastructure actually thought through?
Do you have model weaknesses? One of the
things that Codex emphasized is that
they understand the nonlinearity of
coding problems that some coding
problems need very token efficient
surgical changes and some coding
problems need very agentic, long-form
changes and they produce some metrics to
say they're better at it. You know, your
mileage may vary. You'll have to see if
you agree with them. But the point is,
does the tool match the setup? Do you
have a setup that allows you to switch
models if you need to? Do you have a
setup that allows you to understand the
size of codebase you actually have or
particular niche domain or particular
niche language that you actually have?
An example of that is Claude Code and
COBOL. Try it out sometime if you're a
COBOL person. See what you think.
Number five, how is team usage? Are your
teams getting better at engineering? Are
your non-engineers learning
engineering practices? How often do you
catch each other's changes before
production, so that you actually didn't
break something versus how often do you
catch things after production and you
wonder what happened? How often do you
have common newbie mistakes and do they
keep getting repeated? How often are
people copy pasting without
understanding? How often are people not
really giving thoughtful feedback to the
tool? There's a team culture thing here
that it's really up to leadership to
reinforce. Number six, are you measuring
what matters and can you track it and
show that you're actually delivering
value? I am here to suggest that there
are two key pieces to this. One is tying
what engineering is doing to real
business use cases that matter, business
projects that matter, revenue, cost
efficiencies. I know engineers get
nervous about that, but you have to have
stakes in the game. The other is making
sure that your leading edge indicators
are solid. Do you understand what LLM
latency looks like? If you have
something that's in production, do you
understand how you are testing for
edge cases and how those edge cases
actually manifest in production from an
LLM? Do you understand how to show that
your documentation is clean enough and
to run evals on your code performance?
So you can say, yeah, the
prompt-to-code quality is very high. We
have human evaluators that say that and
we also have some automated evals that
will actually say, you know, the number
of PR comments is going down, the
quality of PRs is going up, the number
of bugs that we've seen in production is
going down, like you have some concrete
things you can point to. And number
seven, if failures persist, is the issue
that your preparation was inadequate,
which means going back to the beginning
of this video instead of jumping right in,
or do you have something fundamental in
your stack? Maybe you need to think
about how you implement the codebase
context. Maybe you have to have an
agentic search approach where you're
searching through the context. Maybe
RAG is not the right approach for you.
Whatever it is, think about a
disciplined audit before you
blame AI. Undisciplined teams blame AI
because it's cheaper and easier.
Disciplined teams root cause specific
problems and actually get value. These
are the things that I have to say over
and over again when people want to rush
in and say, "The Codex news dropped,
Nate. The Codex news dropped. What do I
do with Codex?" Well, this is what I
say. Have you had these infrastructure
conversations first? Please have these
infrastructure conversations. They
matter. They help you build what
matters. These are the conversations
that you have to put in place, because
AI amplifies the practices
you actually do on a daily basis, not
the ones you dream about. Now that
you have these practices in place,
maybe it will amplify good stuff, not
bad stuff. Maybe it will actually help
you go faster, not slower. Maybe it will
help you deliver more quality code, not
break things. You get the idea. The
infrastructure matters. Take these
questions seriously. Take them as the
initial gate that you have to get
through before it makes sense to have
complicated conversations about which
tool to choose. We will do a deep dive
on Codex another time.