# Elevating Agentic Systems with Claude

**Source:** [https://www.youtube.com/watch?v=aqW68Is_Kj4](https://www.youtube.com/watch?v=aqW68Is_Kj4)
**Duration:** 00:13:22

## Summary

- Caitlyn, who leads Anthropic's Claude Developer Platform team, opened the session by thanking swyx and noting the audience's experience building agents against LLM APIs.
- The platform's evolution centers on three pillars for maximizing Claude's performance: harnessing its capabilities, managing its context window, and giving Claude a "computer" (tool-use and code-execution infrastructure).
- New API controls let developers decide how long Claude should think and how many tokens it may spend on reasoning, so systems can balance speed against depth for tasks like debugging in Claude Code.
- Claude's improved reliability in calling external tools is exposed through the API, and the Claude Code product showcases these features in real-world agentic coding scenarios.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aqW68Is_Kj4&t=0s) **Anthropic Evolves the Claude Developer Platform** - Caitlyn from Anthropic explains how the Claude Developer Platform helps developers build high-performing LLM-driven agents by exposing Claude's capabilities and efficiently managing its context window, using Claude Code as the running example.
- [00:03:12](https://www.youtube.com/watch?v=aqW68Is_Kj4&t=192s) **Claude Tool Use & Context Management** - The speaker explains how Claude reliably calls built-in and custom tools (crucial for Claude Code) and discusses the challenges of managing Claude's context window, highlighting the Model Context Protocol (MCP) as a solution.
- [00:06:46](https://www.youtube.com/watch?v=aqW68Is_Kj4&t=406s) **Boosting Claude with Memory and Tools** - The speaker explains that combining a memory tool with context editing yielded a 39% performance increase on internal evals, and that larger context windows plus the ability for Claude to run its own code will further improve effectiveness.
- [00:10:03](https://www.youtube.com/watch?v=aqW68Is_Kj4&t=603s) **Claude Agent Skills Enable Domain Expertise** - The speaker explains that Claude's new agent skills (bundled scripts, instructions, and resources) let the model automatically invoke domain-specific expertise, such as web-design patterns, alongside tools like code execution and MCP, improving its ability to generate context-aware outputs.

## Full Transcript
[music]
Good morning. Um, so first let's give a
huge thank you to swyx and the whole AI
engineer organizing team for bringing us
together. [applause]
I'm Caitlyn and I lead the Claude
Developer Platform team at Anthropic.
Um, so let's start with a show of hands.
Who here has integrated against an LLM
API to build agents?
Okay, I'm talking to the right people.
Love it. Um, so today I want to share
how we're evolving our platform to help
you build really powerful agentic
systems using claude.
So, we love working with developers who
do what we call raising the ceiling of
intelligence. They're always trying to
be on the frontier. They're always
trying to get the best out of our models
and build the most high performing
systems. Um, and so I want to walk you
through how we're building a platform
that helps you get the best out of
Claude. Um, and I'm going to do that
using a product that you hopefully have
all heard of before. Um, it's an Agentic
coding product. We love it a lot and
it's called Claude Code.
So when we think about maximizing
performance um from our models, we think
about building a platform that helps you
do three things. Um so first, the
platform helps you harness Claude's
capabilities. We're training Claude to
get good at a lot of stuff and we need
to give you the tools in our API to use
the things that Claude is actually
getting good at. Next, we help you
manage Claude's context window. Keeping
the right context in the window at any
given time is really really critical to
getting the best outcomes from Claude.
And third, we're really excited about
this lately. We think you should just
give Claude a computer and let it do its
thing. So I'll talk about how we're
evolving the platform to give you the
infrastructure and everything else you
need to actually let Claude do that.
So starting with harnessing Claude's
capabilities. Um, so we're getting
Claude really good at a bunch of stuff
and here are the ways that we expose
that to you um in our API as ideally
customizable features. So here's a first
example um relatively basic. Claude got
good at thinking um and Claude's
performance on various tasks um scales
with the amount of time you give it to
reason through those problems. Um, and
so, uh, we expose this to you as an API
feature that you can decide, do you want
Claude to think longer for something
more complex or do you want Claude to
just give you a quick answer? Um, we
also expose this with a budget. Um, so
you can tell Claude how many tokens to
essentially spend on thinking. Um, and
so for Claude Code, um, pretty good
example. Obviously, you're often
debugging pretty complex systems with
Claude Code or sometimes you just want a
quick, um, answer to the thing you're
trying to do. And so, um, Claude Code
takes advantage of this feature in our
API to decide whether or not to have
Claude think longer.
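As a concrete sketch of the thinking controls just described, a request can enable extended thinking with a token budget. The field names below follow Anthropic's public Messages API docs at the time of writing, but treat the exact shape as an assumption and check current documentation:

```python
# Sketch: build a Messages API payload, enabling a thinking budget only
# for complex tasks. The "thinking" field shape is an assumption based on
# Anthropic's published extended-thinking docs.

def build_request(prompt: str, think_hard: bool) -> dict:
    """Return a request payload; complex tasks get a reasoning budget,
    quick tasks skip extended thinking for a faster answer."""
    request = {
        "model": "claude-sonnet-4-5",
        "max_tokens": 16000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if think_hard:
        # Let Claude spend up to 10k tokens reasoning before answering.
        request["thinking"] = {"type": "enabled", "budget_tokens": 10000}
    return request

debug_req = build_request("Why does this test deadlock under load?", think_hard=True)
quick_req = build_request("Rename this variable to snake_case.", think_hard=False)
```

An agent like Claude Code can flip `think_hard` per task: on when debugging a complex system, off when you just want a quick answer.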
Another basic example is tool use.
Claude has gotten really good at
reliably calling tools. Um, so we expose
this in our API with both our own
built-in tools like our web search tool,
um, as well as the ability to create
your own custom tools. You just define a
name, a description, and an input
schema. Um, and Claude is pretty good at
reliably knowing when to actually go um,
and call those tools and pass the right
arguments. So, this is relevant for
Claude Code. Claude Code has many, many,
many tools and it's calling them all the
time to do things like read files,
search for files, write to files, um,
and do stuff like rerun tests and
otherwise.
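A custom tool, as just described, is only a name, a description, and an input schema. A minimal sketch, where the `read_file` tool is a hypothetical example rather than a built-in:

```python
# Hypothetical custom tool: a name, a description, and a JSON Schema for
# its input is all the API needs for Claude to decide when to call it
# and what arguments to pass.
read_file_tool = {
    "name": "read_file",
    "description": "Read a UTF-8 text file from the workspace and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Workspace-relative file path."},
        },
        "required": ["path"],
    },
}

request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [read_file_tool],
    "messages": [{"role": "user", "content": "What does setup.py do?"}],
}
```

When Claude decides to use the tool, your code executes it and returns the result as a tool-result message, which is exactly the loop a coding agent runs hundreds of times per session.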
So, the next way we're evolving the
platform to help you maximize
intelligence from Claude um, is helping
you manage Claude's context window.
Getting the right context at the right
time in the window is one of the most
important things that you can do to
maximize performance.
But context management is really complex
to get right. Um especially for a coding
agent like Claude Code. You've got your
technical designs, you've got your
entire codebase. Um you've got
instructions, you've got tool calls. All
these things might be in the window at
any given time. And so how do you make
sure the right set of those things are
in the window? Um, so getting that
context right and keeping it optimized
over time is something that we've
thought a lot about.
So let's start with MCP, the Model
Context Protocol. We introduced this a year ago
and it's been really cool to see the
community swarm around adopting um MCP
as a standardized way for agents to
interact with external systems. Um, and
so for Claude Code, you might imagine
GitHub or Sentry. There are plenty of
places kind of outside of the agent's
context where there might be additional
information or tools or otherwise that
you want your agent, or the Claude Code
agent, to be able to interact with. Um, and so
this will obviously get you much better
performance than an agent that only sees
the things that are in its window as a
result of your prompting.
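As one illustration of wiring an external system in over MCP, a client-side server entry might look like the following. The `mcpServers` map is a common convention across MCP clients, but the server package and environment variable here are placeholders; check your client's docs for the exact schema:

```python
import json

# Hypothetical MCP client configuration: a map of server name -> launch
# command. The package name and env var are illustrative placeholders.
mcp_config = {
    "mcpServers": {
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"},
        },
    }
}

print(json.dumps(mcp_config, indent=2))
```

Once registered, the server's tools show up to the agent like any other tool, which is how context from systems like GitHub or Sentry gets pulled into the window on demand.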
Uh, so the next thing is memory. So, while
you can use tools like MCP to get
context into your window, we introduced
a memory tool to help you actually keep
context outside of the window that
Claude knows how to pull back into the
window only when it actually needs it.
Um, and so we introduced the first
iteration of our memory tool as
essentially a client-side file system.
So, you control your data, but Claude is
good at knowing, oh, this is like a good
thing that I should store away for
later. And then, uh, it knows when to
pull that context back in.
[clears throat] So for Claude Code, you
could imagine um your patterns for your
codebase or maybe your preferences for
your git workflows. These are all things
that claude can store away in memory and
pull back in only when they're actually
relevant.
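Because this first memory tool is essentially a client-side file system, the client executes the file operations Claude requests. A toy handler, with command names that are illustrative assumptions rather than the tool's actual wire format:

```python
from pathlib import Path

class MemoryStore:
    """Toy client-side backing store for a memory tool: the model asks to
    read or write paths, and the client performs the operation on local
    disk, so the stored data stays under your control."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def handle(self, command: dict) -> str:
        path = self.root / command["path"].lstrip("/")
        if command["name"] == "write":
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(command["content"])
            return "ok"
        if command["name"] == "read":
            return path.read_text()
        raise ValueError(f"unknown command: {command['name']}")
```

Codebase patterns or git-workflow preferences would live as small files under this root, pulled back into the window only when Claude decides they are relevant.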
And so the third thing is context
editing. If memory helps you keep stuff
outside the window and pull it back in
when it makes sense, context editing
helps you clear stuff out that's not
relevant right now and shouldn't be in
the window. Um, so our first iteration
of our context editing is just clearing
out old tool results. Um, and we did
this because tool results can actually
just be really large and take up a lot
of space in the window. And we found
that tool results from past calls are
not necessarily super relevant to help
Claude get good responses later on in a
session. And so you can think about, for
Claude Code: Claude Code is calling
hundreds of tools. Um, those files that
it read otherwise, all these things are
taking up space within the window. Um so
they take advantage of um context
management to clear those things out of
the window.
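The strategy behind this first iteration can be sketched locally: walk the conversation and blank out all but the most recent tool results. This is an illustration of the idea, not the API's actual implementation:

```python
def clear_old_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace all but the last `keep_last` tool results with a short
    placeholder, freeing the window space that stale outputs occupy."""
    tool_result_indices = [
        i for i, m in enumerate(messages) if m.get("type") == "tool_result"
    ]
    stale = set(tool_result_indices[:-keep_last]) if keep_last else set(tool_result_indices)
    cleared = []
    for i, m in enumerate(messages):
        if i in stale:
            # Keep the turn in place so the conversation stays coherent,
            # but drop the bulky payload.
            m = {**m, "content": "[cleared: stale tool result]"}
        cleared.append(m)
    return cleared
```

Old file reads and test outputs are exactly the kind of large, rarely-revisited content this reclaims.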
And so um we found that if we combined
our memory tool with context editing, we
saw a 39% bump in performance over
the benchmark on our own internal
evals. Um, which was really, really huge.
And so it just kind of shows you the
importance of keeping things in the
window that are only relevant at any
given time. And we're expanding on this
by giving you larger context windows. So
for some of our models, you can have a
million token context window. Combining
that larger window with the tools to
actually edit what's in your window
maximizes your performance. Um, and over
time we're teaching Claude to get better
and better at actually understanding
what's in its context window. So maybe
it has a lot of room to run, maybe it's
almost out of space. Um, and Claude will
respond accordingly depending on how
much time uh or how much room it has
left in the window.
So, here's the third thing. Um, we think
you should give Claude a computer and
just let it do its thing. We're really
excited about this one. Um, because
there's a lot of discourse right now
around agent harnesses. Um, you know,
how much scaffolding should you have?
How opinionated should it be? Should it
be heavy? Should it be light? Um, and I
think at the end of the day, Claude has
access to writing code. And if Claude
has access to running that same code, it
can accomplish anything. You can get
really great professional outputs for
the things that you're doing just by
giving Claude runway to go and do that.
But the challenge with letting you do
that is actually the infrastructure, as
well as stuff like expertise: how do
you give Claude access to things that,
um, when it's using a computer, will get
you better results?
So a fun story is we recently launched
Claude Code on web and mobile. Um and
this was a fun project for our team
because we had a lot of problems to
solve. When you're running Claude Code
locally, Claude Code is essentially using
your machine as its computer. But if
you're starting a session on the web or
on mobile and then you're walking away,
what's happening? Like, where is, um,
Claude Code running? Where is
it doing its work? Um and so we had some
hard problems to solve. We needed a
secure environment for Claude to be able
to write and run code that's not
necessarily, like, code approved by you.
Um, we needed to solve for container
orchestration at scale. Um and we needed
session persistence um because uh we
launched this and many of you were
excited about it and started many many
sessions and walked away and we had to
make sure that um all of these things
were ready to go when you came back and
um wanted to see the results of what
Claude did.
So one key primitive in this is our code
execution tool. Um so we released our
code execution tool in the API um which
allows Claude to write code and run
that code in a secure sandboxed
environment. Um, so our platform handles
containers, it handles security, and you
don't have to think about these things
because they're running on our servers.
Um, so you can imagine deciding that,
um, you want Claude to write some code
and you want Claude to go and be able to
run that code. And for Claude Code,
there's plenty of examples here. Um,
like "make an animation more sparkly,"
where, uh, you want Claude to actually
be able to run that code.
the future of agents is letting the
model work pretty autonomously within a
sandbox environment and we're giving you
the infrastructure to be able to do
that.
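Requesting the server-side code execution tool is roughly a matter of listing it in `tools`. A sketch, where the dated type string follows the pattern of Anthropic's versioned tool identifiers and should be checked against current docs:

```python
# Sketch: give Claude a sandboxed code interpreter by listing the
# server-side code execution tool. The versioned type string below is an
# assumption based on Anthropic's dated tool identifiers; verify it
# against the current API reference.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 4096,
    "tools": [
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    "messages": [
        {
            "role": "user",
            "content": "Make the hero animation more sparkly, then run the demo to verify.",
        }
    ],
}
```

Unlike a custom tool, nothing runs on your machine here: the containers, sandboxing, and security live on Anthropic's servers.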
And this gets really powerful once you
think about giving the model actual
domain expertise in the things that
you're trying to do. So we recently
released agent skills which you can use
in combination with our code execution
tool. Skills are basically just folders
of scripts, instructions, and resources
that Claude has access to and can decide
to run within its sandbox environment.
Um, it decides to do that based on the
request that you gave it as well as the
description of a skill. Um, and Claude
is really good at knowing like this is
the right time to pull this skill into
context and go ahead and use it. And you
can combine skills with tools like MCP.
So MCP gives you access to tools and
access to context. Um, and then skills
give you the expertise to actually make
use of those tools and make use of that
context. Um, and so for Claude Code, a
good example is web design. Maybe
whenever you launch a new product or a
new feature, um, you build landing
pages. And when you build those landing
pages, you want them to follow your
design system and you want them to
follow the patterns that you've set out.
Um, and so Claude will know, okay, I'm
being told to build a landing page. This
is a good time to pull in the web design
skill. Um, and use the right patterns
and design system for that landing page.
Uh tomorrow Barry and Mahesh from our
team are giving a talk on skills.
They'll go much deeper and I definitely
recommend checking that out.
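Conceptually, a skill is a folder whose metadata tells the model when it is relevant. A toy discovery pass over such folders, where the `SKILL.md` layout and first-line-description convention are illustrative assumptions:

```python
from pathlib import Path

def load_skill_descriptions(skills_dir: str) -> dict[str, str]:
    """Scan skill folders and return {skill_name: description}, the
    metadata an agent surfaces to the model so it can decide which
    skill, if any, to pull into context for the current request."""
    skills = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        # Convention assumed here: the first line of SKILL.md is a
        # one-sentence description of when to use the skill.
        first_line = skill_md.read_text().splitlines()[0]
        skills[skill_md.parent.name] = first_line
    return skills
```

A `web-design` folder's description might say it applies when building landing pages, so a "build a landing page" request is what triggers the model to load the full skill, scripts and resources included.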
So these are the ways that we're
evolving our platform um to help you
take advantage of everything that Claude
can do to get the absolute best
performance for the things that you're
building. First, harnessing Claude's
capabilities. So, as our research team
trains Claude, we give you the API
features to take advantage of those
things. Next, managing Claude's context:
it's really, really important to keep
your context window clean with the right
context at the right time. And third,
giving Claude a computer and just
letting it do its thing.
So, we're going to keep evolving our
platform. Um, as Claude gets better and
has more capabilities and gets better at
the capabilities it already has, we'll
continue to evolve the API around that
so that you can stay on the frontier and
take advantage of the best that Claude
has to offer. Um, second, as uh, memory
and context evolve, we're going to up
the ante on the tools that we give you
in order to let Claude decide what to
pull in, what to store away for later,
and what to clean out of the context
window. [clears throat] And third, we're
really going to keep leaning into agent
infrastructure. Some of the biggest
problems with the idea of just let
Claude have a computer and do its thing
are those problems that I talked about
around orchestration, secure
environments, and sandboxing. And so
we're going to keep working um to make
sure that those are um ready for you to
take advantage of.
Um and I'm hiring. We're hiring at
Anthropic. We're really growing our
team. Um, and so if you're someone who
loves um, building delightful developer
products um, and if you're excited about
what we're doing with Claude, we would
love to work with you across eng, product,
design, um, DevRel, lots of functions. So
please reach out to us
and thank you [applause]
[music]