Configure Claude Code for Hours‑Long Autonomy
Key Points
- Claude Opus 4.5 can stay autonomous for about 4 hours 49 minutes at a 50% completion rate, a dramatic leap from earlier models like GPT‑4, which only lasted roughly 5 minutes.
- To achieve multi‑hour runs you must configure the Claude Code “agent harness” for added persistence; simply invoking Claude in the CLI won’t keep it alive.
- Anthropic’s official Claude Code setup walks you through permission prompts and lets you define guardrails, which is crucial because the agent can execute a wide range of commands (e.g., git commits, pushes, deletions).
- Treat Claude’s autonomy like a car’s autopilot: first understand its capabilities, test it in controlled mode, then progressively enable longer, unsupervised sessions once you trust the system.
- Pure prompt‑driven runs tend to “go lazy” or fail on long tasks, so adding the harness and proper configuration is the key solution for sustained, reliable execution.
Sections
- Configuring Claude Code for Long-Running Autonomy - The video explains how to set up Claude Code with a persistent agent harness, enabling the model to operate autonomously for several hours—far surpassing earlier models like GPT‑4.
- Using Stop Hooks for Deterministic Agent Flow - The speaker explains how stop hooks trigger automatically after Claude finishes a task, enabling automated actions like running tests and feeding results back into the workflow for iterative improvement.
- Iterative To-Do Automation with Claude - Explains how to direct Claude to process a markdown to‑do file, marking tasks complete step‑by‑step while running validation tests after each iteration to catch failures early.
- Ralph Loop Stop‑Hook Demonstration - The speaker walks through configuring the Ralph loop with max iterations and a completion promise, illustrating how synthetic stop‑hook triggers pause and resume Claude’s step‑by‑step processing of a to‑do list.
Full Transcript
Source: [https://www.youtube.com/watch?v=o-pMCoVPN_k](https://www.youtube.com/watch?v=o-pMCoVPN_k) (Duration: 00:13:36). Section timestamps: [00:00:00](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=0s), [00:03:21](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=201s), [00:07:53](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=473s), [00:11:35](https://www.youtube.com/watch?v=o-pMCoVPN_k&t=695s).
In this video, I'm going to be showing
you how to set up Claude Code to be able
to run autonomously for hours. Now, just
recently, METR came out with the latest
benchmark of Claude Opus 4.5 that showed
that this model can perform
independently and autonomously for 4
hours and 49 minutes. Now, this is at a
50% completion rate. If we go down to
80%, this number does drop down quite a
bit. But the main thing with this is the trajectory of how the models have improved over time. If we go back to when GPT-4 was a huge deal at the time, just to give you an idea of how long that model could run for, it was only able to run for 5 minutes. But now we're entering a new
era where these models can run for quite
a long time and they're getting
increasingly accurate at actually being
able to have successful runs. Now, in
terms of actually setting this up, you aren't going to be able to just set it up within Claude Code by default. You aren't just going to be able to type claude within your CLI and be able to walk away for minutes or even hours. You do have to
configure the agent harness a little bit
just to give it a little bit more
persistence. Now, the nice thing with this is that it actually isn't that difficult. And I'm going to be showing
you one of the official ways in terms of
how Anthropic actually sets this up and
how some of the members on that team
leverage this method to have this
actually run for a particularly long
time. If you've used Claude Code before,
the first time that you run it, it will
actually ask you permission for
everything that you're doing. And one of
the things with Claude Code is it's very
similar to a self-driving car. Now, the
first time that I got in a car that had
an autopilot feature, one of the first
things that they said to me is actually
don't turn this on by default. Actually
get comfortable with being able to
leverage it, know how to turn it on and
off, and then as soon as you actually
trust the system, then you'll be a lot
more comfortable with actually turning
it on. It's a very similar thing within
Claude Code. You do want to generally
get an idea in terms of what it will do
or what it's capable of doing because it
can run a lot of different commands on
your machine. It can commit to git. It
can push things. It can delete things.
If you're not careful, it can do things
that you don't want it to do. But once
you know the capabilities, you'll get
familiar with some of the guardrails
that you might want to have in place.
Now, when you go and you run Claude Code
for the first time, you'll see that it
will go through this process and it will
ask you these different questions. But one of the issues is that oftentimes, when you want it to run tests or recover when something fails over a particularly long run, if you try to do that with just prompting, you'll find that it will often get lazy. Part of the
solution with this is actually making it
a little bit more deterministic. In the
case of tests, for instance, what you
can do is you can actually have tests
run automatically once Claude finishes.
Now, if they fail, you can actually feed
that input back into Claude Code. And
what this will do is it will create this
loop where Claude Code has this
non-deterministic LLM pattern. But when
you equip it with something called hooks
and the stop hook in particular, that's
going to allow it to persist much, much longer. There are a number of different hooks within Claude Code. Effectively, hooks are shell commands that fire at particular points within the Claude Code workflow. So you can sort of think of them like git hooks, but for AI and agentic coding.
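As a rough sketch of where this lives: hooks are registered in Claude Code's settings file (for example `.claude/settings.json`), following the shape described in Anthropic's hooks documentation; the script path here is a placeholder, not from the video:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/on_stop.py" }
        ]
      }
    ]
  }
}
```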
One of the things with these is that there are a number of different points where you can actually leverage this. What hooks will allow you to do is block Claude from running particular commands if you don't want it to run things. You can check before it actually invokes tool calls that could potentially be detrimental. You might just want to block it from leveraging git, or whatever it might be.
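For example, a guardrail of the kind described might look like this sketch of a PreToolUse hook body. It assumes the documented event shape (`tool_name`, `tool_input.command` for Bash calls) and the convention that exit code 2 blocks the tool call; the blocked patterns are purely illustrative:

```python
"""Sketch of a PreToolUse guardrail hook (illustrative, not official code).

Assumes the documented event shape: {"tool_name": ..., "tool_input":
{"command": ...}} for Bash calls, with exit code 2 blocking the tool call.
The blocked patterns below are examples only.
"""
import json
import sys

BLOCKED_PATTERNS = ("git push", "rm -rf")  # commands to keep off-limits


def check(event: dict) -> tuple:
    """Return (exit_code, message) deciding whether to block the tool call."""
    if event.get("tool_name") == "Bash":
        command = event.get("tool_input", {}).get("command", "")
        for pattern in BLOCKED_PATTERNS:
            if pattern in command:
                # Exit code 2 blocks the call; the message explains why.
                return (2, f"Blocked by guardrail hook: '{pattern}' is not allowed.")
    return (0, "")  # anything else is allowed through


# Wired up as a real hook this would run: code, msg = check(json.load(sys.stdin))
```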
You can also have hooks fire after the tool use is complete. But what I'm going to
focus on within this video is the stop
hook. And what this is helpful for is
when Claude actually finishes the
process, but it might ultimately come
back and ask you a question. Even if you
ask it to go and focus on something for
a particularly long time, you might get
creative and try and just prompt your
way to have it run for a long time. But
what the stop hook or any hook will do
is it will actually allow you to have
something more deterministic within this
agentic flow. You will be able to bank on the fact that whenever that stop hook fires, you can have a set process run through. Now, the power of the stop
hooks is if you just think about it, as
soon as Claude finishes the work, what
the hook will do is it will fire
automatically and you can configure this
for a number of different things. If you
want it to actually run different unit
tests or integration tests or whatever
it is, you can have those set up to run
as soon as the process is finished. And
then if those tests fail, Claude will be
able to see that output and it will be
able to feed that in and start the
process and repeat until it's done. And
one of the key insights with this is that if you just ran your tests yourself, Claude wouldn't know whether they passed unless you actually ran them within the process.
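To make that loop concrete, here is a minimal sketch of a Stop hook script. It assumes the behavior described in the hooks docs (the event arrives as JSON on stdin, `stop_hook_active` marks a repeated stop attempt, and exit code 2 blocks the stop with the message fed back to Claude), and it assumes pytest as the test runner; this is not the plugin's actual code:

```python
"""Minimal sketch of a Stop hook script (not the plugin's actual code).

Assumed behavior, per the hooks docs: Claude Code pipes the hook event to
stdin as JSON; "stop_hook_active" is set when the stop was already blocked
once; exiting with code 2 blocks the stop and feeds the message back to
Claude. The test command (pytest) is an assumption -- swap in your own.
"""
import json
import subprocess
import sys


def decide(event: dict, test_cmd: list) -> tuple:
    """Return (exit_code, message) for a Stop hook event."""
    if event.get("stop_hook_active"):
        return (0, "")  # we already blocked once; let Claude actually stop
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Exit code 2 blocks the stop; the message goes back into context.
        return (2, "Tests failed, keep fixing:\n" + result.stdout[-2000:])
    return (0, "")  # tests passed: allow the stop


# Wired up as a real hook, this script would end with:
#   code, msg = decide(json.load(sys.stdin), ["pytest", "-q"])
#   print(msg, file=sys.stderr)
#   sys.exit(code)
```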
But what stop hooks allow you to do is pass that result in at arguably one of the best times, because after all of the edits and changes it made, Claude can actually verify whether the work functions or not. And this can be used in
a number of different ways. Now, in terms of some of the real-world use cases for this: the creator of Claude Code, Boris Cherny, posted a tweet that I'll just read through quickly. He said, "When I
created Claude Code as a side project
back in September 2024, I had no idea it
would grow to what it is today. It is
humbling to see that Claude Code has
become a core dev tool for so many
engineers, how enthusiastic the
community is, and how people are using
it for all sorts of things from coding
to DevOps to research to non-technical
use cases. This technology is alien and
magical, and it makes it so much easier
for people to build and create.
Increasingly, code is no longer the
bottleneck. A year ago, Claude struggled
to generate bash commands without
escaping issues. It worked for seconds
or minutes at a time. We saw early signs
that it may become broadly useful for
coding one day. Fast forward to today,
the last 30 days, I landed 259 PRs, 457 commits, 40,000 lines added, and 38,000 lines removed. Every single line
was written by Claude Code and Opus 4.5.
Claude consistently runs for minutes,
hours, and days at a time using stop
hooks. Software engineering is changing,
and we are entering a new period in
coding history. And we're still just
getting started." And then within here,
you can see all of the different usage
and the number of tokens that he had
leveraged. Just to give you an idea, now
mind you, this is the creator of Claude
Code. This is someone who arguably knows
the system better than anyone else. But
just to show you what this can actually do. And I don't think that this is just marketing or anything like that; he is definitely a very genuine person, and if you've leveraged Claude Code, in particular with Opus 4.5, you will probably know exactly what he's talking about. Now, in terms of one of
the things that I noticed within this
tweet that I did want to pull up: there was a question from Simon Willison, who quoted "Claude consistently runs for minutes, hours, and days at a time using stop hooks" and asked him to expand on this. In his response,
Boris mentioned when Claude stops, you
can use a stop hook to poke at it, tell
it to keep going. And then he gave an
example within one of their official
repositories to what they call Ralph
Wiggum. Now, if you know Ralph Wiggum,
he's from the Simpsons. And one of the
things with Ralph is he's determined to
get it done. So, he'll just keep trying
until it actually works, which is sort
of a funny analogy in terms of how you
can actually get Claude to work. Now,
effectively, how this works is you're
going to be able to run the quote
unquote Ralph loop. You'll be able to
pass in your task. Once you pass in your
task, it's going to create a state file
within your Claude folder. Once that's
set up, as soon as Claude works through
what you're trying to do and tries to
exit, the stop hook will block it from
exiting and it will refeed what it's
trying to do within the prompt. And then
this process will repeat until the max
iterations or the promise is actually
met. This could be useful within a test-driven development workflow, but also where this can be
helpful is if you have particularly long
to-do lists. Let's say you scaffold out an initial plan for your feature or application, at whatever level of detail you actually want to plan out. If you want to have Claude Code
go through that list without actually
stopping, what you can do is you can
actually point it at the to-do list and
then it will have those tasks that it
will loop through and it won't actually
finish until it actually meets the
criteria. This can also be helpful in a
number of other scenarios. Think things
like large refactors or migrations.
Within the to-do example, you can set up something like a to-do.md file and instruct Claude to go through the tasks, marking them
complete as you go. For instance, let's imagine you have a to-do.md file. What you can do is use the Ralph loop to complete all of the tasks in to-do.md. Then what you can
also do in addition to this is you can
also include tests after each iteration.
And this can be particularly helpful because oftentimes, if you don't include a validation step while it's actually running through, it might go through a particularly long to-do list but then get to the end, where you realize there were some catastrophic failures that built on top of each other.
So being able to iteratively go through and have the system build on top of validated work can be a good way of leveraging these systems: try to validate the work as much as you can. This can be with unit tests, integration tests, leveraging Playwright for things on the front end, or leveraging Claude within Chrome, and all of those types of things.
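As a concrete illustration of the kind of file the speaker describes (the filename, tasks, and validation notes here are made up, not from the video), a to-do.md might look like:

```markdown
# to-do.md
- [x] Scaffold the project structure
- [ ] Implement the login endpoint
  - Validation: run the unit tests for the auth module
- [ ] Wire the login form to the endpoint
  - Validation: run the Playwright test for the login flow
- [ ] Refactor shared validation helpers
  - Validation: run the full test suite
```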
If you haven't used to-do lists within Claude Code, there is a to-do feature built right in, where Claude will just decide to leverage it when it needs to. But
additionally you can also do this
yourself if you want to have a little
bit more control over it. You can
instruct Claude to go through a markdown
file. You can put just like you see on
the slide here all of the different
things that you want it to do including
all of the different validation steps
along the way. And then with each
iteration, you will see that Claude will
go through and it will pick up all of
the unchecked items. It will implement
the feature or fix or whatever you have
within that actual line item. It will
run the unit tests and integration tests,
depending on what you have within the
list. And then if the test fails, it
will go ahead and it will fix that
before it goes and continues on and
marks it complete. What this allows you
to do is you can sort of just walk away
and then hopefully come back to a
finished list, a working feature, or a working application, depending on the scope of
what you actually put within your to-do
list. Now, the other thing that's cool
with this is you do have the option
where you can stack multiple hooks
together. And when you leverage hooks, you can use them interchangeably; you don't necessarily need to just use one.
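To sketch what stacking looks like in the settings file (shape per the hooks docs; every script path here is a placeholder), you can register several hooks on different events at once:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/guard.py" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/format.py" },
          { "type": "command", "command": "python3 .claude/hooks/run_tests.py" }
        ]
      }
    ]
  }
}
```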
For instance, within my Claude
environment, I have a number of
different hooks that are set up that
invoke different actions at different
times. Hooks for logging, hooks for notifying me: all of these types of things are particularly helpful. Now, as
you can imagine, by leveraging these more deterministic patterns, you can balance out the non-deterministic agentic harness that is Claude Code and the model, because oftentimes you just can't predict what it will ultimately do. You can maybe have a high degree of confidence if you know what you're passing in as context, but oftentimes, for these long-running tasks, there is the potential for it to go off course. And having things that can actually check it, and run these more deterministic triggers and scripts at particular times, can be very, very helpful. This can keep your code clean.
This can prevent dangerous operations
and like I've mentioned a couple times
already, ensure that tests pass before
actually stopping. Now, to get this set up, one of the fastest ways to get going with this is to go to the Ralph Wiggum plugin. What you'll notice within here is that plugins are a way to configure a number of different things within Claude Code at once. You can have subagents, you can have skills, and in this case, you can actually leverage hooks. Now,
the core piece of this is if we take a
look at the hooks, what we'll notice
within here is we have the stop hook
trigger. This is going to be how we
actually invoke the different hooks that
we have on this stop event. If I go back
here and we take a look at this stop
hook, this is an example of what a hook looks like and what you can actually invoke every time that it stops. And you can have a number of
different scripts that invoke whenever
Claude actually stops. Within here, you
can see we have a formatter, iteration,
max iteration, as well as the completion
promise. Once you have it all installed,
what you're going to be able to do is
have this slash command for Ralph loop.
So within the Ralph loop, what you're
going to be able to do is put in your
prompt, the number of max iterations as
well as the completion promise, which is what actually validates that the task is complete. Within here, what I can do is specify: go through my to-do list step by step, and mark down every step that is complete once it's actually done. I'll go ahead and I'll kick this off. What we see on the left-hand side
here is I have a number of different
steps just to demonstrate this. We'll
create a text file. But what you'll
notice is in between each of these is
I'm synthetically trying to trigger that
stop process within Claude. And this is
just to demonstrate what that hook will
look like when it is triggered within
Claude. We can see it went ahead. It
completed the first task here. And now
for our second task. What you'll notice
within here is we have this stop hook
error where it says go through my to-do
list step by step and mark down every
step that is complete once it's actually
done. And now what this looks like and
how it can persist is instead of
actually returning a message to you, it
will call this trigger and it will pass
this back into Claude and have it just
continue to go through the process
within here. If I just
scroll down, I see that number three is
done. Once it gets to four, again, we
have that hook being triggered as if
there was a stop and returning a message
back to us. And instead of stopping,
we're just passing that back into
context to have it to continue to go
through the list. Now the one thing that
I do want to mention with Ralph loops or
this type of process is just make sure
that you do set the max number of
iterations as well as your promise.
Otherwise, this will just keep running. You can see here that my task list is complete. But if you don't specify a completion promise or a max iteration, it will just continue to go through, and the loop will run infinitely. So, just make sure that you do actually specify both of these; you don't want to get into a scenario where you're just burning tokens by effectively having an infinite loop.
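The max-iterations and completion-promise guard can be sketched like this: a simplified stand-in for what the Ralph Wiggum plugin does, not its actual code, and the state-file path is made up:

```python
"""Sketch of a Ralph-style loop guard (a stand-in, not the plugin's code).

Tracks an iteration count in a state file (path is made up) and decides,
on each Stop event, whether to block the stop and loop again. It stops
once the completion promise string appears in Claude's output or the
max-iterations budget is spent -- without both, the loop runs forever.
"""
import json
import pathlib

STATE_FILE = pathlib.Path(".claude/ralph-state.json")  # hypothetical path


def should_continue(output_tail: str, max_iterations: int, promise: str) -> bool:
    """Return True if the stop should be blocked so the loop continues."""
    state = {"iteration": 0}
    if STATE_FILE.exists():
        state = json.loads(STATE_FILE.read_text())
    state["iteration"] += 1
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(state))
    if promise in output_tail:
        return False  # completion promise met: let Claude stop
    if state["iteration"] >= max_iterations:
        return False  # budget spent: stop even if the work is unfinished
    return True  # block the stop and feed the prompt back in
```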
Otherwise, that's pretty much it for
this video. I'll put the link to the
GitHub repository within the description
of the video. But otherwise, if you
found this video useful, please like,
comment, share, and subscribe.
Otherwise, until the next one.