
Self-Hosting LLMs on Windows

Key Points

  • The conversation highlights how generative AI is becoming ubiquitous, offering personalized assistance like car‑buying advice without the user needing to learn a new interface.
  • Robert Murray demonstrates that you can run powerful open‑source models (e.g., Llama 3, IBM’s Granite) locally on a personal computer, eliminating reliance on cloud GPU farms.
  • His stack consists of Windows 11 → WSL2 (Linux layer) → Docker, with models downloaded from Ollama.com and executed via the command line for rapid responses.
  • By adding a Docker‑based UI such as Open WebUI, he provides a user‑friendly chat interface that makes locally hosted AI as accessible as cloud‑based services.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=BvCOZrqGyNU](https://www.youtube.com/watch?v=BvCOZrqGyNU)
**Duration:** 00:09:43

## Sections

- [00:00:00](https://www.youtube.com/watch?v=BvCOZrqGyNU&t=0s) **Home‑Hosted Generative AI Setup** - The speaker outlines common AI use cases and introduces Robert Murray's personal, low-cost Windows 11 setup that runs models such as Llama 3 and IBM Granite on a home computer without a cloud-based GPU farm.
- [00:03:02](https://www.youtube.com/watch?v=BvCOZrqGyNU&t=182s) **Docker UI and VPN Setup** - The speaker explains using Open WebUI in Docker, adding a VPN container for remote access, and outlines minimal hardware specs like 8 GB RAM and 1 TB storage.
- [00:06:17](https://www.youtube.com/watch?v=BvCOZrqGyNU&t=377s) **Secure, Private AI Deployment Practices** - The speakers discuss how running an AI chatbot on personal hardware with a private data store and open-source models safeguards data ownership, reduces exposure to poisoned training data, and leverages community vetting for greater security.
- [00:09:29](https://www.youtube.com/watch?v=BvCOZrqGyNU&t=569s) **Soliciting Viewer Security Input** - The host praises Jeff's security insight, references Robert's method, and asks the audience to suggest how they would improve the system in the comments.

## Full Transcript
0:00 Martin, it seems like AI is everywhere these days.
0:03 Finally, we have a computer that actually understands my language instead of me having to learn its language.
0:08 A system that understands me.
0:10 For instance, what if I'm looking to buy a new car and I need to do some research on the alternatives?
0:16 Yeah, you could tell the chatbot to act as a car expert and then you can ask it,
0:21 what would be the difference in cost to operate a gas-powered car versus
0:25 a hybrid car versus an EV car, and then get guidance on the decision.
0:29 And if it helped me find a rebate from the power company, it could pay for itself in just one instance,
0:36 and if I enjoyed tinkering and DIY projects,
0:39 wouldn't it be cool to learn how the technology works and host my very own instance of all of this?
0:44 Yeah, very cool.
0:45 And in fact, we have a colleague,
0:47 Robert Murray, who has done just that with equipment in his own home office.
0:52 Wait, you mean without a server farm of GPUs that dim the lights every time you ask it to do something?
0:58 Absolutely.
0:58 So let's bring him in to tell us how he did it.
1:02 Today, requests to generative AI typically connect to an AI model
1:05 hosted somewhere on a cloud,
1:07 but Robert here has built an infrastructure
1:10 to host AI models like Llama 3 and IBM's Granite
1:14 on his own personal infrastructure. So Robert, I want to understand how you did this.
1:20 Absolutely.
1:20 So let's start with this box, which represents your computer at home.
1:25 So tell me sort of the stack that you built here.
1:28 Sure. So I started with Windows 11.
1:31 All right, so it's just a straight up.
1:32 Because I have it.
1:34 Yeah, OK.
1:35 That was the reason, just because it's there.
1:37 It's there.
1:38 OK, so you've got Windows 11, and then what's on top of that?
1:41 Well, I unleashed WSL2.
1:45 Now you're gonna have to tell me what WSL2 does.
1:48 It's basically Linux on Windows.
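The bottom layers of the stack described here (Windows 11, then WSL2, then Docker) can be reproduced with a few commands. This is a minimal sketch under my own assumptions, since the video does not show the exact steps; package names can vary, and Docker Desktop for Windows is an alternative that integrates with WSL2 automatically.

```shell
# From an administrator PowerShell on Windows 11: install WSL2 with the
# default Ubuntu distribution (a reboot may be required afterwards).
wsl --install

# Then, inside the resulting Ubuntu shell: install Docker Engine from the
# distribution's own repositories.
sudo apt-get update
sudo apt-get install -y docker.io
sudo usermod -aG docker "$USER"   # allow running docker without sudo
```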
1:51 I'm going to think that there's probably a virtualization layer coming.
1:54 Yes, there definitely is, and that is Docker.
1:58 OK, Docker is running on top of all of this.
2:01 Now, we need some AI models. So where did you get your AI models from?
2:09 I pulled them down from Ollama.com.
2:12 OK, so if we take a look at the AI models, what are some of the models that you actually took?
2:18 Oh, so I started with Granite.
2:20 Right, IBM's Granite model, yeah.
2:22 Llama,
2:24 and there's so many other models that you can pull down.
2:27 Yeah.
2:28 They're there, open source.
2:29 A whole bunch of open source models.
2:31 Okay, so we've got a Docker machine
2:32 here with Windows 11, WSL2.
2:36 You've downloaded these models from Ollama.
2:38 Is this now the solution?
2:41 Well, I actually can use this. I can run all this right from the command line.
2:44 Wow. Okay. So you can open a terminal window and then start chatting with Llama or Granite.
2:49 Yes. Very, very fast.
2:50 But most of the AI models that are cloud hosted,
2:53 you do that on a chat interface, a UI. So how are you able to add a UI to all of this?
2:59 Docker containers.
3:01 Ah, okay, all right.
3:02 So let's put some Docker containers in. What did you have for the UI?
3:05 I used Open WebUI.
3:06 It's one of the many solutions that a person could use, but I found this to be extraordinarily helpful.
3:13 OK.
3:13 It's easy to use.
3:14 Yeah! So with Open WebUI, you can just open up a browser
3:18 and then chat with the model, pick the model you want, and send requests to it.
3:21 And there I was, and that's what I was working with for a long time right out of my home.
3:26 But what if you're on the go?
3:29 Well, that's where another container comes in.
3:32 Okay, what have you got here?
3:34 So it's a VPN container configured with my own domain.
3:39 All right, so what can access this guy?
3:44 This.
3:45 Ah, OK, your phone.
3:47 So now I am able to access my system from my phone or basically any internet connection.
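As a sketch of the workflow just described: pulling models from Ollama, chatting from the terminal, and then adding Open WebUI as a containerized front end. The container names, the host port (3000 here), and the exact model tags are illustrative choices of mine, not details confirmed in the video, so check the Ollama library for the current Granite tag.

```shell
# Run Ollama as a container, keeping downloaded models in a named volume.
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Pull a couple of the models mentioned, then chat from the command line.
docker exec ollama ollama pull llama3
docker exec -it ollama ollama run llama3

# Add Open WebUI as a second container pointed at the Ollama API, then
# browse to http://localhost:3000 and pick a model to chat with.
docker run -d --name open-webui -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```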
3:57 It's awesome.
3:58 How very cool.
3:59 All right, well, let's say that
4:01 I wanted to actually replicate what
4:02 you've done here and build it.
4:04 I'm gonna ask you about this server itself. What are the system requirements?
4:08 So let's start with RAM. How much RAM do I need for this?
4:11 I would recommend at least 8 gigabytes.
4:14 8 gigabytes.
4:16 That's not much.
4:17 How much do you actually use?
4:17 Well, I'm using 96.
4:20 OK, slightly above the minimum requirement.
4:22 Absolutely. All right, so that's RAM. What about storage?
4:26 Storage, I would recommend having at least one terabyte.
4:29 OK, because some of these models can get pretty big.
4:32 Yes, they can.
4:32 Now, these models come in different sizes. So what parameter count sizes are we using with Granite and Llama?
4:39 I'm using anywhere between 7 and
4:41 14 billion parameters.
4:43 7 to 14 billion, okay.
4:46 I have run up to 70.
4:48 70?
4:49 How did that work out?
4:50 Slow.
4:51 I can imagine.
4:52 OK.
4:53 So the other thing that people
4:55 often talk about in terms of system requirements are GPUs.
4:59 So should I be using GPUs for this?
5:02 Well, my initial configuration,
5:04 I had no GPUs,
5:06 but
5:08 the more GPUs, the better.
5:09 The more, the better, right.
5:11 So, we've got this self-contained
5:13 solution now, and it's got me thinking that when I talk to a large language model,
5:17 I often want to provide it documentation in order to chat with that document.
5:21 Absolutely.
5:22 Now, if I'm using a cloud-based model, I need
5:24 to take my document and upload it to somebody else's server so that the AI model can see it.
5:29 I take it that you have a better solution to that.
5:31 I do. I use my own NAS system.
5:34 Okay, so you have a NAS server setup.
5:37 And from that NAS system, I pull in my documents,
5:40 pull them into the Open WebUI, and chat away. And I'm doing it every single day.
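As a rough sanity check on the sizes discussed here: a quantized model's memory footprint is approximately its parameter count times the bits per weight. Assuming the common 4-bit quantization that Ollama applies by default (an assumption of mine, not stated in the video), the arithmetic looks like this:

```shell
# Approximate model footprint in GB:
# billions of parameters x bits per weight / 8 bits per byte.
size_gb() { echo $(( $1 * $2 / 8 )); }

size_gb 8 4    # an 8B model at 4-bit quantization: ~4 GB
size_gb 14 4   # 14B: ~7 GB, comfortably inside 96 GB of RAM
size_gb 70 4   # 70B: ~35 GB, loadable, but slow without GPUs
```

This is also why a 1 TB drive is a sensible floor: a handful of models in the 7B to 70B range, plus a private document store, adds up quickly.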
5:46 So Robert, the other thing I like
5:48 about this architecture is, at least
5:49 to my mind, this looks like a really secure solution.
5:52 Hold the phone there just a second, nice job AI guy, but let's really look at the security on this, Robert.
5:58 First of all, I think it is a good job here and I think you've put in some features that will help preserve security and privacy,
6:06 but let's take a look at what some of those are, because what you don't want is "your data is our data";
6:13 we want "your data is your data," not "your data is our business model."
6:17 So how do we make sure that we're not falling into the same trap
6:21 that a lot of those other chatbots, the free apps on the app store that you can download, are falling into?
6:28 Well, first off, I put it on my own hardware.
6:30 Yeah, exactly.
6:32 So I see that very clearly. It's on your hardware, so you control the infrastructure.
6:37 You can decide when to turn the thing on and off.
6:39 It's your data on your system.
6:41 So that's the first point.
6:43 Absolutely.
6:44 Yeah, and then also it looks like you included a private data store.
6:49 So now it's not your information training somebody else's model,
6:53 and you're pulling information that might be poisoned or anything like that.
6:57 You have some control over that as well.
6:59 Yes, and interestingly enough, that's what actually got me started on this whole path.
7:03 By having a NAS, I wanted my data to be my data.
7:07 And data is the real core of an AI system anyway, so that makes a lot of sense.
7:11 Also, I noticed some open source components.
7:14 So you've got one right here, you've got open source models here as well.
7:18 And that's a good idea, because instead of proprietary stuff, in these cases, at least we have an idea
7:25 that the worldwide open source community has had a chance to look at this and vet it.
7:29 Now granted, there's a lot of information to be vetted, so it's not trivial, no guarantees.
7:35 Maybe it's a little more secure because more people have had a chance to look at what's actually happening under the covers.
7:41 Agreed.
7:41 And then also I notice you want to be able to access this from anywhere,
7:46 which is one of the really cool aspects, and we want to make sure that that access is also secured.
7:51 So I see you put a VPN over
7:53 here so that you can connect your phone in and do that securely.
7:57 And how are you making sure everybody else in the world can't connect their phone in here as well?
8:02 Multi-factor.
8:04 Multi-factor authentication, and now we know it's really you,
8:08 and we know the information is exchanged in a secure way.
8:11 So a lot of features that you put in here, I think it's a nice job.
8:14 Thank you.
8:15 Yeah. And one other thing to think about: because these components, we really don't know what all of them would do,
8:21 it is still possible that one of these things could be
8:25 phoning home and sending data to the mothership, even without your knowledge.
8:30 So one of the things that might be useful is to put a network tap on your home network,
8:33 and then that way you could see if there are any outbound connections from this,
8:37 because there shouldn't be, based upon the way you've built that.
8:40 Well, that's a really great idea,
8:41 Jeff. I'm going to have to look into that.
8:42 Okay, there you go with the improvements for version two.
8:45 Hey, Jeff.
8:46 Oh, hey, Martin. Nice to have you back.
8:49 Yeah, it seems like Robert's really done some nice work with this, don't you think?
8:52 For sure. It just goes to show that you can now run sophisticated
8:56 AI models on a home computer to build a personal chatbot.
9:00 Yeah. Something like that would have been science fiction just a few short years ago,
9:05 but now it's available to anyone who really wants to spend the time to assemble it all.
9:08 Right. And you learn so much more about a technology by really digging into it and getting your hands dirty with it.
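Jeff's network-tap idea can be approximated in software before buying any hardware: a packet capture on the host can log outbound connection attempts coming from the containers. A minimal sketch, assuming the default Docker bridge interface name (`docker0`) and a typical home address range; both would need adjusting for a specific network:

```shell
# Watch for new outbound TCP connections (SYN packets) leaving the Docker
# bridge for anywhere outside the local network. Aside from deliberate
# model downloads, there should be none.
sudo tcpdump -ni docker0 \
  'tcp[tcpflags] & tcp-syn != 0 and not dst net 192.168.0.0/16'
```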
9:15 Yeah, and by the looks of your hands, you've been doing a lot of digging, because those things are filthy,
9:19 and the added bonus is that you end up with a better assurance that your data is your data,
9:25 because you have more control and you can ensure that privacy is protected in the process.
9:29 Spoken like a true security guy that you are, Jeff.
9:34 All right, so you've seen Robert's approach.
9:37 So how would you, dear viewer, do anything differently to make the system even better?
9:42 Let us know in the comments.