Two‑Line LLM Programming with Ollama
Key Points
- Getting started programming against any LLM can take as little as two lines of code, by running a locally installed model with Ollama.
- Install Ollama (available for macOS, Linux, and Windows), then pull and run the Granite 3.3 model using `ollama pull granite:3.3` and `ollama run granite:3.3`.
- To programmatically interact with the model, install the MIT‑licensed *chuk‑llm* library via the `uv` package manager (`brew install uv` on macOS, then `uvx chuk-llm test ollama`).
- The library provides a simple shortcut function (e.g., `ask_granite`) that lets you query the model with a single command like `uvx chuk-llm ask_granite "Who is Ada Lovelace?"`.
- This workflow enables rapid, local LLM development without vendor lock‑in, using only a few terminal commands.
Sections
- Untitled Section
- Dynamic Model Calls and Streaming Options - The speaker demonstrates how a runtime‑generated function automatically connects to the Ollama Granite model (or any other model/provider) for non‑streaming responses, and hints at adding streaming capability later.
- Async Chat Interaction Example - The speaker shows how to create a chat variable, import the conversation module, send an initial question with `await chat.ask`, print the response, then issue a follow‑up question using the same async call.
**Source:** [https://www.youtube.com/watch?v=_1uFtDfqapo](https://www.youtube.com/watch?v=_1uFtDfqapo)
**Duration:** 00:13:57

Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=_1uFtDfqapo&t=0s) Untitled Section
- [00:04:47](https://www.youtube.com/watch?v=_1uFtDfqapo&t=287s) Dynamic Model Calls and Streaming Options
- [00:09:17](https://www.youtube.com/watch?v=_1uFtDfqapo&t=557s) Async Chat Interaction Example

Full Transcript
So I was recently challenged by one of my colleagues on what is the quickest way to get someone started programming against a large language model. And I said, I think we could do it in two lines of code, and that is what we're going to do today. So this will work against any LLM provider, but for simplicity's sake, I want you to be able to run this locally on your machine. And to do that we're going to use a tool called Ollama. So you just need to go to ollama.com and click on the download button, and you can download it for Mac, Linux, or Windows. Now
once you've downloaded and installed Ollama, you can bring up a terminal and see that it works by typing in `ollama`. Now, to download a model, Ollama has a library: you can go to ollama.com/library and that will show you all the models that are
available for you to download. Now we're going to use the Granite 3.3 model today. So I'm going to type in `ollama pull granite:3.3`, and you're going to see that downloads it onto my local machine. It may take a few minutes because it's quite a large model, but once that's downloaded, you can run it locally by typing `ollama run granite:3.3`, which is the model name, followed by whatever your query is. So I am just going to say "Hi", and then you're going to see "Hello, how can I assist you?". It is that quick to get started. So now that we have the
Granite model downloaded and working, we want to be able to program against it. And to do that, I
created an MIT-licensed library that makes it super easy to program against it in two lines of code. But we need to install that library first. To do that, we are going to use a package management tool called uv, which is super simple to install. You just need to go to astral.sh, then go to the docs, where there are installation instructions. Very easy to do. If you are running a Mac, you can just type in `brew install uv` and it will install it for you. Now that that's installed, I'm just going to type in `uvx chuk-llm test ollama`, which is going to check that Ollama is working and that we have the chuk-llm library running. And there you go, you can see it has passed its test and we are able to work with it.
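For reference, the terminal commands used so far as a single setup fragment (assuming macOS with Homebrew; on Linux, astral.sh documents an install script instead):

```shell
brew install uv              # package manager used to run chuk-llm
ollama pull granite:3.3      # download the Granite 3.3 model locally
ollama run granite:3.3 "Hi"  # sanity-check the model in the terminal
uvx chuk-llm test ollama     # verify chuk-llm can reach Ollama
```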
So now we want to interact with the model. So I'm going to ask it a couple of questions. So I'm
going to do `uvx chuk-llm ask_granite`. So this is the nice thing about this package: you can just put `ask_` and then whatever the model name is. It could be Granite 3.3, it could be gpt-oss, it can be whatever is installed locally on your machine. In this case, there is a shortcut for Granite, so I'm just going to say `ask_granite` followed by something like "Who is Ada Lovelace?", and that will go off to Ollama and ask the Granite model what the answer is. And you see, it's doing that nice streaming effect there, because it's a large language model that uses next-token prediction, and it's just going token by token. So that's great, but it's not
programming. And I promised you that we were going to be able to code this in two lines of code. So
to do that, I'm going to type in `uv init`, and that is going to create me a new Python project. And I'm going to open up Visual Studio Code, so I'm just going to do `code .` and that will open up the initialized project. You can see there that there's a main file that's automatically been created. So if I were to type in `uv run main.py`, you're going to see it comes back with a hello from the chuk-llm demo. So now that our hello world works, I need to install
the chuk-llm package into our Python project. To do that, I'm going to type in `uv add chuk-llm`, and that is going to add chuk-llm into my project and into my virtual environment so I'm able to use it. Now that that's installed, we can come back into VS Code, and I need to import the ask Ollama Granite function into my code. To do that, I'm going to type in `from chuk_llm`, as that is the package we just installed. And by the way, if you want to see that it's installed, you can look in my pyproject.toml: under the dependencies, chuk-llm is now there. So now that we know the package is installed, we can go back to our main.py and get rid of the hello world. And what we want to do is import that
same ask question we did from the terminal. So I want to import a function called `ask_ollama_granite`. To do that, I just type in `from chuk_llm import` and then I put in `ask_ollama_granite`. So that's my first line of code. And then the second line of code is going to be a `print` of `ask_ollama_granite`.
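Putting those two lines together, the main.py being dictated here reads as follows (it requires chuk-llm in the project and a local Ollama with the Granite model pulled, so it is shown rather than executed):

```python
# The two-line program: import the runtime-generated helper, then ask.
from chuk_llm import ask_ollama_granite

print(ask_ollama_granite("Who is Ada Lovelace?"))
```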
And then we're going to say "Who is Ada Lovelace?". And now if I come back to my terminal and I run `uv run main.py` one more time, you're going to see it's not going to stream; it's going to take a few seconds to come back. It's going to go off to Ollama, call the Granite model, and come back with who Ada Lovelace is. Notice it's not doing that streaming thing at the moment; it's just returning everything in one go. So as I promised, two lines of
code to be able to talk to the model. And actually, what's happening underneath the hood is that this `ask_ollama_granite` function is generated at runtime. The first time I run this, it will go to Ollama, discover what models are on your machine, and automatically generate the function for you so that you can use it with the LLM. So you don't need to understand the complexity around that; it will just generate the functions. That equally means that if I've got any other model in there, I can just get rid of the Granite part, put that model name in there, and it's going to work. You can speak to any model. Similarly, if I want to speak to a different model provider, I can just replace the Ollama part with watsonx, for example, and that command will work against any models running on watsonx.
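The runtime generation described here can be sketched with a stub. This is only an illustration of the dynamic-naming idea; `_call_model` and `make_ask` are hypothetical placeholders, not chuk-llm internals, so the sketch runs without any model installed:

```python
# Hypothetical sketch of runtime-generated ask_<provider>_<model> helpers.
# chuk-llm discovers real models from Ollama; here a placeholder answer
# stands in so the naming pattern can be demonstrated on its own.
def _call_model(provider: str, model: str, prompt: str) -> str:
    return f"[{provider}/{model}] answer to: {prompt}"

def make_ask(provider: str, model: str):
    def ask(prompt: str) -> str:
        return _call_model(provider, model, prompt)
    ask.__name__ = f"ask_{provider}_{model}"
    return ask

# Swapping the model or provider just changes the generated name:
ask_ollama_granite = make_ask("ollama", "granite")
ask_watsonx_granite = make_ask("watsonx", "granite")
print(ask_ollama_granite("Who is Ada Lovelace?"))
```

Per the transcript, chuk-llm performs the equivalent discovery against Ollama the first time you run it, which is why imports like `ask_ollama_granite` resolve without you ever writing them.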
Now you're probably thinking to yourself, I want that kind of streaming effect that you showed me on the terminal. To do that, we are going to work with a library called asyncio; again, that's automatically installed when you add chuk-llm, so I can just `import asyncio`. And there is another function called `stream_ollama_granite`, which works the same way: rather than using ask, you use stream. Now I need to work asynchronously, because when I go token by token and stream it out, it's basically an asynchronous call with a generator behind it. So I just need a function, and I can do that with `async def stream_example`. I will comment this out for a second, and I will do an `async for chunk in stream_ollama_granite`, and then I'm going to write "Who is Ada Lovelace?" one more time. Then I just need to print out each chunk as it's generated, so we'll say `print(chunk, end="", flush=True)`, printing each chunk with an empty string as the line ending and flushing it out. Now all I need to do is call my main function with `asyncio.run`, passing in the function call, `stream_example()`. And now if we save that, we are running asynchronously and generating the output token by token. Now one of
the fun things to do with LLMs is to have them take on a persona or give them some sort of special instruction. In this case, that is known as a system prompt. A system prompt is always part of a conversation, and it takes precedence over the user prompt. So if you want to say to the model, I want you to always act like a pirate, or I want you to act as a reviewer, or whatever, then in order for the model not to lose that context, especially as the conversation gets longer, you can give those instructions in this special instruction called a system prompt. To set a persona, I'm going to set `persona = "You are a pirate called Jolly Roger and you always speak in pirate speak"`. And then if I want to assign that persona to the model, I can just go into my `stream_ollama_granite` call and set the parameter `system_prompt=persona`. So if I save that and ask who Ada Lovelace is again, we're now going to find out about Ada Lovelace, but in a pirate persona. And you can see there it's all pirate speak. Of course, I can have more serious examples, such as you are a reviewer, or in the case of agents, which we'll cover in another video, you can tell the model what tools it has access to. Now of course, we've just been doing
single-prompt question and answer, but what if we want to have multi-turn conversations? Well, we can use a conversation function that is part of this framework. So I'm going to type in `async with conversation`. In this case, rather than using those pre-generated functions, I'm going to set `provider="ollama"`, and then similarly I'm going to set `model="granite:3.3"`.
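The multi-turn pattern being set up here can be sketched with a stub. `StubConversation` below is a hypothetical stand-in for chuk-llm's conversation context manager, so the shape of the code runs without Ollama; the real API is the one dictated in the transcript:

```python
import asyncio

# Hypothetical stub standing in for chuk-llm's conversation context manager.
# It remembers earlier turns to mimic the conversation memory described here.
class StubConversation:
    def __init__(self, provider: str, model: str):
        self.provider = provider
        self.model = model
        self.history = []

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def ask(self, prompt: str) -> str:
        # Every turn is appended, so follow-up questions see prior context.
        self.history.append({"role": "user", "content": prompt})
        reply = f"(turn {len(self.history)}) you said: {prompt}"
        self.history.append({"role": "assistant", "content": reply})
        return reply

async def chat_example():
    async with StubConversation(provider="ollama", model="granite:3.3") as chat:
        print(await chat.ask("My name is Chris and I'm learning Python."))
        print(await chat.ask("What am I learning?"))

asyncio.run(chat_example())
```

The design point is that the context manager owns the history: calling code just keeps issuing `await chat.ask(...)` and never touches the message list itself.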
And then I'm just going to give this a variable called chat to make it a little bit easier to work with. And again, I'm going to add conversation to my imports above. Now that I've done that, I can just ask questions back and forth. So in this case, I'm just going to
type in "My name is Chris and I'm learning Python." We'll print out the question just so we know what we said. Then the next thing I want to do is pass that across to the model. So I'm going to write `response = await chat.ask`, because again I'm inside an asynchronous function at the moment, and I'm going to pass in my question, which of course was "My name is Chris and I'm learning Python". I'm going to get a response back from the model, and I'll just print out what that response is with `print(response)`. Now I want to ask a follow-up question, and that follow-up is going to be "What am I learning?". Now we'll ask the model again and save it in response one more time: I'm going to `await chat.ask`, and this time I'm going to pass in the follow-up. Notice that I am just using `chat.ask` and asking a different question; everything else remains the same. I'm going to overwrite the previous answer, and again I'm going to print out that response. So now if I run this in my terminal, you see I've given it "My name is Chris and I'm learning Python". The model comes back saying "That is great, Python is fantastic." And then here's the follow-up: "You're learning the Python programming language", blah blah blah. So it's remembered the conversation, and I didn't need to think about that; it was all tidied up within that conversation context. Now if
you're thinking to yourself, yeah, this is super simple, but what about when I'm dealing with the more low-level libraries; how does that look? To do that, I can just write `client = get_client`, passing in `provider="ollama"`. As I did before with the conversation, I'm going to set the model to Granite 3.3 latest, because that's what it's called in Ollama. And again, I'm going to put `get_client` in my imports. We'll get rid of all of the earlier code because we don't need it. Then I'm going to set `messages` equal to an array. Within the messages array, I'm basically going to have a list of JSON objects. The first one is going to have a "role", and in this case the role is "system". Remember that system prompt we had, you are a pirate and so on? If I want that, I can set the "content": I'll just have an attribute called content and say "you are a pirate and you always speak in pirate speak." Then we'll follow up with my user question: the role in this case is going to be "user", and the content is going to be "Who is Ada Lovelace?". So as you see, that
message structure is slightly different, but this is the same structure that you'll deal with when you're working with most LLM libraries: this idea of messages, where you state what the role is going to be and then what your content is. So in this case, this is a system prompt and this is a user prompt. Again, if you want to have multi-turn conversations, you would just keep adding to that list. You might have the response from the assistant, with role "assistant", and then you can just build up the conversation that way. So now I want to ask the model a question, which is known as a completion. I'm going to write `completion = await`, again doing an await in this case because it's an asynchronous function, and then `client.create_completion`, and all I'm going to do is set `messages=messages`. Then I will print out the response, which is the "response" field of the completion. So if I save that and come back to my terminal, we should have a pirate-speaking Ada Lovelace one more time. And there you go: "I, Ada Lovelace", blah blah blah. So this is the exact same thing as you saw a little bit earlier. It's a little bit more complicated, a little bit more of a low-level API, but this is what you will typically see when you're working with large language models and other libraries. As you've seen, though, the simple ask and stream structure I had earlier takes away that complexity and helps you there. So now that you know how to work with a large language model very quickly, you can go off and build other things. Then in a future video, I'm going to show you how you can start working with things like agents.
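For reference, the role/content message list described in the low-level example above can be sketched as plain data (the exact content strings here are illustrative):

```python
# Each turn is a dict with a "role" (system / user / assistant) and "content".
messages = [
    {"role": "system",
     "content": "You are a pirate and you always speak in pirate speak."},
    {"role": "user", "content": "Who is Ada Lovelace?"},
]

# Multi-turn conversations just keep appending turns in order:
messages.append({"role": "assistant",
                 "content": "Arr, Ada Lovelace be a mathematician!"})
messages.append({"role": "user", "content": "What am I learning?"})

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```

This is the structure the conversation helpers manage for you behind the scenes; with the low-level client, you build and extend the list yourself.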