Two‑Line LLM Programming with Ollama
Key Points
- Getting started programming against any LLM can take as little as two lines of code, by running a locally installed model with Ollama.
- Install Ollama (available for macOS, Linux, and Windows), then pull and run the Granite 3.3 model using `ollama pull granite:3.3` and `ollama run granite:3.3`.
- To programmatically interact with the model, install the MIT‑licensed *chuk‑llm* library via the `uv` package manager (`brew install uv` on macOS, then `uvx chuk-llm test ollama`).
- The library provides a simple shortcut function (e.g., `ask_granite`) that lets you query the model with a single command like `uvx chuk-llm ask_granite "Who is Ada Lovelace?"`.
- This workflow enables rapid, local LLM development without vendor lock‑in, using only a few terminal commands.
Sections
- Untitled Section
- Dynamic Model Calls and Streaming Options - The speaker demonstrates how a runtime‑generated function automatically connects to the Ollama Granite model (or any other model/provider) for non‑streaming responses, and hints at adding streaming capability later.
- Async Chat Interaction Example - The speaker shows how to create a chat variable, import the conversation module, send an initial question with `await chat.ask`, print the response, then issue a follow‑up question using the same async call.
**Source:** [https://www.youtube.com/watch?v=_1uFtDfqapo](https://www.youtube.com/watch?v=_1uFtDfqapo)
**Duration:** 00:13:57

Section timestamps:
- [00:00:00](https://www.youtube.com/watch?v=_1uFtDfqapo&t=0s) Untitled Section
- [00:04:47](https://www.youtube.com/watch?v=_1uFtDfqapo&t=287s) Dynamic Model Calls and Streaming Options
- [00:09:17](https://www.youtube.com/watch?v=_1uFtDfqapo&t=557s) Async Chat Interaction Example

Full Transcript
So I was recently challenged by one of my colleagues on what is the quickest way to get someone started programming against a large language model. And I said, I think we could do it in two lines of code, and that is what we're going to do today. So this will work against any LLM provider, but for simplicity's sake, I want you to be able to run this locally on your machine. And to do that we're going to use a tool called Ollama. So you just need to go to ollama.com and click on the download button, and you can download it for Mac, Linux, or Windows. Now
once you've downloaded and installed Ollama, you can bring up a terminal and see that it works by typing in `ollama`. Now, to download a model, Ollama has a library: you can go to ollama.com/library and that will show you all the models that are
available for you to download. Now we're going to use the Granite 3.3 model today. So I'm going to type in `ollama pull granite:3.3`, and you're going to see that downloads it onto my local machine. It may take a few minutes because it's quite a large model, but once that's downloaded, you can run it locally by typing `ollama run granite:3.3`, which is the model name, followed by whatever your query is. So I am just going to say "Hi", and then you're going to see "Hello, how can I assist you?". It is that quick to get started. So now that we have the
Granite model downloaded and working, we want to be able to program against it. And to do that, I
created an MIT-licensed library that makes it super easy to program against it in two lines of code. But we need to install that library first. To do that, we are going to use a package management tool called uv, which is super simple to install. You just need to go to astral.sh, then go to the docs, where there are installation instructions. Very easy to do. If you are running a Mac, you can just type in `brew install uv` and it will install it for you. Now that that's installed, I'm just going to type in `uvx chuk-llm test ollama`, which is going to check that Ollama is working and that we have the chuk-llm library running. And there you go, you can see it has passed its test and we are able to work with it.
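For reference, the terminal commands used so far as a single setup fragment (assuming macOS with Homebrew; on Linux, astral.sh documents an install script instead):

```shell
brew install uv              # package manager used to run chuk-llm
ollama pull granite:3.3      # download the Granite 3.3 model locally
ollama run granite:3.3 "Hi"  # sanity-check the model in the terminal
uvx chuk-llm test ollama     # verify chuk-llm can reach Ollama
```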
So now we want to interact with the model. So I'm going to ask it a couple of questions. So I'm
going to do `uvx chuk-llm ask_granite`. So this is the nice thing about this package: you can just put `ask_` and then whatever the model name is. It could be Granite 3.3, it could be gpt-oss, it can be whatever is installed locally on your machine. In this case, there is a shortcut for Granite, so I'm just going to say `ask_granite` followed by something like "Who is Ada Lovelace?", and that will go off to Ollama and ask the Granite model what the answer is. And you see, it's doing that nice streaming effect there, because it's a large language model that uses next-token prediction, and it's just going token by token. So that's great, but it's not
programming. And I promised you that we were going to be able to code this in two lines of code. So
to do that, I'm going to type in `uv init`, and that is going to create me a new Python project. And I'm going to open up Visual Studio Code, so I'm just going to do `code .` and that will open up the initialized project. You can see there that there's a main file that's automatically been created. So if I were to type in `uv run main.py`, you're going to see it comes back with a hello from the chuk-llm demo. So now that our hello world works, I need to install
the chuk-llm package into our Python project. To do that, I'm going to type in `uv add chuk-llm`, and that is going to add chuk-llm into my project and into my virtual environment so I'm able to use it. Now that that's installed, we can come back into VS Code, and I need to import the ask Ollama Granite function into my code. To do that, I'm going to type in `from chuk_llm`, as that is the package we just installed. And by the way, if you want to see that it's installed, you can look in my pyproject.toml: under the dependencies, chuk-llm is now there. So now that we know the package is installed, we can go back to our main.py and get rid of the hello world. And what we want to do is import that
same ask question we did from the terminal. So I want to import a function called `ask_ollama_granite`. To do that, I just type in `from chuk_llm import` and then I put in `ask_ollama_granite`. So that's my first line of code. And then the second line of code is going to be a `print` of `ask_ollama_granite`.
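Putting those two lines together, the main.py being dictated here reads as follows (it requires chuk-llm in the project and a local Ollama with the Granite model pulled, so it is shown rather than executed):

```python
# The two-line program: import the runtime-generated helper, then ask.
from chuk_llm import ask_ollama_granite

print(ask_ollama_granite("Who is Ada Lovelace?"))
```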
And then we're going to say "Who is Ada Lovelace?". And now if I come back to my terminal and I run `uv run main.py` one more time, you're going to see it's not going to stream; it's going to take a few seconds to come back. It's going to go off to Ollama, call the Granite model, and come back with who Ada Lovelace is. Notice it's not doing that streaming thing at the moment; it's just returning everything in one go. So as I promised, two lines of
code to be able to talk to the model. And actually, what's happening underneath the hood is that this `ask_ollama_granite` function is generated at runtime. The first time I run this, it will go to Ollama, discover what models are on your machine, and automatically generate the function for you so that you can use it with the LLM. So you don't need to understand the complexity around that; it will just generate the functions. That equally means that if I've got any other model in there, I can just get rid of the Granite part, put that model name in there, and it's going to work. You can speak to any model. Similarly, if I want to speak to a different model provider, I can just replace the Ollama part with watsonx, for example, and that command will work against any models running on watsonx.
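The runtime generation described here can be sketched with a stub. This is only an illustration of the dynamic-naming idea; `_call_model` and `make_ask` are hypothetical placeholders, not chuk-llm internals, so the sketch runs without any model installed:

```python
# Hypothetical sketch of runtime-generated ask_<provider>_<model> helpers.
# chuk-llm discovers real models from Ollama; here a placeholder answer
# stands in so the naming pattern can be demonstrated on its own.
def _call_model(provider: str, model: str, prompt: str) -> str:
    return f"[{provider}/{model}] answer to: {prompt}"

def make_ask(provider: str, model: str):
    def ask(prompt: str) -> str:
        return _call_model(provider, model, prompt)
    ask.__name__ = f"ask_{provider}_{model}"
    return ask

# Swapping the model or provider just changes the generated name:
ask_ollama_granite = make_ask("ollama", "granite")
ask_watsonx_granite = make_ask("watsonx", "granite")
print(ask_ollama_granite("Who is Ada Lovelace?"))
```

Per the transcript, chuk-llm performs the equivalent discovery against Ollama the first time you run it, which is why imports like `ask_ollama_granite` resolve without you ever writing them.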
Now you're probably thinking to yourself, I want that kind of streaming effect that you showed me on the terminal. To do that, we are going to work with a library called asyncio; again, that's automatically installed when you add chuk-llm, so I can just `import asyncio`. And there is another function called `stream_ollama_granite`, which works the same way: rather than using ask, you use stream. Now I need to work asynchronously, because when I go token by token and stream it out, it's basically an asynchronous call with a generator behind it. So I just need a function, and I can do that with `async def stream_example`. I will comment this out for a second, and I will do an `async for chunk in stream_ollama_granite`, and then I'm going to write "Who is Ada Lovelace?" one more time. Then I just need to print out each chunk as it's generated, so we'll say `print(chunk, end="", flush=True)`, printing each chunk with an empty string as the line ending and flushing it out. Now all I need to do is call my main function with `asyncio.run`, passing in the function call, `stream_example()`. And now if we save that, we are running asynchronously and generating the output token by token. Now one of
the fun things to do with LLMs is to have them take on a persona or give them some sort of special instruction. In this case, that is known as a system prompt. A system prompt is always part of a conversation, and it takes precedence over the user prompt. So if you want to say to the model, I want you to always act like a pirate, or I want you to act as a reviewer, or whatever, then in order for the model not to lose that context, especially as the conversation gets longer, you can give those instructions in this special instruction called a system prompt. To set a persona, I'm going to set `persona = "You are a pirate called Jolly Roger and you always speak in pirate speak"`. And then if I want to assign that persona to the model, I can just go into my `stream_ollama_granite` call and set the parameter `system_prompt=persona`. So if I save that and ask who Ada Lovelace is again, we're now going to find out about Ada Lovelace, but in a pirate persona. And you can see there it's all pirate speak. Of course, I can have more serious examples, such as you are a reviewer, or in the case of agents, which we'll cover in another video, you can tell the model what tools it has access to. Now of course, we've just been doing
single-prompt question and answer, but what if we want to have multi-turn conversations? Well, we can use a conversation function that is part of this framework. So I'm going to type in `async with conversation`. In this case, rather than using those pre-generated functions, I'm going to set `provider="ollama"`, and then similarly I'm going to set `model="granite:3.3"`.
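The multi-turn pattern being set up here can be sketched with a stub. `StubConversation` below is a hypothetical stand-in for chuk-llm's conversation context manager, so the shape of the code runs without Ollama; the real API is the one dictated in the transcript:

```python
import asyncio

# Hypothetical stub standing in for chuk-llm's conversation context manager.
# It remembers earlier turns to mimic the conversation memory described here.
class StubConversation:
    def __init__(self, provider: str, model: str):
        self.provider = provider
        self.model = model
        self.history = []

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def ask(self, prompt: str) -> str:
        # Every turn is appended, so follow-up questions see prior context.
        self.history.append({"role": "user", "content": prompt})
        reply = f"(turn {len(self.history)}) you said: {prompt}"
        self.history.append({"role": "assistant", "content": reply})
        return reply

async def chat_example():
    async with StubConversation(provider="ollama", model="granite:3.3") as chat:
        print(await chat.ask("My name is Chris and I'm learning Python."))
        print(await chat.ask("What am I learning?"))

asyncio.run(chat_example())
```

The design point is that the context manager owns the history: calling code just keeps issuing `await chat.ask(...)` and never touches the message list itself.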
And then I'm just going to give this a variable called chat to make it a little bit easier to work with. And again, I'm going to add conversation to my imports above. Now that I've done that, I can just ask questions back and forth. So in this case, I'm just going to
type in "My name is Chris and I'm learning Python." We'll print out the question just so we know what we said. Then the next thing I want to do is pass that across to the model. So I'm going to write `response = await chat.ask`, because again I'm inside an asynchronous function at the moment, and I'm going to pass in my question, which of course was "My name is Chris and I'm learning Python". I'm going to get a response back from the model, and I'll just print out what that response is with `print(response)`. Now I want to ask a follow-up question, and that follow-up is going to be "What am I learning?". Now we'll ask the model again and save it in response one more time: I'm going to `await chat.ask`, and this time I'm going to pass in the follow-up. Notice that I am just using `chat.ask` and asking a different question; everything else remains the same. I'm going to overwrite the previous answer, and again I'm going to print out that response. So now if I run this in my terminal, you see I've given it "My name is Chris and I'm learning Python". The model comes back saying "That is great, Python is fantastic." And then here's the follow-up: "You're learning the Python programming language", blah blah blah. So it's remembered the conversation, and I didn't need to think about that; it was all tidied up within that conversation context. Now if
you're thinking to yourself, yeah, this is super simple, but what about when I'm dealing with the more low-level libraries; how does that look? To do that, I can just write `client = get_client`, passing in `provider="ollama"`. As I did before with the conversation, I'm going to set the model to Granite 3.3 latest, because that's what it's called in Ollama. And again, I'm going to put `get_client` in my imports. We'll get rid of all of the earlier code because we don't need it. Then I'm going to set `messages` equal to an array. Within the messages array, I'm basically going to have a list of JSON objects. The first one is going to have a "role", and in this case the role is "system". Remember that system prompt we had, you are a pirate and so on? If I want that, I can set the "content": I'll just have an attribute called content and say "you are a pirate and you always speak in pirate speak." Then we'll follow up with my user question: the role in this case is going to be "user", and the content is going to be "Who is Ada Lovelace?". So as you see, that
message structure is slightly different, but this is the same structure that you'll deal with when you're working with most LLM libraries: this idea of messages, where you state what the role is going to be and then what your content is. So in this case, this is a system prompt and this is a user prompt. Again, if you want to have multi-turn conversations, you would just keep adding to that list. You might have the response from the assistant, with role "assistant", and then you can just build up the conversation that way. So now I want to ask the model a question, which is known as a completion. I'm going to write `completion = await`, again doing an await in this case because it's an asynchronous function, and then `client.create_completion`, and all I'm going to do is set `messages=messages`. Then I will print out the response, which is the "response" field of the completion. So if I save that and come back to my terminal, we should have a pirate-speaking Ada Lovelace one more time. And there you go: "I, Ada Lovelace", blah blah blah. So this is the exact same thing as you saw a little bit earlier. It's a little bit more complicated, a little bit more of a low-level API, but this is what you will typically see when you're working with large language models and other libraries. As you've seen, though, the simple ask and stream structure I had earlier takes away that complexity and helps you there. So now that you know how to work with a large language model very quickly, you can go off and build other things. Then in a future video, I'm going to show you how you can start working with things like agents.
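For reference, the role/content message list described in the low-level example above can be sketched as plain data (the exact content strings here are illustrative):

```python
# Each turn is a dict with a "role" (system / user / assistant) and "content".
messages = [
    {"role": "system",
     "content": "You are a pirate and you always speak in pirate speak."},
    {"role": "user", "content": "Who is Ada Lovelace?"},
]

# Multi-turn conversations just keep appending turns in order:
messages.append({"role": "assistant",
                 "content": "Arr, Ada Lovelace be a mathematician!"})
messages.append({"role": "user", "content": "What am I learning?"})

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```

This is the structure the conversation helpers manage for you behind the scenes; with the low-level client, you build and extend the list yourself.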