AI Model Cards: Visual Cheat Sheet
Key Points
- Explaining AI model differences is notoriously hard because people struggle to attach meaning to arbitrary version numbers, so semantic, story‑like descriptors work much better.
- The speaker proposes turning the 16 top Hugging Face models into a printable card deck, giving each model a single-word tagline that captures its core strength for use in classrooms and casual conversations.
- Each card includes a concise “model card” worksheet designed for learners, turning technical specs into visual, memorable teaching tools.
- Example: The o3 model is labeled “Artificer” to convey its technically competent, problem‑solving, creation‑focused nature despite its “cold” output style.
- Example: The 200‑billion‑parameter Yi 1.5 model is dubbed “Voyager” to highlight its fluency and bridge‑building between English and Chinese communications.
Sections
- AI Model Card Deck Concept - The speaker highlights the difficulty of explaining AI model differences and proposes a printable card deck that assigns each major Hugging Face model a one‑word summary, providing a visual, story‑based tool for better human understanding.
- Voyager’s Role and Grok Issues - The speaker explains Voyager’s multilingual, cross‑cultural purpose while clarifying it isn’t limited to code or poetry, praises Claude‑based Polymath for its versatility, and criticizes the undocumented “Grok” model for unconventional behavior and alignment problems.
- Nate's Preferred LLM Stack - He details how he allocates o3, GPT‑4o, Claude 4 Opus, and Gemini 2.5 Pro across different tasks: o3 handles roughly 60‑70% of his queries, with the others taking about 10‑15% each.
Source & Timestamps
- Source: https://www.youtube.com/watch?v=7G0S7DSvKxU
- Duration: 00:09:03
- [00:00:00](https://www.youtube.com/watch?v=7G0S7DSvKxU&t=0s) AI Model Card Deck Concept
- [00:03:10](https://www.youtube.com/watch?v=7G0S7DSvKxU&t=190s) Voyager’s Role and Grok Issues
- [00:06:18](https://www.youtube.com/watch?v=7G0S7DSvKxU&t=378s) Nate's Preferred LLM Stack
Full Transcript
You know, one of the hardest things in
AI right now is explaining the
difference between models. And I have
really struggled with that because that
is one of my top requests that I get by
DM, by email, by sonic signal from the
aliens in the sky, whatever you want to
call it. I get a lot of asks for, you
know, why is 4o supposed to be dumber
than o3? And I get it. Naming
conventions are weird. I think I solved
it.
The key issue is actually the way humans
learn. We don't learn well by trying to
attach a piece of random meaning to a
text string. Like we don't use key
values that way if you're a developer.
Like that's just not how humans work
very well. We need something that gives
us semantic meaning. We're
storytellers. And model makers are so
busy making models, they're not giving
us the semantic meaning. And you know
what? That's great. They can make
great models. Fantastic. I am a geek and
I'm a board gamer and I'm a card gamer.
And I had an idea. Why not just turn all
of the major models, all 16 of the major
models on the Hugging Face leaderboards
right now into a card deck? Make them a
card deck that you can actually print,
that you can put into a classroom, that
you can give to your relatives if
they're not sure what these models do.
Make it
visual, and I think it's going to be
fun. Each card has a one-word summary of
what that model is best at. And if
you're like, "Oh my gosh, Nate, this is
a, you know, Substack advertorial,"
it's not, because I'm actually going to
give you distinct value here. That's
just for you guys.
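To make the card idea concrete, here's a minimal sketch of one card as a data structure. The field names are my own hypothetical labels; the video only specifies the one-word tagline, while the strength and caveat fields mirror details given later in the talk:

```python
from dataclasses import dataclass

# One card from the deck as a data structure. Field names are
# hypothetical shorthand for what the video says a card carries:
# a one-word semantic summary, a core strength, and one honest caveat.

@dataclass
class ModelCard:
    model: str     # the model's official name
    tagline: str   # one-word semantic summary
    strength: str  # what the model is best at
    caveat: str    # a known issue, since no model is perfect

o3_card = ModelCard(
    model="o3",
    tagline="Artificer",
    strength="technical problem solving and creating things",
    caveat="output style can feel a little cold",
)
```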
We are going to go through the key
models, including some of the ones I
don't talk about often here, and why I
picked the word I did, because you guys
are a more advanced audience and I think
you'll have fun with it. I think it
highlights the challenge of picking the
right word to describe something as
nebulous as latent space and how a model
navigates it. Let me start with o3,
which I've talked about a ton.
I'll get to some of the rare models in a
minute. The Artificer is what I named
it, and I've wrestled with that.
Artificer is a weird word, right? It's
like a Ren Faire word. Uh, but I like
it because it gets at this idea of being
technically competent, solving hard
problems, and focusing on creating
things, which is very much the vibe if
you're o3. It's a little bit cold as
a model, but it's very good at problem
solving and creating things. Uh, and by
the way, in in the Substack, each of
these is like a printable worksheet, a
model card, like it's designed to go
into the classroom. It's designed for
learners. It's very fun. Uh, so let's
try a little bit of a rare one. Now, why
would I have picked this? I went with
the
Voyager for the Yi 1.5 200-billion-parameter
model. Now, I'll give you a second. Why do you think I
chose Voyager?
If you're familiar with Yi, it is
specialized in fluency between English
and Chinese comments and
communications. And so to me, it felt
really natural to have Voyager be a voyager
between continents, between cultures,
and
connect. And I thought that was a great
way of summing up one of the things it's
really good at. Does that mean that
Voyager can never write code? Obviously
not. Does that mean that Voyager should
never write, you know, a poem or never
write an email for you? That's not the
point. You need a way to simplify so we
have semantic meaning. So we can
remember things. Claude 4 Opus: the
Polymath. I think it's extraordinary at
both reading and critique. I'm getting
better at prompting it for writing, and
it's really, really good at code problem
solving. Polymath just felt right.
Um, here's one. Uh, I often get comments
uh underneath these YouTube videos.
Where's Grok? Nate, why don't you talk
about Grok? Well, part of why, by the
way, is they don't release a model card.
It is easier to do these when someone
releases a model card, and I really wish
the Grok team would. That's a sort of
separate beef. I called Grok the
Maverick. Uh, I called out that it sort
of takes unconventional opinions. I
called out that it invents
unconventional ideas based on the X
(formerly Twitter) stream, etc., etc. And in the
caveats, I called out that there have
been some recent issues with
misalignment for that model. And that's
not just calling out
Grok. Every single one of these 16
cards, I call out something that is an
issue with that model because I don't
believe any model is perfect. I'm not,
you know, trying to take sides here.
Just calling out, you know, the
balls and strikes as I see them, as
they say in baseball. My grandpa was a
baseball fan.
I call out uh Perplexity, which almost
no one considers a model, but it
actually scores very well on the
leaderboards. People think of Perplexity
as just an LLM powered search engine,
but they built Sonar and Sonar is
designed for web search and so it
counts. And I love that I get a chance
to talk about these models that I don't
often talk about. I talk about uh Llama
3 405B. I talked about Mixtral 8x22B,
the Collective. Do you know what that is?
They didn't name it Collective. I named
it Collective for semantic meaning,
because it helps you remember that it
uses a mixture of experts to vote on
tokens, which I think is super
interesting. And so it actually performs
pretty well given its 8x22B
parameterization. It's good for privacy.
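That mixture-of-experts voting idea can be sketched in a few lines. This is a toy illustration only: the expert names and proposals are made up, and a keyword gate stands in for the learned routing network a real MoE model uses:

```python
from collections import Counter

TOP_K = 2  # how many experts activate per token (sparse activation)

def gate(context, experts):
    # Score each expert by how many of its keywords appear in the context.
    # (A stand-in for the learned gating network in a real MoE model.)
    scores = {name: sum(kw in context for kw in spec["keywords"])
              for name, spec in experts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:TOP_K]

def predict_token(context, experts):
    # The selected experts each propose a next token; the majority wins.
    votes = Counter(experts[name]["propose"](context)
                    for name in gate(context, experts))
    return votes.most_common(1)[0][0]

# Hypothetical experts, each with trigger keywords and a fixed proposal.
EXPERTS = {
    "coder":  {"keywords": ["def", "bug"],    "propose": lambda c: "return"},
    "writer": {"keywords": ["poem", "story"], "propose": lambda c: "once"},
    "zh_en":  {"keywords": ["translate"],     "propose": lambda c: "ni hao"},
}
```

A context mentioning "bug" and "def" routes to the coder expert first; only the top-k experts ever run, which is why these models punch above their active parameter count.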
But like you get an idea of how the
model works because I used the word
collective and because I drew a little
picture of like three people all
together around a model concept. You get
the idea. It's kind of half Magic: The
Gathering card deck, half AI nerdery. And I
had a ton of fun with it. I have
exercises for the classroom. If you're
someone who actually wants to use it for
learning, it's like pre-built for that.
If you're someone who just wants to like
print out the cards and keep them by
your desk and build your stack, you can
also do that. And I wanted to share with
you guys, you know, you might wonder
like, what is Nate's stack? What is
Nate using all the
time? To no one's surprise, uh, o3 tops
the list for me. I would say it gets
probably 60 or 70% of my queries right
now. It's the daily driver. Uh, ChatGPT
4o is something that I use pretty
frequently, and that is, I want to call it,
10-15% of my chats, where it's very
simple stuff like reword this, reformat
this, put this into markdown. Uh, it's
also a warmer model, so I can sometimes
have companionable chats with it that o3
is just a little bit cold for. You don't
want tables when you're having a chat
about your day.
Uh, Claude 4 Opus I use when I'm trying
to build like a dashboard for my week, or
when I'm trying to understand how to
structure a problem in coding. It's
good at responding back and forth. It's an
excellent problem solver. I find it's
not quite as good at long-context chats,
and that seems to be a struggle with
Claude
models. Uh, but I would say I use that
one about 10-15% of the time as well.
And I realize I'm running out of
percentage points because I did not plan
this in my head. Uh, but more rarely, I
will say, I use Gemini 2.5 Pro as a
verifier and fact checker. I find it's
really helpful for giving me a totally
alternate perspective that tends to be
pretty grounded. And so when I'm like I
don't trust Opus, I don't trust 03. I
need a second opinion. This is too
important. I reach for Gemini 2.5 Pro.
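That split could be written down as a simple routing table. The task labels below are my own hypothetical shorthand; only the model assignments come from the stack described here:

```python
# The stack as a routing table: task type -> preferred model.
STACK = {
    "default":        "o3",              # daily driver, ~60-70% of queries
    "quick_edit":     "GPT-4o",          # reword/reformat/markdown, ~10-15%
    "warm_chat":      "GPT-4o",          # companionable, no tables
    "code_structure": "Claude 4 Opus",   # dashboards, structuring problems
    "second_opinion": "Gemini 2.5 Pro",  # grounded verifier / fact checker
}

def pick_model(task):
    # Anything unclassified falls back to the daily driver.
    return STACK.get(task, STACK["default"])
```

The point of the table is the same as the cards: one semantic label per job makes the split memorable, even though any of these models could technically handle any task.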
Um, and to be honest, that's a habit
stack. Like, the memory pull in ChatGPT
is real, and so just having something
that remembers me is part of what drives
that split. It's not even necessarily
model
capability. And then last but not least,
deep research. When I want something
that like I can walk away, I can make a
cup of coffee, I can come back, I reach
for deep research and it tends to
produce very high quality results and
it's absolutely worth the 10 or 15
minutes. All right, there you go. I hope
you've enjoyed this. I hope you've
gotten some value out of kind of
thinking about the idea of semantic
meaning for models. Look, I don't care
if you want to go to my Substack or
throw up your hands and run away. That's
not the point. The point is we remember
things with semantic meaning. Model
makers have not learned that lesson, and
I needed something to teach people, and
so I made this. If you want to say, "I
have a better one; Artificer is
terrible, no one knows what that word
means," I'd be the first to agree with
you, and I would say: make one better.
Make one better. Um, and let me know
about it. All right.