Nvidia DGX Spark vs Dual‑4090 Server
Key Points
- Nvidia sent the presenter a handheld AI supercomputer called the DGX Spark, featuring a Grace Blackwell 20‑core ARM CPU, a Blackwell GPU with 1 petaflop of AI compute, 128 GB of unified LPDDR5X memory, and a $4K price tag.
- The creator hoped the Spark would outperform his existing dual‑RTX 4090 AI server (“Terry”) and ran benchmark tests using models like Qwen3 8B and Llama 3.3 70B.
- In both tests, Terry dramatically outpaced the Spark—132 tokens/sec versus 36 tokens/sec on the 8B model—demonstrating that the small device still lags behind a high‑end desktop setup.
- After confronting Nvidia about the unexpected results, the company acknowledged that a dual‑4090 rig would naturally beat the Spark on many workloads, underscoring the Spark’s niche as an affordable, portable AI server rather than a wholesale replacement for larger GPU clusters.
Sections
- Handheld AI Supercomputer Unboxing - The video unboxes Nvidia's $4K DGX Spark—a palm‑sized AI supercomputer featuring a Grace Blackwell chip, 128 GB unified memory, and the ability to run up to 200‑billion‑parameter models—as the creator tests whether it can outperform his existing AI server.
- Backpack AI vs Desktop GPU - The presenter demos ComfyUI image generation (using an Nvidia‑supplied example) on two systems, Larry, the backpack‑sized DGX Spark, and Terry, the high‑end desktop server, highlighting Terry's dramatically faster iteration speed against Larry's tiny form factor.
- Training Speed & Memory Comparison - The speaker contrasts Terry and Larry’s training performance, showing Terry’s three‑times faster iterations but limited VRAM for larger models, and argues the device excels more at inference than training for AI developers.
- Twingate Secure Cloud Networking Overview - The speaker demonstrates how to quickly set up a Twingate network and connector for seamless, VPN‑like access across devices, highlighting its ease of use, enterprise‑grade security, and free availability while also touching on the high cost of renting cloud GPUs for training models.
- Speculative Decoding Demo - The speaker demonstrates speculative decoding, where a fast small model drafts tokens and a larger model verifies them, highlighting the reduced latency, high VRAM requirements (≈77 GB), and speed gains observed with a 70‑billion‑parameter model.
- Evaluating NVIDIA's AI Mini‑Supercomputer - The speaker assesses the Spark hardware and the Nvidia Sync app, praising the easy‑to‑use, Apple‑like experience and high‑speed GPU‑to‑GPU connectivity, while questioning if it truly merits the “supercomputer” label or a purchase.
- Creator Introduces Prayer Segment - The speaker announces a new habit of ending videos with a personal prayer for the audience, explaining the motivation behind it while acknowledging viewers’ diverse beliefs.
**Source:** [https://www.youtube.com/watch?v=FYL9e_aqZY0](https://www.youtube.com/watch?v=FYL9e_aqZY0) **Duration:** 00:23:59
Timestamps
- [00:00:00](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=0s) Handheld AI Supercomputer Unboxing
- [00:04:55](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=295s) Backpack AI vs Desktop GPU
- [00:08:01](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=481s) Training Speed & Memory Comparison
- [00:11:09](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=669s) Twingate Secure Cloud Networking Overview
- [00:14:15](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=855s) Speculative Decoding Demo
- [00:17:42](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=1062s) Evaluating NVIDIA's AI Mini‑Supercomputer
- [00:21:18](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=1278s) Creator Introduces Prayer Segment
Full Transcript
Nvidia sent me this and I can finally
talk about it. This AI supercomputer
fits in the palm of my hand and it runs
AI models my dual 4090s can't. This is a
whole new category of device, an AI
server you can actually afford. Now, I
think this might be the device we've
been waiting for. Powerful local AI that
doesn't suck. I'm excited to try this
and show it to you cuz it might change
everything. So, in this video, we're
diving into the specs, seeing if it can
defeat and replace my AI server, Terry,
and discover what it can actually do
with real tools like ComfyUI and Open
WebUI. Get your coffee ready. Let's go.
So, here's what Nvidia sent me. An
intense looking box. And inside is the
NVIDIA DGX Spark. Dude, I'm holding an
AI supercomputer in my hand. And here's
what's kind of crazy. This is the
original DGX-1, the server that
kickstarted the AI revolution. Jensen
delivered this server to Sam Altman to
get ChatGPT started. Now look at this.
Compared to Spark, which is not much
bigger than my coffee cup or phone. Look
how far we've come. Okay, cool. It's
small. What are the specs? What's it
packing? For the brains, we have a GB10
Grace Blackwell superchip, a 20 core ARM
processor. It has a Blackwell GPU with
one petaflop of AI compute. One
petaflop. But the memory, it has 128 GB of
unified memory, LPDDR5X. It's got a 10
gig Ethernet port. And then this fun
rectangle. We'll talk more about that
later. This can run up to 200 billion
parameter models. The cost, it's about
4K. Is it worth it? We'll find out. But
all these specs, what do they mean?
Like, do they mean that this thing can
beat Terry? My dual 4090 AI server that
cost over $5,000?
Let's find out. And by the way, I think
it needs a name. We're going to name him
Larry. Can Larry beat Terry? Okay, we
got Larry on the left and Terry on the
right. Let's load up our first model.
We'll do a small one, Qwen3 8B.
Load it up. Prompts ready, set, go.
Huh?
Uh, Terry is awesome. And Larry, you
good, dude? Terry won. So, he had 132
tokens per second and Larry, the DGX
Spark, had 36. I kind of expected it to
be faster. Let's try a bigger model.
Maybe that's where it shines. Let's try
Llama 3.3 70 billion parameters. We'll
load it up and let's try something more
technical. Ready, set, go.
Whoa, this is kind of embarrassing.
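A quick aside for anyone who wants to reproduce this race at home: the tokens-per-second numbers are easy to pull straight from Ollama's API, which reports how many tokens it generated and how long evaluation took. Here's a minimal sketch, assuming an Ollama server on its default port; the model tag is just an example, use whatever `ollama list` shows you.

```python
import json
import urllib.request

# Minimal tokens-per-second check against a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:8b"  # example tag; substitute a model you've pulled

payload = json.dumps({
    "model": MODEL,
    "prompt": "Explain what unified memory means for a GPU.",
    "stream": False,  # one JSON response, timing stats included
}).encode()

req = urllib.request.Request(
    OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Ollama reports eval_count (tokens generated) and eval_duration (nanoseconds).
tokens_per_sec = result["eval_count"] / result["eval_duration"] * 1e9
print(f"{tokens_per_sec:.1f} tokens/sec")
```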
>> Hey,
>> um, Alex interrupted me during the
recording with some urgent news, so I'm
going to stop here for now. Okay, so we
just had a meeting with Nvidia because
this confused me. Terry beat Larry by a
long shot, which is frustrating because
I had this whole script written about
this AI supercomputer that defeats
Terry, but that's not the case. So, me
and Alex, my producer, we sat down with
Nvidia and said, "What the heck, guys?
We're running these AI models on Larry,
and Terry's kicking his butt." And they
asked us what models. We told them, and
they were like, "Yeah, no duh. Of
course, your dual 4090s are going to
defeat Larry." And I'm like, "What do
you mean? This is supposed to be an AI
supercomputer. It's supposed to be the
best." And then they told me these three
things that actually make this thing
kind of awesome. And it's not what I
expected, especially the third one. I
never even heard of that. Now, before we
get started, just know like this thing
performs well with LLMs. It just cannot
beat Terry, and Terry is insane. I'm
learning that now. I have new respect
for Terry. But one thing Larry is going
to be better at every single time is
running more stuff. Let's talk about
Terry and Larry real quick. And I'm
drawing on the very example we're about
to talk about. Terry has two Nvidia
4090s that each have 24 GB of VRAM. So
Terry's got 48 gigs of VRAM. But then we
look at Larry. Larry has 128 GB of
unified memory. What does that mean? It
means that memory, that RAM is shared
between the entire system, between the
CPU and the GPU. Meaning the GPU can use
128 GB of RAM. Right now I have a multi
LLM system running. There's all the
containers running right now. This demo
was running GPT-OSS 120B, DeepSeek Coder
6.7B, and Qwen3 Embedding 4B. Yeah, a
multi-agent system, three models. Right
now, it's using 89 gigs. They said it
would use 120 gigs, almost
the entire system. But I point that out
because that's just something Terry
can't do. Terry can run fast. He's a
sprinter, but he can't do a lot. Can't
do long distance. When you're wanting to
do multi-agent frameworks locally, Larry
shines. You might be thinking, well,
hold on, Terry. He's got system memory,
right? Like, yeah, we do. Terry's got
128 gigs of RAM of system RAM. But these
GPUs can't really use that. The bus they
have to take is too slow. When you're
talking about AI, it's all about the RAM
that's immediately available to the GPU
natively. Okay, Larry has more GPU
memory. He can do more things, run
bigger models. Let's test image
generation. They said it's actually
really good at image generation, and it
might beat Terry. Let's see. We've seen
Terry's pretty strong. I have my doubts.
Now, keep in mind this is probably going
to be rigged. I'm using an example that
Nvidia gave me to show off the power of
this device. Let's make it happen. All
right, Terry on the left, Larry on the
right. We got Comfy UI spun up. We'll do
a basic image generation pipeline. I'm
going to change the image size to the
recommended size they, well, recommended.
And by they, I mean Nvidia. Make the
image box bigger down here or up here.
And we'll run that basic pipeline. So,
we'll go on uh Larry first. Actually,
we'll do um let's give it a lot of
images like 20. Uh no, not 32. We'll
give it 20 images to make. And if you've
never done local AI image generation,
it's wicked fast, maybe. Ready? Let's
go. Larry first. Go. Go. Okay, things
are happening. Loading the model,
creating stuff. So, Terry's already gone
crazy. Larry's starting now. I can hear
him spinning. He's getting hot, dude. A
bit slower, not faster. Now, you can see
on the right here on our other screen,
it looks like Terry is only configured
to use one GPU right now. Terry's done.
Larry's like, "I'll get back to you."
So, it looks like we have 11 iterations
per second for Terry and roughly one
iteration per second on Larry. Now, I
think we're all realizing now at this
point that comparing Larry to Terry is
not apples to apples. It's apples to
insanely powerful gaming machine I built
specifically for AI. So, let's get real.
This thing can fit in my backpack. It's
ridiculously small and for its size,
it's very powerful. The fact that you
can even do what I'm doing right now,
generating images is crazy. And it has
okay inference. And by inference, I mean
when you're chatting with it, that's
what inference means in AI. When you're
actually getting results after you've
trained it. Now, let's do a fun image.
This is kind of boring.
I said a pug sipping coffee. Oh, it's
cute. Oh, that's cursed.
These are just fun to look at. But
what's cool is like this is still all
yours. Like, put it in your pocket
if you're wearing JNCO jeans. And you
can generate stuff like this, nano banana
style, wherever you go. Like, that's
awesome. And the images, I mean, if
you really dialed this in, like if
you trained it yourself, it would look
cool. This thing is actually getting... I
can hear the fans. Um it's uh it's
getting toasty. I could cook an egg on
this right now. I'm exaggerating, but
it's good. It's getting there. And
the steel wool they have on the sides
that's actually still cool. I don't know
what that is exactly but I know it's
probably just to look cool but also help
keep it cool. Either way I think the
design is actually pretty neat.
Oh, that actually hurt when I put my... You
know what? Need I say more? Coffee
cup warmer.
But seriously, I'm running AI and
keeping my coffee hot. Nvidia, you did
it. I'm not kidding. My coffee is
getting kind of cold. I'm going to keep
it there for a second and generate 40
more images. So, image generation was
number two. Number 2.5 is training: training an LLM to
think the way you want it to think,
giving it your own data and tailoring it
to your very specific use case. This is
where it would actually be better than
Terry because training actually takes
more VRAM. Let me walk you through an
example where they actually gave me a
training data set I could play with. I
want to stop this image generation. My
coffee is getting a little too hot. I'm
just kidding. Terry on the left, Larry
on the right. Let's run some training.
Now, we're training on a smaller model.
It looks like Terry has already loaded
all it needs to do and it started
training and it's doing roughly one
iteration per second. As soon as Larry
loads his shards, we'll be able to see
what he does. But dude, Terry's firing
on all cylinders here. Okay, I can hear
Larry starting to spin up and get crazy.
Okay, and he's training. There's our
metrics right there. Now, here a higher
number is not a good thing. It's taking
him 3 seconds per iteration, whereas
Terry only takes 1 second for an
iteration. So, Terry is roughly three
times faster for training, which Nvidia
is like, he might be faster. He could be
faster than Terry. Doesn't seem like
it's the case. Again, let's keep in mind
Terry's only three times faster than
this little bitty guy. So, grain of salt
there. And then there's another thing we
have to consider here. This is why this
device will probably be the best thing
for AI developers. This is really the
target audience. High-speed inference? It can
happen. But this, training, is where it shines.
Remember, training takes more memory,
more VRAM on that small model. I think
it was an 8B. They could both do it. But
if I wanted to train a 70B model like a
Llama 3, Terry just wouldn't be able to
load it into memory. Like, look at this. As
Larry is loading this model into his
memory, look how much is being used.
It's going to keep going up. Okay, so I
have no idea how long this is going to
take. So while this is loading, let me
show you something I love about the
Spark, and why I think this might
be a killer option for a lot of people:
they make it easy to use. Now, there are
two ways Nvidia gives you to access your
Spark easily. First, you can just plug
in a keyboard, mouse, and monitor and use
this like a stinking computer. When
you're using the desktop setup, it's
running Ubuntu, or what they call DGX
OS, just their version of Ubuntu with
all the drivers and stuff you need. I'm
not going to try and run a different OS
on this thing. That's terrifying. I've
spent all day troubleshooting. The
second way, they have an application
called Nvidia Sync. I want to download
that right now. And what this does is
simplify getting access to this and
using tools for everyone. You can see
down here I have the option to add a
device. Once I launch it, it will detect
the apps I have. It can integrate with
Cursor or VS Code. I have both. Then I
connect to it. And what this is doing in
the background is making SSH access
super simple. Copying over your SSH key
to the Spark and it just connects for
you.
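I don't know exactly what Sync runs under the hood, but the "copy your SSH key over" step amounts to something like this sketch. The hostname and username are placeholders for whatever your Spark uses on your network.

```python
import getpass
import pathlib
import paramiko  # third-party: pip install paramiko

# Not Nvidia's actual implementation, just the idea Sync automates:
# push your public key to the box so future SSH logins are passwordless.
SPARK_HOST = "spark.local"  # placeholder hostname
SPARK_USER = "nvidia"       # placeholder username

pubkey = pathlib.Path("~/.ssh/id_ed25519.pub").expanduser().read_text().strip()

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(SPARK_HOST, username=SPARK_USER,
               password=getpass.getpass("Spark password: "))  # password auth once

# Append the key to authorized_keys; after this, plain `ssh` just works.
client.exec_command(
    f"mkdir -p ~/.ssh && echo '{pubkey}' >> ~/.ssh/authorized_keys")
client.close()
```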
It'll add my device. Get started. Nice
little graphic they have there. I get a
nice dashboard I can log into.
I can jump right into Cursor or VS Code,
launch a terminal from here. So that
makes it really easy for someone just to
come in with their laptop and go, I want
to access this thing and do stuff. And
it connects you. Now, speaking of
connecting you, if you're going to have
your AI local, like here on my desk, you
want to be able to access it everywhere,
everywhere you go, but you don't want to
necessarily have to take it with you.
Leave this on your desk at home when
you're at Starbucks or whatever. You
want to be able to access this and use
it, run your AI workloads all the time,
just like you would in the cloud. In my
opinion, the best way to do that is with
Twingate. Twingate is a sponsor of this video
and an amazing partner with my channel.
Oh, look at that go. And this system
memory is not accurate. That's not how
much we have available right now. Oh,
time to heat up my coffee. Twingate is a
zero trust remote access solution. It's
my favorite because it's free for up to
five users. So, unless you're running a
company or have a super large family,
you should be fine. And it's insanely
easy to set up. Like seriously, all you
got to do is go to
twingate.com/networkchuck.
Check the link in the description.
Create your first network in the cloud
and then deploy your first connector.
And when we're talking about the Spark,
we're just going to log in and paste in
one line of config. Like seriously,
watch this. I'll launch my terminal with
the sync app. Paste this command in.
Twingate gave me this, and I'm
connected. So now, no matter where I go,
I can access it securely. It's like a VPN,
but way better, more secure. You're not
opening up any ports in your network.
You don't have to be a network wizard to
make Twingate work. Dude, this thing's
cooking. They have an app for pretty
much every device, iPhone, Android, your
Mac, Windows machine, whatever it is.
And you're getting enterprise-grade
security cuz companies pay to use
this, but you're getting it for free.
Try it out right now because they are
awesome. I legit use them personally and
for my business. And they help make
videos like this possible. They're one
of my main sponsors. They're awesome.
Anyways, training's started and this
thing's cooking. So stinking hot. Let's
stop that now because I think my coffee
is about to boil.
Take a break, bud. You've been doing
good. So again, this right here,
training, fine-tuning, it shines there
mainly because it has more VRAM and it's
a great option for developers who don't
want to have to rent a cloud GPU to
train their stuff, which I had to do
when I was training my voice for
Terry. I rented some cloud GPUs. They're
like 30 bucks an hour. Gosh, don't
forget to turn that off. If you have
this sitting on your desk, it might take
a bit longer than a cloud GPU. Yeah, but
it can do it. And that's the key thing.
It can actually load and train the
models. Its hardware is built to train
AI models. That's awesome. And it's so
tiny. And number three, FP4. With AI
models, you can quantize them and make
them smaller so they're easier to run on
smaller devices like this guy here. If
you're running a model at FP16, you need
a lot of VRAM, but you're getting some
of the best quality possible. But we can
quantize the model down to FP8 or FP4.
The quality does degrade as we quantize
it, but it makes it possible to run on
smaller devices. Now, why am I pointing
this out? It's because this guy is built
to run FP4 like a champ. In fact, they
say that it can run FP4 at pretty dang
close to FP8 quality with models that
have been specially made for it. And
they actually provide an entire tutorial
on how to do NVFP4 quantization. So for
example, this one takes the Deepseek R1
distill llama 8B and uses the model
optimizer using two levels of scaling to
keep accuracy while using fewer bits. So
it keeps accuracy close to FP8, usually
less than 1% loss, which that's pretty
cool. But the biggest thing is they have
hardware specifically built to run FP4.
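To put rough numbers on why FP4 matters so much on a 128 GB box, the napkin math is simple: a dense model's weights take roughly parameter count times bytes per parameter, so quantizing from FP16 down to FP4 cuts the footprint by 4x. A minimal sketch (weights only; real usage adds KV cache and runtime overhead on top):

```python
# Back-of-the-napkin weight memory at different precisions (weights only).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in GB for a dense model."""
    # billions of params x bytes per param = gigabytes
    return params_billion * BYTES_PER_PARAM[precision]

for model, size_b in [("Llama 3.3 70B", 70), ("GPT-OSS 120B", 120)]:
    for prec in ("FP16", "FP8", "FP4"):
        gb = weight_gb(size_b, prec)
        fits_terry = "yes" if gb <= 48 else "no"   # Terry: 2x 24 GB VRAM
        fits_larry = "yes" if gb <= 128 else "no"  # Larry: 128 GB unified
        print(f"{model} @ {prec}: ~{gb:.0f} GB "
              f"(fits Terry: {fits_terry}, fits Larry: {fits_larry})")
```

That's the capacity half of the story. The other half is that the Spark runs FP4 natively in hardware.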
Now what does that mean? Well, think
about a consumer GPU like Terry. Terry,
he can run FP4, but not necessarily in
hardware. You see, Terry has to convert
FP4 in software. He has to think about
it before he can actually run it. Larry,
on the other hand, has special hardware
programmed to run FP4. It's all
happening in hardware super fast. And
this makes Larry great for things like
speculative decoding, which is a new
term I got to learn during this video.
It's actually kind of a cool concept.
And here's what it does. So, while
Larry, he's not necessarily great for
fast inference, speculative decoding
makes it to where he can be super fast.
And this is also what makes him unique
compared to other local AI hosting
options. And here's how it does that.
Speculative decoding speeds up text
generation by using a small fast model
to draft several tokens ahead, then
having a larger model quickly verify or
adjust them. So, the big model doesn't
have to do all the work. The smaller
model is doing that, but he makes sure
the output quality is good, reducing
latency. Now, to do that, we're
essentially running two models at the
same time, requiring more VRAM, which
consumer GPUs just couldn't do. Let's
test this out. Okay, I've got the models
loaded up. I mean, look at this. It's
using 77 gigs of VRAM. Let's uh test it
out with a query. Explain the benefits
of specul...
that word's escaping me. Speculative
decoding. Let's watch it have a fit,
heating up. It's being used, processing.
So, what's happening here again? Smaller
model is doing the stuff. Bigger model
checks it. That was actually pretty
stinking fast using 70B. Does it give me
any token statistics? No. It felt fast
though. Okay, Spark has its advantages.
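If you're curious what speculative decoding looks like when you wire it up yourself, Hugging Face's transformers library exposes it as "assisted generation": you pass the small drafter as assistant_model and the big model only verifies its tokens. Here's a rough sketch with stand-in model names, not necessarily what Nvidia's demo used; any draft/target pair that shares a tokenizer should work, VRAM permitting.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model pair: a big verifier and a small drafter.
TARGET = "meta-llama/Llama-3.3-70B-Instruct"  # big model: verifies drafts
DRAFT = "meta-llama/Llama-3.2-1B-Instruct"    # small model: drafts tokens

tokenizer = AutoTokenizer.from_pretrained(TARGET)
target = AutoModelForCausalLM.from_pretrained(
    TARGET, torch_dtype=torch.bfloat16, device_map="auto")  # needs accelerate
draft = AutoModelForCausalLM.from_pretrained(
    DRAFT, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain the benefits of speculative decoding.",
                   return_tensors="pt").to(target.device)

# assistant_model enables speculative (assisted) decoding: the draft model
# proposes several tokens per step and the target model accepts or rejects
# them in a single forward pass, so output quality is unchanged.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The speedup comes from the target model checking a whole batch of drafted tokens per forward pass instead of generating one token at a time.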
Okay, Larry, he's got some things going
for him. He's not the fastest guy on the
team, but you can put him in any
position. Shoot, he can play four
positions at one time. The analogy is
going off the rails. He can do a lot for
how small of a guy he is. But the big
question is, should you buy him? Now,
the model I have here, it's got 4 TB of
storage. It's a Founders Edition. It
cost 4K or $3,999
because marketing. They will have
cheaper variants from OEM partners. I
think they'll have a 2 TB model for
like $3,000. They haven't put the
numbers out yet, but that's what I've
heard. So, let's compare him to Terry.
Terry cost over 5K. Terry's massive.
Like, I had to lift him up into the
other room to film some B-roll for him.
I'm like, "Oh my gosh, I think I hurt my
arm. Actually, I'm also getting old, but
my arm hurt this weekend." Terry draws a
lot of power. If you were to run Terry for
a year, it's going to cost you about $1,400.
That's how much it's costing
me. The Spark will roughly cost
you $315 in a year to run. And that's
running 24/7. Oh, I forgot to mention
the Spark is 240 watts while Terry is,
what do we say, 1100 watts. So, the
footprint is certainly smaller. We're
not running a data center here. But the
thing is, I don't think Terry's the best
comparison for this. There's some
newcomers in the market that I think are
pretty interesting. I just saw one from
ServeTheHome, a YouTube channel I love,
and it's a Beelink device that has the new
AMD AI chips in it. This Beelink device
also has 128 GB of unified memory. So,
they're neck and neck on those specs, but
they don't have the Nvidia Blackwell
chips that are optimized for FP4.
They've got AMD doing whatever AMD is
doing. Looking at his performance, the
inference is pretty similar to this guy
here. The device itself is like a mini
PC, the same size, but the cost is
around $2,000. Now again, this is not
apples to apples because when you're
comparing Nvidia to AMD, Nvidia is way
ahead of the game on AI. The AMD AI
stuff, it sounds pretty cool, but you
have to have things developed for it.
You have to have a whole ecosystem
around that to use some fun stuff.
Nvidia's already got that. They're way
ahead. Now, disclaimer, I've not played
with any of these new AMD AI things yet,
which is why I'm like not even using
technical terms when I'm describing it.
I just know they exist. And because I
didn't have very much time to make this
video about this device, what I can say
right now is Nvidia is the option you
want if you want things to work and you
don't want to spend so much time getting
things set up and troubleshooting. And
that's from getting this thing set up. I
mean, like literally, I unboxed this and
they have instructions to use your phone
to connect to its Wi-Fi hotspot and get
it connected to your Wi-Fi. Like it has
the ease of use like buying a smart home
device. Like I think I had more trouble
connecting my light bulb to my home
assistant than getting this set up.
That's plus 10 points. 10 points to
Gryffindor. The Nvidia Sync thing is
very cool. It gives developers an easy
way just to boom connect to it. They
don't have to be DevOps people. They
don't have to be nerds like me, although
everyone should be. They don't have to
know how to do a home lab. They don't
have to build Terry. Terry took a lot of
work. So, I can tell NVIDIA put a lot of
work into making this simple. You're
kind of getting that Apple experience.
And there, that's kind of the way I see
it. They're like the Apple of AI right
now where Apple is not the AI of
anything. Although, hold on. There is
one more thing we got to think about.
Actually, we got to bring up Apple here
in a moment because this guy's not the
only one doing unified memory. Now, what
the Spark has going for it is you can
add another Spark to it. It has a QSFP
port on the back which will give you
blazing speeds to another Spark using
NCCL GPU-to-GPU communication. They're
saying you get 200 Gbits per second of
bandwidth. And while the inference speed
won't be as fast as on one, you'll be
able to do more with two. So, I say all
that to get to here, should you buy one?
And really, I'm asking the question for
myself like would I buy one? Now, first
they said it was an AI supercomputer.
I don't think this feels like a
supercomputer. Maybe a mini supercomputer.
Maybe that's a better marketing term. I
get the marketing thing. This doesn't
quite say super to me. It's impressive
what it does, especially for the form
factor. But for me, $4,000 for a device
like this, I would want higher inference
speeds for myself. When I thought about
this device before I saw any of the
specs, I was hoping like, oh, we're
gonna have a device built for us for
high inference. So, Terry in there,
great at high inference, but he's only
got 48 gigs of RAM. I want a GPU with a
ton of VRAM. Forget the gaming. Put the
gaming to the side. I want to do AI.
Design something for a consumer to do
that. This, I don't think, is really
meant for a consumer. At least not for
me wanting high inference. Now, on the
other hand, if you're a developer and
your main job is like developing AI,
you're fine-tuning, you're doing all
that fun data science stuff, which I
don't normally do every day, that's not
my day-to-day, this might be the device
for you because you don't have to rent
something in the cloud. This device can
pay for itself over time. If you're
renting a GPU in the cloud for 30 bucks
an hour, this isn't going to give you the same
performance as the cloud, but it can do
the same stuff as what you can in the
cloud, whereas before that really wasn't
possible. So having this and just being
able to connect it to your laptop over
the network and it just is so tiny and
small and just sits there, that's pretty
cool. So if I'm fine-tuning every day, I
would think about getting this. But if
I'm running Ollama, Open WebUI, Comfy
UI, doing some crazy high inference
tasks, I want more speed. Terry still
wins, but I cannot wait for the day
where someone, I don't care who gives it
to us, gives us a device like this that
can run the biggest and baddest models
at cloud speeds. Or at least just give
me half that speed. Just give me
something. Now, I'm curious though, and
I have not tried this yet. I've not
attempted this yet. I wonder how this
will do against a Mac. Speaking of
apples to apples, Macs have unified
memory. Now, you saw me cluster five
Macs together. I actually have it right
now, it's attached to stuff. This is a Mac
that Apple sent me. It's a Mac Studio M3
fully maxed out. It's got 512 GB of
unified memory. I wonder how this would
do against this guy. I think I'll do
another video on this. Anyways, that's
all I got. I don't normally do reviews,
but this is kind of like something I've
been waiting for. It's been on my wish
list, and I'm so glad Nvidia sent this
to me. They had no control over this
video. They did not see this video
before. They just sent this to me and
said, "Hey, please look at it." They
were very gracious in giving us their
time to help us learn this device and
what it can do. But they had no input
into this. So, what do you think? Is
this the groundbreaking device we've
been waiting for, or is it like meh? And
I mean, if you're a developer, like,
does this get you excited? Like, oh my
gosh, finally I can fine-tune on my desk.
That's a cool idea. Let me know
below. I want to know your thoughts.
Anyways, that's all I got. I will catch
you guys next time. Hey, I was just
watching the review of this video and I
realized, oh my gosh, I forgot to pray
at the end. I'm trying to start doing
that now. And if you're like, what are
you talking about, Chuck? Um, I'm
starting a new thing at the end of my
videos where I just want to pray for you
guys, my audience. Um, you're the reason
I'm here and I want to see you succeed.
I want you to have an amazing career.
I want you to have an amazing
life. I want to pray for your families.
Uh now, why am I doing that? I'm a
believer. I believe in Jesus Christ. And
um he's the reason I'm here doing what I
do. So, I'm not sure where you're at and
your beliefs. I'm sure, I know my
audience has a wide breadth of beliefs.
Uh but I would love just to pray for
you. Um no pressure. If you want to end
the video now, that's totally cool. If
you want to hang out and just hear a
prayer, hey, I would love that. So, I'm
going to pray for you right now. It is
weird, I know, but I'm going to do it
anyway cuz
let's go. God, I uh thank you for the
person watching this video. Um, I pray
right now that through this computer
screen, over the internet, through the
bits and bytes that I believe you
control and you have power over, I pray
over this person that they would be full
of energy and excitement for technology
and that um, first give them wisdom on
whether or not they should buy this
device, but also
bless them in their career. Uh, they are
learning these things because they're
excited about tech. And I pray that you
would take these skills and this
interest and this curiosity and turn it
into
um
positions and, uh, influence and
blessing for the people in their lives.
Lord, uh bless their families and be
with their their friends and their
co-workers. Allow them to be a light in
their life. And uh I ask that just this
video they're watching now would
encourage them to
do some amazing things in their life and
their career. And ultimately I pray that
they would find
their meaning, their identity,
their
the reason for being in you, Lord.
Because at the end of the day, this
stuff is super fun, of course, and we
can obsess over it, but there's more to
life than this. So, I pray they find
that.
It's in Jesus' name I pray. Amen. All
right. Thanks, guys. I'll catch y'all
later.