
DeepSeek V3: Affordable Open-Source AI

Key Points

  • A new GPT‑4‑class language model called DeepSeek V3 can be built, maintained, and run for roughly $5 million, more than an order of magnitude cheaper than the $70–$100 million cost of models like ChatGPT or Claude.
  • The model’s creators open‑sourced the architecture and training pipeline, enabling startups and individual researchers to replicate or improve upon it.
  • Instead of ingesting the entire internet, DeepSeek V3 was trained on a carefully curated, high‑quality corpus covering English, Chinese, math, and code, with extensive human‑in‑the‑loop reinforcement for accuracy.
  • Although the full network contains 671 billion parameters, the system activates only about 37 billion of them per query, picking an efficient “sliver” of the model to generate each response.
  • Leveraging confidence from its curated data, the model predicts multiple tokens ahead (e.g., two tokens at a time), further improving inference speed and computational efficiency.
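The sparse-activation idea in the fourth bullet is a mixture-of-experts routing scheme. The sketch below is a minimal toy illustration: the expert count, gate function, and top-k value are illustrative stand-ins, not DeepSeek V3's actual configuration.

```python
import math

# Toy mixture-of-experts routing: a gate scores every expert, but only
# the top-k experts run for a given input -- which is how a model can
# hold ~671B parameters yet activate only ~37B per query. All sizes and
# functions here are made-up stand-ins, not DeepSeek V3's real ones.
NUM_EXPERTS, TOP_K = 8, 2

# Each "expert" is a stand-in for a large feed-forward sub-network.
experts = [lambda x, i=i: x * (i + 1) for i in range(NUM_EXPERTS)]

def gate(x):
    """Score every expert for this input; softmax into routing weights."""
    scores = [math.sin(x * (i + 1)) for i in range(NUM_EXPERTS)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x):
    weights = gate(x)
    # Select the TOP_K best-scoring experts; the rest stay idle, so most
    # of the "model" does no work at all for this input.
    top = sorted(range(NUM_EXPERTS), key=lambda i: weights[i], reverse=True)[:TOP_K]
    return sum(weights[i] * experts[i](x) for i in top)

print(moe_forward(0.5))
```

In a real model the gate is a small learned layer applied per token, but the efficiency argument is the same: compute scales with the activated sliver, not with total parameter count.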

Full Transcript

**Source:** [https://www.youtube.com/watch?v=QMuwRymNMuw](https://www.youtube.com/watch?v=QMuwRymNMuw)
**Duration:** 00:05:26

## Sections

- [00:00:00](https://www.youtube.com/watch?v=QMuwRymNMuw&t=0s) **Low-Cost Open-Source LLM Breakthrough** - The speaker highlights DeepSeek V3, an open-source language model trained on a curated high-quality dataset for roughly $5 million, far cheaper than ChatGPT-scale models, and argues it opens the door for startups to build their own competitive AI systems.

## Full Transcript
[0:00] What if I told you that there was a GPT-4-class model out there that was 10 times cheaper to build, maintain, and execute on? GPT-4 set the bar for models in 2024. It's since been surpassed by inference-time compute models like o1, o1 Pro, and o3, but it's still really, really good at a lot of different things: it's good at English, it's good at coding, it's good at math, etc.

[0:22] Well, there's now a new model. Instead of costing the $70 or $100 million that ChatGPT cost to train (similar cost for Claude), this model only cost $5 million, maybe five and a half. That is not that much. It's a lot for an individual, but a lot of startups have $5 million. And it's really amazing to see a world where we could actually envision individual startups being able to build their own models. This is something that the makers of this model have chosen to open-source, so it's something anybody can look at and say, "How could I make it even better?" or "How could I do it myself?" They've done a number of really interesting things throughout the model build process that they are revealing to the world, and I just want to highlight a couple. I'll share the paper in the description here. This is DeepSeek V3. It's very cool.

[1:16] So the training data was something they took a lot of care with. They did not do sort of the suck-up-the-whole-internet vibe that ChatGPT did. They actually had a very specific training corpus of very high-quality tokens that they trained against, and they really, really reviewed it to make sure it was good at English, good at Chinese, good at math, and good at coding. Then they reinforced that carefully with human responses to ensure it was really, really accurate, and that gave them a lot of confidence during query time. So when you type in a query, it gave them confidence to actually
[1:47] predict more tokens ahead and be more efficient in their use of space. So even though this is a very large model (it's like 671 billion parameters, right, large for a GPT-4-class model), it's not something that you would expect to be this efficient, is the way I'll put it. But they have figured out that you can use just a sliver of that total model space in the response, and it's about picking the right sliver. So where other models like Meta's Llama or Claude or ChatGPT use the whole model space, this model only uses 37 billion parameters out of the 671-billion-parameter model for any given response. It's about picking the right 37 billion, which sounds like a lot of parameters until you realize it's such a tiny percentage of the total in the model, and they're actually making very efficient use of compute.

[2:40] They're also able to predict more than one token ahead, because they're so confident in their training data. So instead of predicting only one token ahead, they're predicting two, and that's a really cool innovation. I expect to see other folks go after that as well.

[2:53] Now, they did some other cool things during the training phase. They had something called DualPipe, which I've tried to explain a couple of times on video. It's rather complicated; it basically amounts to being able to regurgitate and learn at the same time, and they had a special network setup to do that. They outlined it in the paper, and I definitely recommend diving in for the details there.

[3:15] But from a strategic perspective, if we step back, what this really means is that models have gone from being something in the hundred-million-dollar class that only a few startups could ever afford, to something anybody can build if they have startup-level seed investment. That is a massive, massive shift. It is going to make more and more GPT-4-class models available, and it's going to be yet another driver in this overall strategic theme of GPT-4-class intelligence becoming essentially free. They've open-sourced this model; anybody can use it right now, and anybody can replicate it right now.

[3:46] So if you think about it, we now have a world where GPT-4-class models are becoming free, and the cutting edge is in inference-time compute. These models don't really use the multi-threaded, multi-token prediction that inference-time compute has, where you type in a query and it runs lots and lots of different next-token-prediction threads and finds the best one. Now, that may get open-sourced next year, right? At the pace we're going, we may well see a model like that get open-sourced next year. But for now, that is the model and that is the edge that ChatGPT has in the space. Nobody else really has that kind of inference-time compute yet; lots of people are working on it.

[4:28] And the GPT-4-class models, like Claude Sonnet 3.5 or 3.6, like ChatGPT-4o, are rapidly getting replicated. Cost is driving to zero. It's a massive achievement. And I will grant you, it is easier to replicate than it is to innovate, so getting to the first GPT-4 may well have cost $100 million no matter how you did it, because it was the first time. But replicating it turns out to be very, very efficient and very, very affordable, relatively speaking. And that has huge implications, because it means that intelligence is going to be more and more and more free for a lot of different applications that matter in business.

[5:05] So we will see. But right now, a $5 million model is beating 4o and Sonnet 3.5 at a lot of the things that people really use these models for, like English, like math, like coding, etc.
So there you have it: DeepSeek V3, the new GPT-4-class model champion. Cheers!
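The "inference-time compute" pattern the transcript describes (run many candidate generations, keep the best one) is often called best-of-N sampling. Here is a minimal sketch; the sampler and scorer are toy stand-ins, not any real model's API.

```python
import random

def sample_continuation(prompt):
    """Toy stand-in for one decoding run: appends three random tokens."""
    return prompt + [random.randint(0, 9) for _ in range(3)]

def score(candidate):
    """Toy stand-in for a verifier or reward model: higher is better."""
    return sum(candidate)

def best_of_n(prompt, n=8):
    # Spend extra compute at inference time: decode n independent
    # candidates (real systems run these in parallel) and keep the
    # one the scorer likes best.
    candidates = [sample_continuation(prompt) for _ in range(n)]
    return max(candidates, key=score)

random.seed(0)
print(best_of_n([1, 2]))
```

The quality gain comes entirely from spending more compute per query, which is why the transcript treats it as a separate axis from making the base model itself cheaper.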