DeepSeek launched a free, open-source large language model in late December, claiming it was developed in just two months at a cost of under $6 million.
SFT (supervised fine-tuning), a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model.
This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets.
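The RL-only recipe hinges on rewards that can be checked mechanically, rather than on curated demonstrations. A minimal sketch of what such a rule-based reward could look like in Python; the tag format and score weights here are illustrative, not DeepSeek's actual reward code:

```python
import re

def reward(completion: str, ground_truth: str) -> float:
    """Score a completion with simple, verifiable rules
    (illustrative sketch, not DeepSeek's implementation)."""
    score = 0.0
    # Format reward: the model is asked to wrap its reasoning in
    # <think> tags; reward it for following the format.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: the extracted final answer must match exactly.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m and m.group(1).strip() == ground_truth.strip():
        score += 1.0
    return score

print(reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.5
print(reward("<answer>5</answer>", "4"))                      # 0.0
```

The RL optimizer (reportedly GRPO in R1's case) then updates the policy to make high-reward completions more likely; the point is that no hand-curated chain-of-thought dataset is needed, only a verifiable answer.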
It still relies on Nvidia hardware, so why would it trigger a sell-off? And why is every media outlet picking up this news? I smell something fishy here…
Here’s someone doing 200 tokens/s (for context, OpenAI doesn’t usually get above 100) on… A Raspberry Pi.
Yes, the “$75-$120 micro computer the size of a credit card” Raspberry Pi.
If all these AI models can be run directly on users devices, or on extremely low end hardware, who needs large quantities of top of the line GPUs?
Thank the fucking sky fairies actually, because even if AI continues to mostly suck it’d be nice if it didn’t swallow up every potable lake in the process. When this shit is efficient that makes it only mildly annoying instead of a complete shitstorm of failure.
While this is great, training is where the compute is spent. The news is also about R1 being trainable, still on an Nvidia cluster, but for $6M instead of $500M.
True, but training is one-off. And as you say, roughly 100x lower cost with this new model. Therefore Nvidia just saw 99% of their expected future demand for AI chips evaporate.
Even if they are lying and used more compute, it’s obvious they managed to train it without access to large quantities of the highest-end chips, because of export controls.
Conservatively, I think NVidia is definitely going to have to scale down by 50% and they will have to reduce prices by a lot, too, since VC and government billions will no longer be available to their customers.
It might also lead to 100x more power to train new models.
I doubt that will be the case, and I’ll explain why.
As mentioned in this article,
This totally changes the way we think about AI training, which is why, while OpenAI spent $100M training GPT-4 on an estimated 500,000 GPUs, DeepSeek used about 50,000, roughly 10%, and likely spent a similar fraction of the cost.
So while operation, and even training, is now cheaper, it’s also substantially less compute intensive to train models.
And not only is there less data than ever to train models on that won’t make them worse by regurgitating lower-quality AI-generated content, but even if additional datasets were scrapped entirely in favor of this new RL method, there’s a point at which an LLM is simply good enough.
If you need to auto generate a corpo-speak email, you can already do that without many issues. Reformat notes or user input? Already possible. Classify tickets by type? Done. Write a silly poem? That’s been possible since pre-ChatGPT. Summarize a webpage? The newest version of ChatGPT will probably do just as well as the last at that.
At a certain point, spending millions of dollars for a 1% performance improvement doesn’t make sense when the existing model just already does what you need it to do.
I’m sure we’ll see development, but I doubt we’ll see a massive increase in training just because the cost to run and train the model has gone down.
Thank you. Sounds like good news.
I’m not sure. That’s a very static view of the context.
While China has an AI advantage due to wider adoption, fewer constraints, and an overall bigger market, the US has better tech and more funds.
OpenAI, Anthropic, MS, and especially X will all get massive amounts of backing and will reverse-engineer and adopt whatever advantages R1 has. And while there are some, it’s still not a full-spectrum competitor.
I see this as a small correction that the big players will take advantage of to buy stock, then pump it with state funds, widening the gap and ignoring the Chinese advances.
Regardless, Nvidia always wins. They sell the best shovels. In any scenario, most of the world still doesn’t have its Nvidia cluster: think Africa, Oceania, South America, Europe, India, and the parts of SEA that don’t necessarily align with Chinese interests. Plenty to go around.
Extra funds are only useful if they can provide a competitive advantage.
Otherwise those investments will not have a positive ROI.
The case until now was built on the premise that US tech was years ahead and that AI had a strong moat due to its high compute requirements.
We now know that that isn’t true.
If high compute enables a significant improvement in AI, then that old case could become true again. But the prospects of such a reality happening and staying just got a big hit.
I think we are in for a dot-com type bubble burst, but it will take a few weeks to see if that’s gonna happen or not.
Maybe, but there is incentive to not let that happen, and I wouldn’t be surprised if “they” have unpublished tech that will be rushed out.
The ROI doesn’t matter; it wasn’t there yet anyway, it’s the potential for it that counts. The Chinese AIs are also not there yet. The proposition is to reduce FTEs, regardless of cost, as long as the cost is less.
While I see OpenAI, and mostly startups and VC-reliant companies, taking a hit, Nvidia itself as the shovel maker will remain strong.
If, on a modern gaming PC, you get breakneck speeds of 5 tokens per second, then inference is actually quite energy-intensive too. 5 per second of anything is very slow.
That’s becoming less true. The cost of inference has been rising with bigger models, and even more so with “reasoning models”.
Regardless, at the scale of 100M users, big one-off costs start looking small.
But I’d imagine any Chinese operator will handle scale much better? Or?
Maybe? Depends on what costs dominate operations. I imagine Chinese electricity is cheap, and building new data centres there is likely much cheaper, percentage-wise, than in countries like the US.
Wth?! Like seriously.
I assume they are running the smallest version of the model?
Still, very impressive.
I can also run it on my old Pentium from three decades ago. I’d have to swap 4 MiB of weights in and out constantly, and it would be very, very slow, but it would work.
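The "swap weights in and out" approach can be sketched with a memory-mapped file: keep the weights on disk and touch only a small chunk at a time, instead of loading everything into RAM. The file name, sizes, and values below are toy placeholders, not a real model format:

```python
import mmap
import struct

# Write 4 KiB of fake weights to disk (1024 float32 values of 0.5).
with open("weights.bin", "wb") as f:
    f.write(struct.pack("<1024f", *([0.5] * 1024)))

# "Run" the model by paging one 256-float chunk in at a time,
# never holding all weights in memory at once.
with open("weights.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    total = 0.0
    chunk_bytes = 256 * 4  # one small "layer slice" per read
    for off in range(0, len(mm), chunk_bytes):
        chunk = struct.unpack("<256f", mm[off:off + chunk_bytes])
        total += sum(chunk)
    mm.close()

print(total)  # 512.0 = 1024 weights x 0.5
```

The OS handles the paging, so this works on machines with very little RAM; the cost is that every pass over the weights becomes a pass over the disk, which is exactly why it is very, very slow.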
Sure you can run it on low end hardware, but how does the performance (response time for a given prompt) compare to the other models, either local or as a service?
That tokens/s figure is the performance, or response time if you’d like to call it that. OpenAI’s o1 tends to get anywhere from 33-60, whereas in the example I showed previously, a Raspberry Pi can do 200 on a distilled model.
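To get a feel for what those rates mean in practice, here is the arithmetic for a typical answer, using the speeds quoted in this thread (the 400-token answer length is an assumption):

```python
# Time to generate a ~400-token answer at the generation speeds
# quoted in this thread (all figures rough).
response_tokens = 400
speeds = {
    "gaming PC, large model": 5,    # tokens/s
    "hosted o1-class model": 45,    # midpoint of the 33-60 range
    "Raspberry Pi, distilled": 200,
}
for label, tok_per_s in speeds.items():
    seconds = response_tokens / tok_per_s
    print(f"{label}: {seconds:.0f} s")
# gaming PC, large model: 80 s
# hosted o1-class model: 9 s
# Raspberry Pi, distilled: 2 s
```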
Now, granted, a distilled model will produce worse output quality than the full one, as seen in a benchmark comparison done by DeepSeek here. (I’ve outlined the most distilled version of the newest DeepSeek model, which is likely the kind being run on the Raspberry Pi, albeit probably with some changes made by the author of that post, as well as OpenAI’s two highest-end models of comparable distillation.)
The gap in quality is relatively small for a model that is likely distilled far past what OpenAI’s “mini” model is. And when you consider that even regular laptop/PC hardware is orders of magnitude more powerful than a Raspberry Pi, or that an external AI accelerator can be bought for as little as $60, the quality in practice could be very comparable with even slightly less distillation, especially with fine-tuning for a given use case (e.g. a local version of DeepSeek in a code development platform would be fine-tuned specifically to produce code-related results).
If you get into the realm of cloud-hosted instances of DeepSeek running at scale on GPUs, like OpenAI’s models, benchmark scores are only 1-2 percentage points below OpenAI’s model at about 3-6% of the cost, which effectively means paying for 3-6% of the GPU power that OpenAI pays for.
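Using the thread’s own rough figures, that trade-off can be put in back-of-the-envelope terms (all numbers illustrative, taking the worst case on both counts, not measured data):

```python
# Back-of-the-envelope: benchmark score within 1-2 points of
# OpenAI's model, at 3-6% of the cost. Worst case on both counts.
openai_score, openai_cost = 100.0, 1.00   # normalized baseline
deepseek_score = openai_score - 2.0       # 2 points lower
deepseek_cost = 0.06                      # 6% of the cost

# Relative cost per benchmark point:
print(openai_cost / openai_score)      # 0.01
print(deepseek_cost / deepseek_score)  # ~0.0006, over 16x cheaper per point
```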
It requires only 5% of the same hardware that OpenAI needs to do the same thing. So that can mean less quantity of top end cards and it can also run on less powerful cards (not top of the line).
Should their models become standard or more commonly used, then Nvidia’s sales will drop.
Doesn’t this just mean that now we can make models 20x more complex using the same hardware? There’s many more problems that advanced Deep Learning models could potentially solve that are far more interesting and useful than a chat bot.
I don’t see how this ends up bad for Nvidia in the long run.
Honestly none of this means anything at the moment. This might be some sort of calculated trickery from China to give Nvidia the finger, or Biden the finger, or a finger to Trump’s AI infrastructure announcement a few days ago, or some other motive.
Maybe this “selloff” is masterminded by the big wall street players (who work hand-in-hand with investor friendly media) to panic retail investors so they can snatch up shares at a discount.
What I do know is that “AI” is a very fast moving tech and shit that was true a few months ago might not be true tomorrow - no one has a crystal ball so we all just gotta wait and see.
There could be some trickery on the training side, i.e. maybe they spent way more than $6M to train it.
But it is clear that they did it without access to the infra that big tech has.
And on the run side, we can all verify how well it runs and people are also running it locally without internet access. There is no trickery there.
They are 20x cheaper than OpenAI if you run it on their servers, and if you run it yourself you only need a small investment in relatively affordable servers.
Give that statement to maybe not super techy investors, and that could spook them into the sell-off.
The way I understood it, it’s much more efficient so it should require less hardware.
Nvidia will sell that hardware, an obscene amount of it, and line will go up. But it will go up slower than nvidia expected because anything other than infinite and always accelerating growth means you’re not good at business.
Back in the day, that would tell me to buy green.
Of course, that was also long enough ago that you could just swap money from green to red every new staggered product cycle.
A year ago the price was $62, now after the fall it is $118. Stocks are volatile, what else is new? Pretty much non-news if you ask me.
And you should, generally we are amidst the internet world war. It’s not something fishy but digital rotten eggs thrown around by the hundreds.
The only way to remain sane is to ignore it and scroll on. There is no winning versus geopolitical behemoths as a lone internet adventurer. It’s impossible to tell what’s real and what isn’t
the first casualty of war is truth