
Could DeepSeek's R1 Release be a Net-Positive for the AI Industry?

Altman Solon is the largest global consulting firm specializing in telecommunications, media, and technology (TMT). In the following insight, we share three predictions on how DeepSeek R1 will impact the international AI landscape, focusing on cost, efficiency, access, and innovation.

The surprise release of DeepSeek's R1 large reasoning model (LRM) shook the tech world and challenged many beliefs about generative AI development costs, efficiency, and adoption. While DeepSeek’s reported pre-training cost of $6 million is not the whole story, it does appear to have achieved significant efficiencies both in training costs and inference.

AI experts at Altman Solon have discussed DeepSeek R1's impact on the broader AI landscape with colleagues, clients, and model builders, paying particular attention to the future of AI training and inference. We agreed upon three key takeaways on how the broader AI ecosystem will be affected post-DeepSeek R1. These are subject to change, given the rapid developments in the sector.

I. Frontier AI model training efficiency and compute spend in sharper focus

Before DeepSeek R1's release, much of the discourse surrounding frontier models concerned their growing size and rising training costs. While improving the efficiency of model training was already a priority, there were, pre-DeepSeek, no well-publicized examples of frontier-model-like performance at meaningfully lower training cost.

Again, it is important to note that DeepSeek's cost-to-train claims are likely understated: they account for neither the company's substantial GPU purchases over the past few years (pre- and post-U.S. GPU sanctions) nor the human capital expended in developing the LRM. That aside, DeepSeek R1's training efficiencies were made possible, in part, by distillation, which extracted knowledge from larger models released over six months earlier. It is not known to what extent such strong performance and low costs were enabled by others' prior massive investments, and the distillation process makes it hard to compare training demand against that of large frontier models directly. Still, the fact that DeepSeek R1 matched and slightly exceeded frontier model performance six to nine months later at a lower cost is impressive, especially given U.S. hardware restrictions.

Overall, we believe that in the wake of DeepSeek R1, leading model builders — and their financial backers — will focus more on the efficiency of training frontier models, resulting in some downward pressure on the associated compute spend.

II. A more cautious approach to "blank check" AI investments going forward

DeepSeek R1's technical achievements have caused many tech actors — including capital allocators, hyperscalers, and more cautious segments of the financial community — to reevaluate the return on their AI investments. 

Despite their exorbitant costs, it appears that frontier models remain key to delivering leading-edge performance and enabling lower-cost inference. It is likely that DeepSeek R1 could not have attained its impressive performance without using frontier models during training. If compute-intensive frontier models are key enablers for creating "fast-follower" models as performant and efficient as DeepSeek R1, then there will continue to be strong incentives for OpenAI, Anthropic, and others to make massive investments in next-generation frontier models. This is particularly relevant if one believes that Artificial General Intelligence (AGI) is within reach, of which Sam Altman, Mark Zuckerberg, Elon Musk, Sergey Brin, and Dario Amodei seem convinced.

While we do not believe that DeepSeek fundamentally changes these leaders' determination to make massive investments in developing frontier models, we do see it sowing doubt on "blank check" investments as shareholders (and their lenders) more deeply scrutinize assumptions about monetization, profitability, and returns in the face of potential fast followers with leaner cost structures.

III. Key players in the global AI value chain face a new reality

An AI breakthrough from China challenged the U.S.'s heretofore uncontested leadership in the sector, with the effects rippling out across the AI value chain.

Hyperscalers and GPU-as-a-service (GPUaaS) providers more exposed to training may see a more diverse mix of open-source/open-weight models being trained or fine-tuned by AI labs, given a wider range of foundational models available at different price points. They may need to rebalance their infrastructure strategies accordingly.

Faced with claims from DeepSeek that its servers ran on Nvidia's less powerful H800 chips rather than the company's newest GPUs, and consumed 50-75% less energy, we believe semiconductor companies and chip manufacturers — like Nvidia — will proactively market efficiency improvements that enable both cutting-edge frontier models and more efficient fast-follower models like DeepSeek R1. Furthermore, it seems likely that the U.S. will rethink its global AI policies in response to these shifting power dynamics, perhaps in the form of new hardware sanctions.

We do not believe there will be a meaningfully negative impact on data centers and utilities, as training efficiencies are paired with more power-intensive inference. On the one hand, efficiency improvements in model training are a routine part of the AI development process; on the other, DeepSeek will put pressure on hundreds of billions of dollars in business cases with multi-year paybacks.

Overall, we expect ultra-large-scale training demand to see some headwinds, but the likely proliferation of model builders, given lower barriers to entry and the more energy-intensive nature of reasoning inference, should offset this impact. Of course, any incremental take-up of AI as a result of lower costs would further boost demand. This phenomenon, often referred to as the Jevons paradox, typically occurs in technology when greater efficiencies actually lead to higher consumption. However, the nascent, land-grab state of this industry — and the heavy subsidization from hyperscalers and investors alike — makes it challenging to accurately assess the potential impact.

Of course, many other developments are playing out in real time that could materially change the trajectory of AI and data center demand. For example, DeepSeek R1's use of only 37 billion active parameters (roughly 5% of the model's 671 billion total parameters) could combine with new hardware and software innovations, such as NVIDIA's DIGITS, to crimp inferencing demand, at least in the context of fully fit-out enterprise-grade data centers.
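To make the active-parameter point concrete, the sketch below runs the back-of-envelope arithmetic. It uses the common rule of thumb that forward-pass compute per token scales with the number of parameters actually activated; the rule of thumb and the framing as a dense-model comparison are our assumptions, not figures from DeepSeek.

```python
# Back-of-envelope: why sparse (Mixture-of-Experts) activation cuts inference compute.
# Assumption (rule of thumb): forward-pass FLOPs per token ~ 2 x active parameters.

TOTAL_PARAMS = 671e9   # DeepSeek R1's total parameter count
ACTIVE_PARAMS = 37e9   # parameters activated per token via expert routing

active_share = ACTIVE_PARAMS / TOTAL_PARAMS          # fraction of the model used per token
compute_reduction = 1 - active_share                 # vs. a hypothetical dense 671B model

print(f"Active share of parameters: {active_share:.1%}")
print(f"Per-token compute reduction vs. dense: {compute_reduction:.1%}")
```

On these assumptions, each token touches only about 5.5% of the model's weights, which is the mechanism behind the potential inferencing-demand "crimp" described above.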

Given the rapid changes in the AI technology cycle we are currently witnessing, it is too early to call winners and losers. However, as the dust settles, we believe that DeepSeek's arrival will have a net-positive impact on the AI industry, with the potential to streamline costs, spur continued innovation, and power efficiencies in training and inference.
