DeepSeek shakes Wall Street as its low-cost training claims are questioned.

On January 28, 2025, Chinese artificial intelligence company DeepSeek sent shockwaves through Wall Street, yet very little public information could be found about the small Hangzhou-based startup. The company has been declining interview requests from both domestic and international media outlets.

DeepSeek sparked concerns about AI stock valuations on Wall Street on Monday, January 27, with Nvidia plummeting almost 17% and shedding nearly $600 billion in market value, the largest single-day loss of market value by any company in U.S. stock market history.

In December 2024, DeepSeek researchers published a paper on the public website alphaXiv announcing the upcoming January 10 launch of the DeepSeek-V3 model, which they said surpassed various domestic and international large-model products across multiple benchmark categories. Surprisingly, they achieved this by training on Nvidia's H800, a chip with deliberately reduced capabilities designed to comply with U.S. export controls, at a stated training cost of just $5.57 million.

Subsequently, on January 20, the company released its reasoning model, DeepSeek-R1, claiming performance on par with the official version of OpenAI's o1.

After a week of rampant speculation and discussion, DeepSeek rose to the top of the download rankings on Apple's U.S. App Store. This triggered doubts in the industry about the necessity and competitiveness of Nvidia's fastest and most powerful chips, as well as the enormous sums tech companies have invested in AI models and data centers.

On Monday, besides chip manufacturers, data-center service providers and nuclear-power concept stocks tied to AI infrastructure also took a hit, as concerns grew that DeepSeek's emergence might lead to lower-than-expected spending on future AI infrastructure, chip requirements, and energy needs.

Later on Monday, DeepSeek said it had restricted new overseas user registrations due to a "large-scale malicious attack."

Mainland Chinese media hyped up Monday’s Wall Street upheaval, describing DeepSeek’s actions as reminiscent of a “surprise attack on Pearl Harbor.”

This mysterious company located at Room 1201, Block 1, Huajin International Building, North Ring Road, Gongshu District, Hangzhou, refused any media inquiries.

The Chinese media outlet “21st Century Business Herald” reported, “As they suddenly surged in popularity, DeepSeek chose to ‘deep dive,’ refusing to engage in any form of external communication.”

An investor disclosed to a reporter from “21st Century Business Herald,” “People trying to reach them have faced barriers; recent attempts to make appointments have been unsuccessful.”

A notice on DeepSeek’s “Official Communication 98 Group” stated, “We are currently not engaging in project collaborations, nor providing private (on-premises) deployment or related support services; DeepSeek will focus on developing stronger models, stay tuned for more!”

According to a report by the official Chinese media Xinhua News Agency, on the day the R1 model was released, DeepSeek’s founder, Liang Wenfeng, attended a closed-door symposium hosted by Chinese Premier Li Qiang for entrepreneurs and experts.

Liang’s attendance at the meeting may indicate that DeepSeek’s success is crucial to Beijing’s goal of overcoming export controls from Washington and achieving self-sufficiency in strategic industries such as artificial intelligence.

The meeting was covered on China Central Television's "Xinwen Lianbo" ("News Broadcast") program as one of the day's key events.

An inquiry by Da Ji Yuan into DeepSeek's published papers reveals that the widely cited $5.57 million training cost actually pertains to DeepSeek-V3, not R1. Moreover, even for V3, it represents only a small portion of the actual development cost.

The paper states, "The above cost includes only the official training of DeepSeek-V3 and does not encompass the costs of prior research and experiments on architectures, algorithms, or data."

The paper goes on to detail the calculation: roughly 2.788 million H800 GPU-hours, priced at an assumed rental rate of $2 per GPU-hour, yielding about $5.576 million. The breakdown was presented as evidence of DeepSeek-V3's efficiency and cost-effectiveness compared with traditional models.
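As a back-of-the-envelope sketch, the headline figure follows directly from two inputs reported in the V3 technical report, the GPU-hour count and an assumed per-hour rental rate:

```python
# Back-of-the-envelope check of DeepSeek-V3's reported training cost.
# Inputs from the V3 technical report: ~2.788 million H800 GPU-hours,
# priced at an assumed rental rate of $2 per GPU-hour.
gpu_hours = 2_788_000
rate_usd_per_hour = 2.00  # assumed rental price, not actual hardware cost
total_cost = gpu_hours * rate_usd_per_hour
print(f"${total_cost:,.0f}")  # → $5,576,000
```

Note that this counts only rented compute time for the final training run; hardware purchases, staffing, and prior experiments are outside the formula, which is precisely the analysts' objection.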

Stacy Rasgon, Managing Director and Senior Analyst of Semiconductor and Capital Equipment at Bernstein Research, indicated that the figures regarding DeepSeek’s training costs were quite misleading.

He pointed out that DeepSeek-V3, being a "mixture-of-experts" model, achieves performance comparable or superior to other large-scale base models while requiring significantly fewer training resources.

The discussion also raised the concern that the actual cost of developing the R1 model, if it indeed rivals OpenAI's o1, would be considerably higher.

Paul Triolo, a partner at the global consultancy DGA Group, wrote in a Substack post that the training cost of DeepSeek's R1 most likely exceeds that of V3, and that costs will escalate further as successor models arrive, such as o4 or o5 from OpenAI, or R2 or R3 from DeepSeek.

Archerman Capital, a U.S.-based investment firm, also questioned the $5.57 million cost narrative. Its analysis emphasized that while DeepSeek's training cost was said to be one-tenth of Meta's and one-twentieth of OpenAI's, the comparison overlooks the pioneering nature of Meta's and OpenAI's spending, which naturally involves more exploration and, therefore, more waste.

The report offered an analogy: "Developing an innovative drug takes ten years and billions of dollars, while developing a generic drug is faster and cheaper. Moreover, the criteria for measuring costs have not been standardized, resulting in significant discrepancies."

As of the time of writing, DeepSeek has not responded to Da Ji Yuan’s request for comment.