DeepSeek, the AI Model That Was Built While the World Looked Elsewhere
By Aaron Rose · Tech Reader Magazine · June 23, 2026
Wasn't Built Overnight
The story the technology industry told itself about DeepSeek was that it arrived suddenly — a shock, an upset, a disruption from an unexpected direction. Marc Andreessen called it "AI's Sputnik moment," and the phrase spread everywhere because it captured something real: the sensation of looking up and discovering that a competitor you had not taken seriously was already in orbit.
But the Sputnik analogy contains its own correction. The Soviet space program did not materialize overnight. It was built over years, in parallel with American efforts, by people who were watching the same scientific literature and drawing their own conclusions. The shock was not the satellite. The shock was the realization that parallel work had been happening all along, largely unobserved.
DeepSeek is the same story. The company was founded in July 2023, but its roots run back further — to a hedge fund accumulating chips before the export ban, to a government industrial policy launched in 2017, to a decade of Chinese AI investment that tracked American milestones without ever announcing itself as a race. What follows is a reconstruction of both timelines, running in parallel, to show what was being built in Hangzhou while the industry was watching San Francisco.
2015 — Two Starting Lines
OpenAI was incorporated in San Francisco in December 2015, backed by a group of tech investors with a stated mission of developing AI for the benefit of humanity. It was structured as a nonprofit — a form that would not survive contact with the capital requirements of frontier model training.
That same year, China's State Council published Made in China 2025, the first central government document to identify AI as a national strategic priority. The plan was industrial, not academic — it named AI as essential to manufacturing competitiveness and national development. The policy infrastructure for what would become DeepSeek began there.
Two starting lines, one year. Neither party was watching the other particularly closely. OpenAI's founding team was focused on long-term safety research and foundational questions about AI alignment. China's State Council was focused on something more immediate: reducing dependence on foreign technology in strategic industries and building domestic capability across the full manufacturing stack. These were genuinely different projects in 2015. By 2025, they had converged on the same problem.
2017 — The Paper and the Plan
Eight researchers at Google published a paper called Attention Is All You Need in 2017, introducing the transformer architecture. The paper did not make headlines outside research circles, but it would become the technical foundation of every major language model that followed, including ChatGPT, GPT-4, and DeepSeek-R1.
That same year, China's State Council issued the New Generation Artificial Intelligence Development Plan — a comprehensive national strategy targeting AI leadership by 2030. The plan designated Baidu, Alibaba, Tencent, iFlytek, and SenseTime as national AI champions, each assigned a strategic domain. Local governments began building AI research parks and offering subsidies to attract talent, and the plan reoriented incentives for Chinese engineering graduates: staying home became more attractive than emigrating to Stanford or MIT.
The transformer paper and China's AI development plan both arrived in 2017, and neither side fully understood the significance of what the other had done. American researchers were focused on what the transformer enabled technically. China's government was focused on what state coordination could enable industrially. The transformer would eventually give Chinese researchers the same foundational architecture that American labs were building on, while the state plan would give those researchers a funding environment and institutional support structure that no American startup could replicate.
Also in 2017, a quantitative hedge fund in Hangzhou had completed a transition: by late that year, most of High-Flyer's trading was managed by AI systems. The fund's co-founder, Liang Wenfeng, had been applying machine learning to financial markets and was watching the broader AI research landscape with the attention of someone who understood what compounding capability looked like. He was not building a language model yet. But he was learning what these systems could do — and more importantly, he was learning how to compete in zero-sum, adversarial environments where compute efficiency was survival.
The Institutional Ecology of Chinese AI
To understand DeepSeek, you have to understand the institutional ecology that made it possible — a set of overlapping incentives, subsidies, and structural advantages that no American lab could replicate, and that no American observer fully appreciated until R1 was already in the App Store.
The New Generation Artificial Intelligence Development Plan of 2017 is often described as a central command — Beijing telling the market what to do. This is misleading. The plan was more like a signal than a directive. It told provincial governments, local development zones, and state-owned enterprises that AI was a priority, which unlocked a cascade of subsidies, tax breaks, and infrastructure investments that no single company could have obtained on its own.
The practical effect was this: if you were a Chinese AI researcher or entrepreneur in 2018, you had access to subsidized compute. Local governments in Hangzhou, Shenzhen, Beijing, and Shanghai offered grants that effectively covered hardware costs, allowing startups to acquire GPUs at thirty to fifty percent of market price, on the expectation that they would locate their operations in the sponsoring city.
You had access to below-market real estate, as AI research parks offered office space at fractions of market rates, often in prime urban locations, with the explicit goal of clustering talent and creating "AI ecosystems."
You had access to talent retention incentives — the plan included provisions for tax benefits, housing subsidies, and research grants for Chinese engineers who stayed in China rather than emigrating to the US, which reversed a decades-long brain drain. By 2020, the proportion of Chinese AI PhDs remaining in China had risen from roughly forty percent to over seventy percent, according to the China Institute for Science and Technology Policy.
And you had access to domestic market data — Chinese tech companies like Baidu, Alibaba, and Tencent operated at a scale that no Western company could match in areas like e-commerce, logistics, mobile payments, and social media, creating training data ecosystems that were both vast and distinct from Western market data.
High-Flyer and later DeepSeek benefited from all of these, but they did not depend on them in the way that the national champions did. Liang's hedge fund had its own capital. He didn't need local government grants to buy A100s. But the availability of these subsidies created a broader ecosystem of AI talent, infrastructure, and research culture that made DeepSeek's rapid scaling possible. When Liang needed to recruit the best engineering graduates from Tsinghua and Zhejiang University, he was hiring into a field that had been legitimated, subsidized, and prestige-endowed by the state for nearly a decade.
This is where DeepSeek diverges from the national champion model. Baidu, Alibaba, and Tencent were product companies with revenue targets, quarterly earnings, and shareholder expectations. Their AI research was always subordinate to product roadmaps. If a research project didn't lead to a feature or a revenue stream, it was deprioritized.
High-Flyer had no product roadmap. It had only one objective: generate returns for its investors through quantitative trading. But Liang saw something that the product companies didn't: the convergence of AI capability with computational scale.
He realized that the same techniques that made trading algorithms profitable — pattern recognition, optimization under uncertainty, adversarial adaptation — could be applied to language models.
And he realized that the institutional form of a hedge fund — flat, secretive, meritocratic, and ruthlessly focused on edge cases — was better suited to frontier AI research than the bureaucratic structures of the tech giants.
High-Flyer's culture was built around treating compute as a scarce resource. In quantitative trading, every millisecond of latency and every watt of power matters. That discipline translated directly to model training: DeepSeek's engineers were conditioned to allocate compute optimally rather than consume it wastefully.
It was also built around adversarial thinking. Trading is zero-sum, and you win by finding edges that others miss. DeepSeek's approach to AI (identify the constraint, exploit the asymmetry, move before competitors react) was hedge-fund thinking applied to research.
It was built, too, around the absence of a quarterly product cycle. High-Flyer had no customers, no feature requests, and no roadmap deadlines, and DeepSeek inherited that freedom. The only deadline was the next research breakthrough, which allowed for long-horizon bets, like the 2021 chip accumulation, that no product company could have justified to its board.
And finally, it was built around secrecy as strategy. Hedge funds don't publish their trading algorithms, and DeepSeek, despite eventually open-sourcing its models, operated in near-total secrecy before R1's launch. This was not Chinese state secrecy but competitive secrecy, the same logic that governs any arbitrage firm: you don't tell your competitors what you've found until you've already exploited it.
The hedge fund culture that shaped DeepSeek was not an accident. It was the source of its institutional advantage — a culture of efficiency, adversarial thinking, and long-horizon bets that no product company could replicate.
2019 — Capital and Chips
Microsoft invested one billion dollars in OpenAI in 2019, converting a nonprofit research lab into a commercial juggernaut with the resources to pursue frontier model training at scale. The deal included a provision for Microsoft to receive a share of OpenAI's commercial revenue and gave OpenAI access to Azure's computing infrastructure.
That same year, China's AI champion framework was operating at scale. Baidu, Alibaba, and Tencent were each running substantial AI research operations, backed by government policy and domestic market scale. Liang Wenfeng, now CEO of High-Flyer, began presenting at industry conferences, describing his conviction that Chinese AI companies needed to stop following and start building original work.
The Microsoft-OpenAI deal restructured the economics of frontier AI. It demonstrated that building at scale required billions, not millions, and that the path to frontier capability ran through cloud infrastructure. This was the moment American AI became inseparable from American capital markets.
China's trajectory was different in structure, though not in ambition. The major tech companies had government backing, domestic market protection from Western competitors, and years of accumulated AI investment. They also had, Liang would later argue, a fundamental cultural problem: they were optimized for fast-following, not original research.
His critique of Chinese AI in this period was pointed. The country had capital and talent, he believed, but lacked the institutional confidence to pursue genuine innovation rather than adaptation of what American labs had already done.
What Liang did not say publicly — but what he was already acting on — was that the institutional form mattered. High-Flyer was not a product company. It was an arbitrage firm. Its culture was built around finding edges, exploiting asymmetries, and treating compute as a scarce resource to be allocated with surgical precision. That culture would shape DeepSeek more than any government policy.
2021 — The Chip Bet
By 2021, GPT-3, which OpenAI had released the previous year, had demonstrated at scale that language models could generalize across tasks in ways that previous architectures could not. The paper circulated widely in research communities globally, including in China. The capability gap between the leading American labs and everyone else was visible and documented.
That same year, Liang Wenfeng began accumulating NVIDIA A100 GPUs — thousands of them — using High-Flyer's capital. To people who knew him, it looked like an unusual hobby for a hedge fund manager. It was not a hobby. He had concluded that the Biden administration would eventually restrict advanced chip exports to China, and that whoever had the chips before the ban would hold a structural advantage afterward. His timing was precise.
By the time the US export ban took effect, SemiAnalysis estimated that High-Flyer and DeepSeek had accumulated approximately fifty thousand NVIDIA A100 units. That stockpile became the computing foundation for everything that followed.
The chip accumulation is the least-discussed and most consequential decision in DeepSeek's history. Everything that came after — the training runs, the efficiency innovations, the model releases — was possible because the hardware was already in Hangzhou before Washington locked the door. The export controls that were supposed to slow China's AI development had been anticipated and circumvented, not through any violation, but through the straightforward logic of buying before the restriction arrived.
This is where the parallel timelines diverge in a way that matters. American labs were building on a model of unconstrained scaling: more GPUs, more parameters, more money. Liang was building on a model of constrained optimization: fixed hardware, fixed supply, maximized efficiency. He did not choose the constraint. But he recognized it before anyone else did, and he built an organization around it.
The 2021–2022 chip accumulation was not just a hardware purchase. It was a strategic arbitrage — buying an asset whose value was about to be redefined by regulation. Liang saw what the US government was going to do before the US government had fully decided to do it. And he acted on that insight with the discipline of a trader, not the caution of a researcher.
2022 — The World Changes
ChatGPT launched in November 2022. One hundred million users in two months. The product demonstrated that a language model trained on internet text could hold a coherent conversation, answer questions, write code, and draft documents well enough for everyday use. The public AI era began.
That same year, the Biden administration announced sweeping export controls on advanced AI chips to China, restricting NVIDIA's A100 and H100. The controls were intended to limit China's ability to train frontier models. For Liang Wenfeng, the timing confirmed what he had anticipated. High-Flyer's stockpile was already assembled. The foundation was in place.
ChatGPT's launch was a demonstration of what was possible. China's AI research community understood immediately what the transformer had produced at scale, and understood equally well what the chip restrictions meant for anyone who hadn't already secured hardware. The combination of those two signals — a publicly demonstrated capability ceiling and a supply constraint — accelerated Liang's timeline considerably. By April 2023, High-Flyer had announced it would pursue artificial general intelligence research. By July 2023, DeepSeek existed as an independent company.
The organizational choice was significant. DeepSeek was not spun out of a tech giant. It was spun out of a hedge fund — a firm with no product roadmap, no customer base, and no quarterly revenue targets. It had only one objective: build AGI. That singular focus, and the culture of adversarial efficiency that came with it, would distinguish DeepSeek from every other Chinese AI lab.
2023 — DeepSeek Arrives
OpenAI released GPT-4 in 2023. Anthropic released Claude. Meta released Llama as open weights. The field fractured into frontier closed models and a growing open-source ecosystem, with American labs defining both. The capital requirements for frontier training were now estimated in the hundreds of millions of dollars.
That same year, DeepSeek released its first model, DeepSeek Coder, in November, followed by the DeepSeek-LLM series in December. The company recruited from top Chinese universities, ran a flat organizational structure, and explicitly rejected the copycat approach Liang had criticized in Chinese AI. The stated goal was AGI, not an application layer built on someone else's model.
DeepSeek's founding philosophy was unusual for a Chinese AI company in ways that its later success would obscure. Liang Wenfeng was not interested in building a ChatGPT wrapper or a Llama fine-tune. He was interested in the underlying research — new architectures, new training methods, new ways of achieving performance under hardware constraints that his American competitors did not face.
The chip restriction that was supposed to be a ceiling became, in his framing, a design constraint that produced better engineering. This is the intellectual core of DeepSeek's approach: constraint as a creative force. American labs had the luxury of throwing hardware at problems. DeepSeek had to solve problems with hardware they already owned. That asymmetry produced a different set of research priorities — efficiency, sparsity, inference-time reasoning — that would eventually become competitive advantages.
2024 — The Proof of Concept
OpenAI released GPT-4o and then the o1 reasoning model in 2024. The o1 represented a new paradigm: a model trained to reason through problems step by step before responding. The inference-time compute strategy was presented as a significant advance. It was also, as it turned out, something DeepSeek was working on independently.
That same year, DeepSeek released V2 in May, which gained significant traction in China for its cost efficiency, outperforming models from Baidu, ByteDance, Tencent, and Alibaba at a fraction of the price. In December, DeepSeek released V3, trained in about two months for a reported $5.6 million in GPU time for the final training run, compared to the more than $100 million OpenAI reportedly spent on GPT-4 in 2023. The efficiency gap was documented and verifiable. The cost differential, not just the capability, is what changed the conversation.
DeepSeek-V3's release in December 2024 was the first genuine signal that something structurally different was happening. Not better in some marginal benchmark sense, but different in kind: a frontier-class model trained with one-tenth the compute of comparable American models, using hardware that was already two generations behind what NVIDIA was producing for export. The efficiency was not accidental. It was the product of architectural innovations developed specifically because the hardware ceiling was fixed.
DeepSeek's Mixture of Experts architecture, or MoE, was based on the insight that a token does not need to activate every parameter in the model. By routing each token to only a small subset of experts, the architecture cuts the FLOPs spent per token without shrinking the model's total parameter count.
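For readers who want to see the routing idea concretely, here is a minimal Python sketch of top-k expert selection. It is an illustration under stated assumptions, not DeepSeek's implementation: the expert count, the router scores, and the helper names (top_k_route, moe_forward, make_expert) are all hypothetical, and each "expert" is a toy scaling function standing in for a feed-forward block.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, router_scores, k=2):
    """Mix the outputs of only the k selected experts.

    Unselected experts are never called, which is where the compute saving
    comes from. In a real model the router scores would be computed from the
    token itself; here they are passed in directly (an assumption).
    """
    routes = top_k_route(router_scores, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]

def make_expert(scale):
    # Hypothetical toy expert: multiplies every component by a constant.
    return lambda x: [scale * v for v in x]

NUM_EXPERTS = 8  # illustrative value, not a published configuration
experts = [make_expert(float(i + 1)) for i in range(NUM_EXPERTS)]
token = [1.0, -2.0, 0.5]
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output, active = moe_forward(token, experts, scores, k=2)
active_fraction = len(active) / NUM_EXPERTS
print("active experts:", active)
print("fraction of experts used for this token:", active_fraction)
```

With two of eight experts selected, only a quarter of the expert parameters do any work for this token, while all eight remain part of the model. The per-token saving in a real system depends on details this sketch ignores, such as shared layers and load balancing.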
The Multi-head Latent Attention mechanism, or MLA, compressed the key-value cache into a smaller latent representation, significantly reducing memory and memory-bandwidth requirements during inference.
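A back-of-envelope calculation shows why shrinking the cache matters. The sketch below compares the memory a conventional per-head key-value cache would need with the memory needed if each token stored a single compressed vector per layer instead. Every dimension here is a hypothetical round number chosen for illustration, not DeepSeek's published configuration, and the comparison is deliberately simplified: it leaves out positional-encoding details that a real latent-attention design has to handle.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_value=2):
    """Memory for a conventional cache: one key and one value vector
    per head, per layer, per token (hence the leading factor of 2)."""
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value

def latent_cache_bytes(n_layers, latent_dim, seq_len, bytes_per_value=2):
    """Memory if each token stores one shared compressed vector per layer."""
    return n_layers * latent_dim * seq_len * bytes_per_value

# Hypothetical model shape (assumptions for illustration only).
N_LAYERS = 32
N_HEADS = 32
HEAD_DIM = 128
LATENT_DIM = 512
SEQ_LEN = 4096  # tokens held in the cache

full = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN)
latent = latent_cache_bytes(N_LAYERS, LATENT_DIM, SEQ_LEN)
ratio = full / latent

print(f"conventional cache: {full / 2**30:.2f} GiB")
print(f"latent cache:       {latent / 2**20:.2f} MiB")
print(f"reduction factor:   {ratio:.0f}x")
```

Under these assumed numbers the cache shrinks by a factor of sixteen. The real ratio depends entirely on the chosen latent width and head layout, so the figure should be read as an order-of-magnitude illustration and nothing more.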
And Reinforcement Learning with Verifiable Rewards, or RLVR, replaced RLHF with objective verification — math proofs, code execution, rule-based validation — to train reasoning chains, eliminating the cost and subjectivity of human preference labeling.
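To make "verifiable reward" concrete, here is a minimal sketch of two rule-based scoring functions of the kind the paragraph describes: one checks a final numeric answer, the other runs a candidate function against test cases. The tag format, function names, and scoring scale are assumptions made for this example, not DeepSeek's actual reward code.

```python
import re

def extract_final_answer(completion):
    """Return the text inside the last <answer>...</answer> pair, or None.

    The <answer> tag convention is an assumption made for this sketch.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

def math_reward(completion, expected):
    """1.0 if the extracted answer matches the expected one, else 0.0.

    Numeric answers are compared as floats; anything else as exact strings.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        return 1.0 if abs(float(answer) - float(expected)) < 1e-9 else 0.0
    except ValueError:
        return 1.0 if answer == expected else 0.0

def code_reward(func, test_cases):
    """Fraction of (args, expected_output) cases a candidate function passes.

    A candidate that raises an exception simply fails that case.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for args, expected_output in test_cases:
        try:
            if func(*args) == expected_output:
                passed += 1
        except Exception:
            pass
    return passed / len(test_cases)

good = "Six times seven is forty-two. <answer>42</answer>"
bad = "I think the answer is 41."
print(math_reward(good, "42"))
print(math_reward(bad, "42"))
print(code_reward(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)]))
```

No human rater appears anywhere in the loop: the score comes from a check that either passes or fails. That is the property the paragraph above credits with removing the cost and subjectivity of preference labeling.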
These were not marginal optimizations. They were architectural rethinks made necessary by the constraint of fixed hardware. American labs, with access to unlimited H100s, had little incentive to pursue them. DeepSeek had every incentive.
January 2025 — The Sputnik Moment
DeepSeek-R1 launched on January 20, 2025 — the same day Donald Trump was inaugurated for his second term. Whether the timing was deliberate is debated. What is not debated is the effect.
Within a week, DeepSeek's app had topped the US App Store, displacing ChatGPT. NVIDIA's stock dropped seventeen percent in a single session, erasing nearly six hundred billion dollars in market capitalization — the largest single-day loss in the history of any company on the US stock market. Investors who had built positions on the premise that frontier AI required frontier hardware were confronted with evidence that the premise might not hold.
R1 matched OpenAI's o1 reasoning model on key benchmarks while costing roughly ninety percent less per query. It was open-weight and available under the MIT license, and its distilled variants could be run locally on hardware that consumers already owned.
The distilled versions were being installed on Raspberry Pi systems within days of release. The grassroots adoption pattern — cheap, local, freely modifiable — was the opposite of the closed, subscription-based model American labs had built their businesses around.
The geopolitical response was immediate. Multiple US states, Australia, Taiwan, South Korea, Denmark, and Italy moved to ban or restrict DeepSeek, citing data privacy and national security concerns.
The US Navy prohibited the application outright. The concerns were real — all user data is stored in China, and the model enforces Chinese government censorship policies on politically sensitive topics — but the bans also reflected something else: the recognition that Chinese AI had arrived in the mainstream Western market as a competitor, not a curiosity.
What made R1 different from V3 was not just capability — it was inference efficiency.
OpenAI's o1 achieved reasoning by spending substantial compute at inference time, generating long chains of thought before producing a final answer, which made o1 expensive to run. DeepSeek-R1 achieved comparable reasoning performance with dramatically lower inference costs, because its training approach — reinforcement learning with verifiable rewards — had already internalized the reasoning patterns.
The cost advantage was not just in training. It was in deployment. And that changed the economics of the entire industry.
The Open-Source Strategy as Geopolitical Maneuver
DeepSeek's decision to release its models under the MIT license is often framed as a philosophical commitment to openness. It was not. It was a calculated geopolitical and commercial strategy.
The logic was clear. Unlike the US, where AWS, Azure, and GCP capture API revenue from frontier models, China's cloud providers are smaller and less integrated with the AI stack. There was no domestic SaaS revenue to protect.
Open-sourcing also built a global ecosystem. When DeepSeek released its models under the MIT license, it invited the entire global developer community, including American researchers, to build on its work. Every fine-tune, every optimization, every application built on DeepSeek's architecture became a free contribution to that ecosystem.
The strategy also undermined the American closed-model business model. OpenAI's valuation, Anthropic's funding, and Microsoft's AI strategy all depended on the premise that frontier models would be rented, not owned, and DeepSeek's open-weight release undercut that premise. Suddenly, a frontier-class model was available for free, runnable locally, and modifiable without restriction.
And finally, the strategy created diplomatic friction: US attempts to ban DeepSeek were complicated by the fact that the model was open-source. You can ban an app. You can't ban a Git repository. The decentralized nature of open-source distribution made the geopolitical response both slower and less effective than it would have been against a closed API.
The open-source strategy was not generosity. It was institutional warfare — using the norms of the open-source community to disrupt the commercial assumptions of American AI. And it worked.
What the Parallel Timelines Tell Us
The side-by-side history reveals something that the Sputnik framing obscures.
DeepSeek was not a surprise move in a game America was winning. It was the visible endpoint of a decade of parallel construction — one track building on venture capital and cloud infrastructure, the other building on state industrial policy, hedge-fund discipline, chip stockpiles, and a deliberate strategy of ecosystem disruption.
The chip restrictions that were supposed to constrain Chinese AI development produced, in DeepSeek's case, a different outcome: an engineering team that had to solve efficiency problems that their American competitors had no incentive to address.
When you cannot simply buy more NVIDIA H100s, you innovate around the constraint. The MoE architecture, the MLA attention mechanism, the reinforcement learning approach to reasoning, the inference-time efficiency — these were not accidental discoveries. They were the product of working within limits that the unconstrained competition did not face.
Liang Wenfeng said it plainly in a July 2024 interview, months before R1 launched: "We do not have financing plans in the short term. Money has never been the problem for us; bans on shipments of advanced chips are the problem." And then, six months later, his team demonstrated that the chips they already had were enough. Not because the ban didn't matter, but because the constraint had forced the engineering.
The Stanford AI Index 2026 concluded that Chinese companies had "effectively closed" the AI performance gap with their US rivals.
That is a significant statement from a significant source. It describes not a moment but a trajectory — one that was building through every year documented in this timeline, visible in retrospect, largely unobserved in real time.
The Institutional Legacy
What DeepSeek revealed was not just a model. It revealed an institutional form — a way of building frontier AI that American labs had not anticipated, and that they may not be able to replicate.
The combination of state subsidies, hedge-fund discipline, chip stockpiling, and open-source disruption is one that no American lab can easily assemble.
Meanwhile, the combination of unconstrained scaling, cloud rent-seeking, and quarterly earnings may be systematically disadvantaged in a world of hardware constraints and open-source competition.
The export ban that was supposed to protect American AI leadership may have, in fact, produced the only institution capable of challenging it.
By restricting chip supply, the US government created a design constraint that forced DeepSeek to become more efficient, more innovative, and more resilient than any American lab. The ban was not a ceiling. It was a forge.
The Question That Remains
What happens when the constraint is removed? If the export ban eventually relaxes — or if China's domestic chip production catches up — will DeepSeek's efficiency culture persist, or will it revert to the American pattern of more hardware, more parameters, more money?
Liang's answer, in the same July 2024 interview, was telling: "We are the only team in China that cannot get H100. But we are also the only team whose engineering culture is built around making do with what we have."
That culture may be the real innovation — not just the architecture, not just the training cost, but the organizational discipline of treating compute as a finite resource to be optimized, not an infinite resource to be consumed.
If that culture survives the arrival of abundance, DeepSeek will remain a different kind of AI company. If it doesn't, it will become what every American lab already is: a scaling machine with no off switch.
The next few years will tell us which version wins. But the institutional ecology that produced DeepSeek is not going away. The subsidies will persist. The talent pipeline will continue to flow. The chip stockpiles — whatever remains of them — will be leveraged. And the open-source ecosystem that DeepSeek seeded will continue to grow, regardless of US policy.
The other track was always there. It just took a Sputnik moment to make people look up. Now that they've looked, the question is whether they can see what's still being built — and whether they can build their own version of the institutional discipline that made DeepSeek possible.
Coming Next: The End of Scaling
The data wall, the finite limits of pre-training, and why DeepSeek's efficiency-first approach may be better positioned for the post-scaling era than any American lab. Coming soon at Tech Reader Magazine.