How Nvidia’s Software Ecosystem (CUDA) Captured the AI Market
How NVIDIA set out to render video games and ended up owning the infrastructure of artificial intelligence.
The story of NVIDIA's dominance is not the story of a company that saw the AI revolution coming. It is the story of a company that built the right tool for the wrong reason, then watched the world reorganize itself around what they had made.
1993 — The Beginning
On a morning in early 1993, three engineers sat in a booth at a Denny's restaurant in San Jose, California, and decided to start a company. Jensen Huang was thirty years old, a microprocessor designer with a degree from Stanford. Chris Malachowsky and Curtis Priem were engineers from Sun Microsystems. The problem they wanted to solve was unglamorous and specific: personal computers rendered 3D graphics badly. The chip that could fix that did not yet exist. They believed they could build it.
They named the company NVIDIA — derived from invidia, the Latin word for envy. The aspiration embedded in that name was modest by the standards of what would eventually follow. They wanted to build a graphics chip that competitors would envy. Nothing in the Denny's conversation, as far as history records, touched on machine learning, artificial intelligence, or the possibility that their work would one day underpin every significant AI model on the planet.
That is the thing to hold onto as this story unfolds. Almost none of what NVIDIA became was planned.
1993–1998 — The Graphics Wars
The 1990s PC market was a battlefield for graphics. Every major chip maker understood that consumers wanted better visuals, and a handful of companies were racing to deliver them. NVIDIA entered that race as an underdog with limited capital and no finished product. Their first chip, the NV1, shipped in 1995. It was innovative in some respects and wrong in others — specifically, it used quadratic surfaces to render geometry at a time when the rest of the industry was standardizing on triangles. When Microsoft launched DirectX and locked in the triangle as the fundamental unit of 3D rendering, the NV1's approach became a dead end almost overnight.
NVIDIA nearly did not survive it. The company went back to the drawing board, burned through cash, and came out the other side in 1997 with the RIVA 128 — a chip that embraced the industry standard and ran fast enough to matter. It sold over a million units in its first four months. The company had its footing.
Two years later, in 1999, NVIDIA introduced the GeForce 256 and did something audacious with its marketing: they invented a new category. They called it a GPU — a Graphics Processing Unit — and argued that it was categorically different from anything that had come before. The GeForce 256 introduced Hardware Transform and Lighting — T&L, in the shorthand of the field — moving the mathematics of geometry transformation and lighting calculation off the CPU and onto the dedicated graphics chip entirely.
This was not a cosmetic distinction. It established, in a technical sense, what a GPU was for the next two decades of computer graphics. The claim was real. The category name stuck. NVIDIA went public on the Nasdaq that same year, and by the end of 2001 its annual revenue had crossed one billion dollars.
What the GPU actually was, beneath all the gaming marketing, was a chip designed to perform the same mathematical operation — multiply two numbers together — millions of times simultaneously. A CPU was a sequential machine. It was fast, flexible, and built to handle one complex task at a time. A GPU was a parallel machine. It was built to handle thousands of simple tasks at once. For rendering pixels, this was exactly right. For everything else, no one was yet thinking about it.
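The contrast between the two machines can be made concrete with a toy sketch. The Python below is an illustration only, not GPU code: it multiplies pairs of numbers once in a single ordered loop and once by cutting the work into chunks that need nothing from each other. That independence is the property a GPU exploits in hardware. The function names, chunk size, and worker count are assumptions made for this example, and Python threads will not actually make this arithmetic faster.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_sequential(a, b):
    """One worker walks the data in order, as a single CPU core would."""
    out = []
    for x, y in zip(a, b):
        out.append(x * y)
    return out


def multiply_chunk(args):
    """Multiply one slice; it needs nothing from any other slice."""
    a_chunk, b_chunk = args
    return [x * y for x, y in zip(a_chunk, b_chunk)]


def multiply_parallel(a, b, chunk_size=4, workers=4):
    """Split the data into independent chunks and hand them to a worker pool."""
    chunks = [
        (a[i:i + chunk_size], b[i:i + chunk_size])
        for i in range(0, len(a), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(multiply_chunk, chunks)  # map keeps chunk order
    return [value for chunk in results for value in chunk]


a = list(range(10))
b = [2] * 10
# Both strategies give the same answer; only the scheduling differs.
assert multiply_parallel(a, b) == multiply_sequential(a, b)
```

Because no chunk depends on another, the work can be spread over as many workers as the hardware offers. Rendering pixels has the same shape, which is why a chip built for it could later be pointed at other problems.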
2004–2007 — The Bet Nobody Noticed
In 2004, a computer science doctoral student at Stanford named Ian Buck was doing something unusual with a pair of NVIDIA graphics cards. Buck had become interested in whether a GPU could be used for general-purpose computation — not for rendering triangles, but for scientific calculation. He had built a programming language called Brook that let developers write code for GPU hardware without going through the graphics pipeline.
His work attracted attention from NVIDIA itself, and in 2004 the company hired him. At NVIDIA, Buck was paired with John Nickolls, the company's director of architecture for GPU computing. The two of them set about turning Buck's research prototype into something real. What they were building was a platform that would let any programmer use NVIDIA's GPU hardware for any kind of parallel computation — not just graphics. They called it CUDA: Compute Unified Device Architecture.
CUDA was announced in 2006 and officially released in 2007. The reception was not a thunderclap. Investors were skeptical of the implementation costs. The scientific computing community was interested but cautious. The mainstream technology press barely noticed. A company known for gaming chips was offering a platform for parallel computation to scientists and researchers, and most people filed it under "interesting but niche."
What CUDA actually did, at its core, was give programmers a way to write in a language close to standard C and have their code run on thousands of GPU cores simultaneously. The technical friction of programming a GPU — which had previously required thinking in the metaphors of graphics, of vertices and fragments and shaders — was stripped away. A physicist who wanted to run a simulation no longer needed to pretend they were rendering a scene. They could just write code that computed things in parallel, and CUDA would handle the translation to the hardware beneath.
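To give a feel for that programming model, here is a rough emulation in plain Python rather than real CUDA C. The idea it tries to capture is that the programmer writes the work for one element, and the platform runs that body once per thread across a grid of blocks. The index arithmetic mirrors the common CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`; the names `saxpy_kernel` and `launch` are hypothetical helpers invented for this sketch, and the emulated threads run one after another instead of concurrently.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, n, alpha, x, y, out):
    """Body that every emulated thread runs; it handles a single element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:  # the last block may contain spare threads
        out[i] = alpha * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by visiting every (block, thread) pair.

    On a GPU these invocations would run concurrently; here they are serial.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

block_dim = 4                                # threads per block (assumed)
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y, out)
```

Nothing in the kernel body mentions vertices, fragments, or shaders. It is ordinary arithmetic on arrays, which is the shift the paragraph above describes.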
2006: The year NVIDIA announced CUDA, a full six years before the deep learning moment that would prove it indispensable. The gap between the tool and the revolution it enabled is the story.
Around the CUDA platform, NVIDIA began building a library ecosystem. cuBLAS accelerated linear algebra. cuFFT accelerated signal processing. Each library was hand-optimized for NVIDIA's hardware, tuned at a level of depth that took years to get right. Competitors could copy the idea of a GPU computing platform. Copying a decade of hand-tuned libraries was a different proposition entirely.
2009–2012 — The Collision
In 2009, a computer vision researcher at Stanford named Fei-Fei Li released ImageNet — a dataset of more than fourteen million labeled photographs organized into over twenty thousand categories. She had spent years building it, driven by a conviction that computer vision algorithms were being held back not by insufficient cleverness but by insufficient data. ImageNet was her answer to that problem.
For the first two years, ImageNet drew modest interest. The competition Li organized around it — the ImageNet Large Scale Visual Recognition Challenge — attracted entries from computer vision researchers who were using conventional algorithms: hand-engineered feature detectors, support vector machines, the standard toolkit of the field. The best results were good but not transformative. Then 2012 arrived.
At the University of Toronto, a doctoral student named Alex Krizhevsky had been experimenting with deep convolutional neural networks — a class of algorithm loosely inspired by biological vision systems, involving many stacked layers of computation. Neural networks of this kind had been theorized and discussed for decades. The problem was that training them required an enormous amount of computation, and CPUs were simply too slow to make the training practical at the scale that would let the algorithms perform well.
Krizhevsky realized that training a neural network was, at its heart, a problem of parallel mathematics, exactly the kind of computation NVIDIA's GPUs had been designed to accelerate. Working in his bedroom at his parents' house, with two NVIDIA GeForce GTX 580 graphics cards, he trained a deep neural network on the ImageNet dataset over five to six days. The network, later called AlexNet, was submitted to the ImageNet competition by Krizhevsky together with his co-author Ilya Sutskever and their advisor Geoffrey Hinton.
AlexNet achieved a top-5 error rate of 15.3 percent. The second-place entry achieved 26.2 percent. The gap, more than ten percentage points, was not an incremental improvement. It was a rupture. The computer vision community had spent years making small gains. AlexNet arrived and rewrote what was possible in a single year. The deep learning era had a starting gun, and it had fired on two consumer graphics cards purchased at a computer store, running software built on a platform announced six years earlier for scientific researchers.
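For readers unfamiliar with the metric, a top-5 error rate counts an image as a miss only when the correct label is absent from the model's five highest-scoring guesses. The sketch below shows that calculation on made-up scores; the function name `top5_error` and the toy data are assumptions for illustration and have nothing to do with the actual competition entries.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is not among the 5 highest scores."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices ordered from highest score to lowest.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:5]:
            misses += 1
    return misses / len(true_labels)


# Two toy images scored over six classes each.
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.25, 0.10],  # true class 2 has the lowest score
    [0.40, 0.10, 0.10, 0.10, 0.20, 0.10],  # true class 0 has the highest score
]
labels = [2, 0]
toy_error = top5_error(scores, labels)

# The gap quoted in the article, in percentage points.
gap = round(26.2 - 15.3, 1)
```

On the figures quoted above, the gap works out to 10.9 percentage points, which is the "more than ten" the text refers to.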
2012–2016 — Recognition
NVIDIA understood what had happened faster than almost anyone else. The company had been watching GPU computing expand into scientific and high-performance computing for six years. They had seen researchers use their hardware for molecular dynamics, climate simulation, fluid mechanics. But AlexNet was different. AlexNet suggested that the most interesting application of parallel computing might not be physics. It might be intelligence itself.
The research community flooded in. Within months of the ImageNet result, deep learning papers were multiplying. Each one required training runs. Each training run ran faster on GPUs. Faster on NVIDIA GPUs specifically, because NVIDIA's hardware was what the CUDA ecosystem ran on, and the CUDA ecosystem was where the software lived. The libraries were there. The tooling was there. The community knowledge was there. Switching to a competitor meant leaving all of it behind.
NVIDIA began reorienting the company around this new reality. The Tesla product line — named for Nikola Tesla, with no relation to the electric car company — was repositioned as the dedicated data center compute product. The architecture evolved: the Kepler generation in 2012, Maxwell in 2014, Pascal in 2016. Each generation brought improvements in raw compute, memory bandwidth, and energy efficiency that were tuned increasingly with machine learning workloads in mind. The gaming business remained large and profitable. But the data center business was growing faster, and the executives making decisions at NVIDIA knew why.
Google was paying attention in a different way. Beginning around 2013, the company started developing its own specialized chip for machine learning inference — the Tensor Processing Unit, or TPU. The decision was motivated in part by the recognition that relying entirely on NVIDIA hardware created a dependency that a company of Google's scale and ambition could not accept indefinitely. Building their own silicon was expensive and slow. It was also, as events would prove, the only credible path to independence from NVIDIA's ecosystem. No one else had the resources and the motivation to walk that path.
2016–2022 — The Moat Deepens
The V100, released in 2017, was the first NVIDIA chip designed with explicit awareness that its primary market was AI training. It introduced Tensor Cores — specialized hardware units optimized for the matrix multiplication operations that dominate deep learning computations. The V100 was not a gaming chip with some compute capability bolted on. It was, architecturally, a machine learning chip that happened to be able to run graphics workloads. The inversion was complete.
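The operation a Tensor Core accelerates can be pictured as a small fused multiply-accumulate on matrix tiles: take two tiles, multiply them, and add the product to a running total. A minimal sketch in Python follows, assuming square matrices stored as nested lists and a tile size of 2 chosen for readability. The helper names are hypothetical, and the sketch ignores the reduced-precision number formats that the real hardware relies on.

```python
def tile_fma(a, b, c):
    """Return a @ b + c for small square tiles given as nested lists."""
    size = len(a)
    return [
        [
            c[i][j] + sum(a[i][k] * b[k][j] for k in range(size))
            for j in range(size)
        ]
        for i in range(size)
    ]


def get_tile(m, row, col, t):
    """Cut a t-by-t tile out of matrix m, starting at (row, col)."""
    return [r[col:col + t] for r in m[row:row + t]]


def matmul_tiled(a, b, t=2):
    """Multiply square matrices whose size is a multiple of t, tile by tile."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for row in range(0, n, t):
        for col in range(0, n, t):
            acc = [[0] * t for _ in range(t)]  # running total for this output tile
            for k in range(0, n, t):
                acc = tile_fma(get_tile(a, row, k, t), get_tile(b, k, col, t), acc)
            for i in range(t):
                out[row + i][col:col + t] = acc[i]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
product = matmul_tiled(a, identity)  # multiplying by the identity returns a
```

Each output tile is built from a chain of these small multiply-accumulate steps, and different output tiles do not depend on one another. That is why dedicated hardware for the tile operation pays off across an entire neural network layer.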
Research labs and cloud providers began ordering V100s in quantity. A single DGX-1 server — NVIDIA's purpose-built AI training system — packed eight V100s together with high-speed interconnects between them. The price was around $150,000. Universities bought them. Labs bought them. The hyperscalers — Amazon, Microsoft, Google — built entire data center wings around them.
Meanwhile, AMD and Intel were watching. Both companies had GPU products and both companies understood, by the mid-2010s, that they were being left behind in the market that was going to matter most. AMD's response was ROCm — Radeon Open Compute — an open-source software platform designed to let developers run GPU computing workloads on AMD hardware. Intel developed oneAPI, a similar effort to create a unified programming model across its compute products.
Neither gained meaningful traction. The reason was not that the hardware was inferior, though in some benchmarks it was. The reason was the ecosystem. By 2017, a decade of CUDA development had produced a library stack — cuDNN for deep learning primitives, TensorRT for inference optimization, NCCL for multi-GPU communication, and dozens of others — that was simply not available on competing platforms.
A researcher who wanted to switch to AMD hardware would find that their code, if it used any of these libraries directly or indirectly, would not run. Every major deep learning framework — TensorFlow, PyTorch, JAX — had been built with CUDA as its assumed substrate. Replacing CUDA meant rewriting at every layer simultaneously.
The A100, released in 2020, pushed the numbers further. The H100, announced in 2022 and shipping into 2023, was the chip that would become the primary unit of currency in the coming AI buildout. It contained eighty billion transistors, delivered performance measured in thousands of trillions of operations per second, and cost in the range of thirty to forty thousand dollars per unit. Demand immediately outstripped supply. Waitlists stretched to months. Entire company strategies were shaped around access to H100 allocations.
90%+: Share of global AI model training estimated to run on NVIDIA hardware as of 2026. Over five million developers work within the CUDA ecosystem. More than three thousand applications have been GPU-accelerated on the platform.
2022–Present — The Kingmaker Era
When ChatGPT launched in November 2022 and accumulated a million users in five days, the public story was about OpenAI. The infrastructure story was about NVIDIA. Every training run that produced GPT-3, GPT-3.5, GPT-4 had run on NVIDIA hardware. Every inference request that returned a ChatGPT response was being processed on NVIDIA hardware. The most visible product in the history of artificial intelligence was, beneath its interface, a CUDA application.
The market responded accordingly. NVIDIA's stock, which had traded in the range of a hundred to two hundred dollars through much of 2022, began a climb that would take it past a thousand dollars per share and lift the company's market capitalization above three trillion dollars — making it, briefly, the most valuable public company in the world. Jensen Huang, who had worn the same style of black leather jacket to every public appearance for years, became a recognizable figure to people who had never bought a graphics card.
The geopolitical dimension arrived in the same period. The United States government, concerned about the use of advanced AI chips in Chinese military applications, began restricting the export of NVIDIA's most capable hardware to China. The A100 was restricted. The H100 was restricted. NVIDIA developed downgraded variants — the A800 and H800 — designed to meet the export control thresholds while remaining salable. When those too were restricted, the situation forced Chinese AI laboratories into a different posture: they had to figure out how to train competitive models on hardware they could actually obtain, and they had to consider what it would mean to develop alternatives to CUDA that could run on non-NVIDIA chips. The export controls, designed to impede Chinese AI development, inadvertently created the strongest incentive that has ever existed to build a real alternative to NVIDIA's ecosystem.
Whether that alternative materializes remains an open question. Inside the United States, the momentum runs the other direction. Most major AI laboratories, including OpenAI, Anthropic, Meta AI, and xAI, train models on NVIDIA hardware; Google DeepMind, which relies heavily on Google's own TPUs, is the notable exception. Every major cloud provider has built its AI infrastructure around NVIDIA chips. The largest customers are simultaneously NVIDIA's most important partners and the companies most motivated to reduce their dependence.
The hyperscalers are all developing custom silicon: Google's TPUs, Amazon's Trainium, Microsoft's Maia. None have displaced NVIDIA as the primary training platform. They have carved out specific workloads — inference at scale, particular model architectures — while NVIDIA remains the default for the training runs that define the frontier. And even where the hyperscalers run their own silicon internally, they still purchase NVIDIA hardware by the trainload for their enterprise cloud customers, who have built their own workflows on CUDA and are not prepared to abandon them.
That tension, between depending on NVIDIA and working to escape it, defines the AI hardware market today.
The Shape of the Moat
It is worth being precise about what NVIDIA's advantage actually consists of, because the hardware story — the chips, the transistors, the benchmark numbers — is only part of it. The deeper advantage is the software ecosystem that has accumulated around CUDA over nearly two decades.
CUDA has more than five million active developers. The libraries built on top of it — cuDNN, TensorRT, NCCL, cuBLAS, and many others — have been optimized at a level of depth that takes years to replicate. Every time NVIDIA releases new hardware, the libraries are updated to extract maximum performance from it. Every time a researcher finds a new technique for accelerating training, that technique gets absorbed into the ecosystem and made available to everyone working within it. The flywheel turns in NVIDIA's favor with every cycle.
Switching costs are not just technical. They are cultural. A generation of AI researchers has learned to think in CUDA. The tutorials are written for CUDA. The Stack Overflow answers are written for CUDA. The debugging tools are written for CUDA. A researcher who moves to a competing platform does not just change their hardware. They leave behind a body of community knowledge that took a decade to build.
None of this was the result of a single visionary decision. CUDA was built for scientific computing. The deep learning applications came later, from outside the company, from a doctoral student in Toronto who needed to multiply a lot of numbers simultaneously. The ecosystem grew because researchers found the platform useful and built on it and attracted other researchers who built on it further. The moat is real and deep, but it was dug incrementally, by thousands of people making thousands of decisions, most of whom were not thinking about competitive dynamics at all.
Thirty Years From a Denny's Booth
Jensen Huang still runs NVIDIA. He has led the company for every one of its thirty-three years, through the near-bankruptcy of the NV1 era, through the GPU wars of the late 1990s, through the CUDA bet that most investors did not understand, through the deep learning collision that vindicated it, and through the generative AI moment that turned the company into one of the most valuable enterprises in human history.
He has said, in various forms, that he sees NVIDIA not as a chip company but as an accelerated computing company — that the GPU is not a graphics chip that happens to do math, but a math engine that happens to have started in graphics. That framing, retroactively applied to a history that unfolded more chaotically than any narrative suggests, is nonetheless accurate. The architecture that emerged from those early decisions about parallel processing has proved more general, more durable, and more consequential than anyone in that Denny's booth could have imagined.
The AI industry runs on NVIDIA. The training runs that produce the models you use every day — the language models, the image generators, the code assistants — happen inside data centers filled with NVIDIA hardware, running NVIDIA software, billed by the hour to companies building the AI economy. That infrastructure did not emerge from a plan to build an AI empire. It emerged from a decision to help programmers draw triangles faster. The distance between those two things is the story of NVIDIA. And it is still being written.
Tech Reader Magazine
TechReaderMagazine.com