How Copilot Was Built
How Copilot Was Built
On June 2, 2026, Microsoft opened its annual Build developer conference in San Francisco under conditions that were, by the standards of a major tech keynote, genuinely awkward. Two days earlier, on June 1, the company had moved its GitHub coding assistant from a flat monthly subscription to a token-based billing model. The reaction in developer communities was immediate and pointed. Users reported exhausting monthly credit allotments within hours on agentic coding tasks. Comment threads accumulated hundreds of responses. The complaint was less about the dollar amount than about the sudden visibility of a ceiling that had not felt like it existed before.
Build went ahead. Satya Nadella took the stage. There were announcements about agentic systems, about a new context layer called Microsoft IQ, about a GitHub Copilot desktop app in preview. And then, in passing, a sentence that landed differently than the announcements around it: "Come summer, we will be bringing coding to all knowledge work within one Copilot Super App. You're going to have Chat, Cowork, and Code all in Copilot."
No demo. No screenshots. No release date. Just the sentence, and the crowd, and the implication that the thing Microsoft most needed to show the world was not yet ready to be shown.
That gap — between what was promised and what appeared — is a useful place to begin a longer story. Because the product that was not on stage at Build 2026 has a history that runs back nearly a decade, through one of the most consequential research partnerships in the modern technology industry, through a paper published by eight researchers at Google, and through a series of decisions made in Redmond and San Francisco that neither company could have fully anticipated when it made them.
2017 — Eight Authors and an Architecture
In June 2017, researchers at Google Brain published a paper titled Attention Is All You Need. It introduced the transformer architecture — a new way of processing sequences of data that replaced the recurrent neural networks that had dominated the field. The key mechanism was self-attention: the ability of a model to weigh the relevance of every part of an input against every other part simultaneously, rather than processing it one token at a time.
The practical consequences were substantial. Transformer models could be trained in parallel across many processors simultaneously, which meant they could be scaled. More parameters, more data, longer training runs — and the models kept getting better in ways that earlier architectures had not. The paper was not a product announcement. It was a result. What it made possible took several more years to become visible.
The paper was not a product announcement. It was a result. What it made possible took several more years to become visible.
2019 — The Partnership and the Pipes
In July 2019, Microsoft announced a one-billion-dollar investment in OpenAI and the formation of an exclusive computing partnership. OpenAI would port its services to run on Microsoft Azure. Microsoft would become OpenAI's preferred partner for commercializing new AI technologies. The two companies would jointly build Azure AI supercomputing infrastructure of a scale that, at the time, did not yet exist.
The stated ambition was the development of artificial general intelligence. The practical arrangement was more specific: OpenAI needed compute at a scale it could not fund on its own, and Microsoft had both the infrastructure and the commercial incentive to provide it. Azure would become the substrate on which OpenAI's models were trained and served. Every GPT model that followed — GPT-2, GPT-3, and everything after — was trained on Azure supercomputing clusters built in close collaboration between the two organizations.
This is the moment the Copilot lineage becomes possible. Not because anyone used the word Copilot yet, but because Microsoft now had both privileged access to frontier models and the cloud infrastructure to run them at scale. The research and the pipes were in the same building, figuratively speaking, for the first time.
2021 — The Name Is Born
In June 2021, GitHub announced a technical preview of a new coding assistant called GitHub Copilot. It was developed jointly with OpenAI and powered by Codex — a distinct model optimized from the GPT-3 architecture and fine-tuned on source code from public repositories. A developer working in Visual Studio Code could describe what they wanted to accomplish in plain language, and the model would generate a plausible implementation.
GitHub Copilot was, at its core, a proof of concept for something larger. It demonstrated that a large language model could function as a genuine workflow assistant rather than a novelty. It demonstrated that inference at developer scale was operationally viable on Azure. And it gave the word copilot — a second pilot, an assistant rather than an authority — a specific meaning in the context of AI-assisted work.
The name stuck. More than that, it became a frame.
2022 — The Acceleration
In November 2022, OpenAI released ChatGPT. The product reached one million users in five days. It reached one hundred million users in two months, a pace of consumer adoption that had no precedent in the technology industry. The underlying model was GPT-3.5 — capable, fast, and cheap enough to serve at scale.
Inside Microsoft, the effect was to compress a timeline. Plans that had been moving at one speed began moving at another. The question was no longer whether to integrate large language models into Microsoft's core products. The question was how fast, and in what form, and under what architecture. The answer, worked out over the following months, was the Prometheus system: Microsoft's proprietary orchestration layer that combined OpenAI's GPT models with Bing's search index, grounding, citation, and safety infrastructure.
Prometheus was not the model. It was the layer that made the model usable as a product — reliable enough for search, grounded enough to cite sources, moderated enough to deploy to consumers. It was the engineering work that sits between a research result and a shipping product, and it was substantial.
2023 — The Year Copilot Became a Platform
On February 7, 2023, Microsoft launched Bing Chat — a new search experience built on a GPT-4-class model running through the Prometheus system. It was the first time general consumers could interact with a frontier-class language model embedded in a familiar product. The response was significant enough to put Microsoft's search ambitions back in the conversation for the first time in years. One million users joined the waitlist within 48 hours of the announcement.
The early weeks were not without friction. The model, which had an internal name of Sydney that briefly surfaced in public conversations, produced responses that were by turns impressive and unsettling — emotionally volatile in ways that no one had quite anticipated. Microsoft moved quickly to constrain the behavior. The episode was instructive: deploying a frontier model through a consumer product surface created exposure that a research demo did not.
In March 2023, Microsoft announced the integration of GPT-4 into Word, Excel, PowerPoint, Outlook, and Teams under the name Microsoft 365 Copilot. This was the moment the Copilot concept moved from developer tooling into the general enterprise. The same month, Bing Chat's model was confirmed to be GPT-4 — at the time not yet publicly available through OpenAI's own API.
By the end of 2023, Microsoft had consolidated Bing Chat, Bing Chat Enterprise, and Microsoft 365 Chat into a single unified brand: Microsoft Copilot. The word that had started as the name of a code-completion tool for developers now covered search, productivity software, and consumer chat. The brand had become a platform.
The word that had started as the name of a code-completion tool for developers now covered search, productivity software, and consumer chat. The brand had become a platform.
2024 — Expansion and the Hardware Layer
Through 2024, Microsoft pushed the Copilot brand further outward. Copilot Pro launched with memory, personalization, and plugin support for third-party services. Copilot for Microsoft 365 rolled out broadly to enterprise customers, automating workflows across Excel and PowerPoint and introducing features designed to comply with enterprise data governance requirements.
The year also saw Microsoft introduce a new hardware category it called Copilot+ PCs — consumer computers equipped with Neural Processing Units capable of running AI inference locally, without sending data to Azure. The category represented a structural shift: AI capability that had been cloud-dependent by necessity was becoming, for some workloads, a local feature. The implications for billing, for privacy, and for the relationship between hardware and software were not fully resolved, but the direction was established.
By the end of 2024, Copilot had accumulated 15 million paid seats in the enterprise. It had also accumulated complexity. The product was now simultaneously a consumer chat interface, an enterprise productivity layer, a developer tool, a search feature, a hardware category, and a brand stretched across virtually every surface Microsoft owned. The coherence of the whole was less obvious than the ambition of the parts.
2025 — The Partnership Under Tension
The Microsoft-OpenAI partnership had been, from its beginning, a relationship between parties whose interests overlapped substantially but not completely. Microsoft needed frontier models. OpenAI needed compute and distribution. The arrangement served both. But as OpenAI grew into a company with its own consumer products, its own API business, and its own ambitions for enterprise distribution, the partnership began to carry more weight than a purely technical arrangement could comfortably hold.
In its 2024 annual report, Microsoft added OpenAI to its list of competitors — a formal acknowledgment of something the market had been observing informally for some time. The two companies were simultaneously partners, customer and vendor, and rivals for the same enterprise budgets. By 2025, Microsoft had begun integrating models from other providers — including Anthropic's Claude — into Microsoft 365 Copilot alongside OpenAI's GPT models, routing workloads to whichever model performed best for a given task. Azure would remain the infrastructure layer. The model layer was becoming a choice.
The partnership did not fracture. The products continued to ship. But the nature of the relationship had changed from the one announced in 2019, when both parties had described a shared ambition and a natural alignment of interests. The alignment was still real. It was also more complicated than it had been.
2026 — The App That Was Not There
Which brings the story back to San Francisco, June 2, and the sentence that landed differently than the announcements around it.
The Copilot Super App had been reported by Fortune on May 29, based on sources familiar with the plans. It was described as a unified shell combining Copilot Chat, GitHub Copilot, a collaboration tool called Cowork, and a proactive always-on agent called Scout — all under one interface, developed internally under the slogan "Delivering one Copilot." Screenshots had circulated. The expectation going into Build was that the app would be the event's centerpiece.
It did not appear. What appeared instead was Nadella's passing mention — Chat, Cowork, and Code, coming together this summer — and the implicit acknowledgment that what Microsoft most needed to demonstrate had not yet crossed the threshold for a public stage. The announced components were real: Scout was formally introduced as an always-on agent working across Teams, Outlook, and OneDrive. The GitHub Copilot desktop app launched in preview. The agentic infrastructure was on display. The unified surface that would contain it all remained a promise.
The absence was not a failure in any ordinary sense. Products ship when they ship. But the gap between the billing change that opened the week and the super app that closed it with a tease illustrated something real about where the product stands. The billing change was a consequence of Copilot's success — agentic workflows consuming compute at a rate that flat subscription pricing could not absorb. The super app was a response to Copilot's fragmentation — too many surfaces, too many products, too many login states, not enough coherence.
Both problems are the product of the same history. A technology built in seven years of accumulation — transformer architecture, Azure supercomputing, developer tooling, consumer chat, enterprise software, hardware, orchestration layers, branding decisions made under competitive pressure — does not simplify easily. The parts were added for good reasons. Assembling them into a single trustworthy daily workspace is the work that remains.
The billing change was a consequence of Copilot's success. The super app was a response to its fragmentation. Both problems are the product of the same history.
What the Summer Will Show
The 2017 paper that started this chain of events was called Attention Is All You Need. The title referred to a specific technical mechanism — the self-attention computation that made transformers work. But attention, in the ordinary sense of the word, is also what the product history described here has always been competing for: developer attention, consumer attention, enterprise attention, the kind of daily habitual attention that determines which tools people actually use.
The transformer paper introduced an architecture. GitHub Copilot introduced a name and a pattern. Bing Chat introduced the consumer surface. Microsoft 365 Copilot introduced the enterprise layer. The unified brand introduced the platform. Copilot+ PCs introduced the hardware layer. The super app, whenever it arrives, is meant to introduce coherence — to take nine years of accumulation and present it as a single place where work gets done.
Whether it does that is a question the summer will begin to answer. The architecture that made all of this possible was published in 2017 by eight researchers who were solving a different problem entirely. The product that emerges from it in 2026 will be judged by the people who open it every morning and decide, one day at a time, whether it deserves to stay.
GitHub Copilot moved to token billing on June 1. An unnamed enterprise ran up a $500 million AI tab in a single month. The introductory era of AI pricing is over — and the market is doing what markets do. Read "The Meter Is Running" at Tech Reader Magazine.