Note: This is a research note supplementing the book Unscarcity, now available for purchase. These notes expand on concepts from the main text. Start here or get the book.
Inside xAI Colossus: The World’s Largest AI GPU Cluster
A billionaire bought a defunct appliance factory in Memphis, Tennessee, filled it with 200,000 GPUs, and built the most powerful AI training facility on Earth in four months. Here’s what that means for you.
The 122-Day Miracle (Or Madness)
In the summer of 2024, Elon Musk’s AI company xAI did something that made seasoned data center engineers choke on their coffee. They took a decommissioned Electrolux refrigerator factory in Memphis, Tennessee — a building that had recently been making kitchen appliances — and converted it into the single largest AI supercomputer on the planet.
In 122 days.
To appreciate how insane that timeline is, consider that a typical hyperscale data center takes 18 to 24 months to build. Google’s custom TPU facilities take years of planning. Microsoft’s data center projects go through regulatory, environmental, and engineering reviews that would make a medieval cathedral builder nod in sympathetic exhaustion.
Musk’s team did it in about the time it takes to renovate a bathroom in Manhattan.
By September 2024, Colossus was operational with 100,000 Nvidia H100 GPUs humming inside what had been, months earlier, a place that made refrigerators. The symbolism writes itself: a factory that once kept food cold now trains artificial minds. Welcome to the Intelligence Age, where the factories don’t make things — they make thinking.
The Numbers: What’s Actually Inside Colossus
Let’s talk hardware, because the specifications are genuinely staggering.
Phase 1: The Initial 100K (Summer 2024)
The first deployment consisted of 100,000 Nvidia H100 GPUs — each one a chip roughly the size of your hand that can perform about 4 petaflops of low-precision AI computation. That’s 4 quadrillion calculations per second. Per chip. Multiply that by 100,000 and you get… well, a number so large it stops meaning anything intuitive. So let’s use an analogy: if every human on Earth did one calculation per second, it would take the entire species more than 1,500 years to match what Colossus does in a single second.
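Here is the back-of-envelope behind that comparison, as a rough sketch. The inputs are assumptions for illustration: about 4 petaflops per H100 at low precision, 100,000 GPUs, and roughly 8 billion people.

```python
# Rough sketch: how long would humanity need to match one second of Colossus?
# Assumptions (illustrative): ~4 PFLOPS per H100 at low precision,
# 100,000 GPUs, ~8 billion people doing one calculation per second.

flops_per_gpu = 4e15                               # ~4 petaflops per H100
gpu_count = 100_000
cluster_ops_per_sec = flops_per_gpu * gpu_count    # ~4e20 operations/second

people = 8e9                                       # one calculation per person per second
seconds_needed = cluster_ops_per_sec / people      # ~5e10 seconds
years_needed = seconds_needed / (365 * 24 * 3600)

print(f"Cluster throughput: {cluster_ops_per_sec:.1e} ops/s")
print(f"Humanity needs ~{years_needed:,.0f} years to match one second")  # ~1,585 years
```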
The H100s were connected via Nvidia’s Spectrum-X Ethernet fabric — 400 gigabits per second per port — enabling the GPUs to share data with latencies measured in microseconds. The entire cluster operated as a single coherent training system, not 100,000 isolated machines.
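To get a feel for why the network fabric matters as much as the chips, here is an illustrative sketch of what synchronizing gradients would cost with a naive approach. Every number below is an assumption chosen for illustration (a 100-billion-parameter model, 16-bit gradients, one 400 Gb/s port per GPU, a textbook ring all-reduce), not Colossus’s actual training setup.

```python
# Rough sketch: cost of one naive gradient all-reduce across a big cluster.
# Assumptions (illustrative only): 100B-parameter model, 16-bit gradients,
# one 400 Gb/s port per GPU, ring all-reduce (each GPU moves ~2x the data).

params = 100e9
bytes_per_grad = 2                         # fp16 / bf16
grad_bytes = params * bytes_per_grad       # ~200 GB of gradients

port_gbps = 400
port_bytes_per_sec = port_gbps / 8 * 1e9   # 50 GB/s per port

allreduce_seconds = 2 * grad_bytes / port_bytes_per_sec

print(f"Gradient volume: {grad_bytes / 1e9:.0f} GB")
print(f"Naive all-reduce over one port: ~{allreduce_seconds:.0f} s per step")  # ~8 s
```

Eight seconds of pure communication per training step would be ruinous, which is why real deployments put multiple network ports on every server, overlap communication with computation, and shard models and gradients across GPUs. The point of the fabric is to make 100,000 machines behave like one.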
Phase 2: The Double-Down (Late 2024)
Then, in a move that suggested Musk’s relationship with patience is purely adversarial, xAI doubled the cluster. In 92 days. By early 2025, Colossus was running approximately 200,000 GPUs: a mix of 150,000 H100s, 50,000 of the newer H200s (with 141GB of HBM3e memory each, nearly double the H100’s 80GB), and the first batches of Nvidia’s Blackwell-generation GB200s.
Current State and the Million-GPU Ambition
As of mid-2025, the facility houses over 200,000 GPUs and consumes roughly 250 megawatts of power. To put that in perspective, 250 megawatts is enough electricity to power roughly 200,000 American homes. Memphis effectively gained a new small city’s worth of power consumption overnight — except this city has no residents, only silicon.
And Musk isn’t done. The stated ambition is to scale Colossus to one million GPUs. At current power-per-GPU ratios, that would require over a gigawatt of electricity — the output of a large nuclear power plant, dedicated entirely to making AI smarter.
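The scaling math, as a quick sketch. One assumption here is an average US household load of roughly 1.2 kilowatts; the power-per-GPU ratio is simply read off the figures above.

```python
# Rough sketch: Colossus's power draw today and at the 1-million-GPU target.
# Assumptions: 250 MW site load for ~200,000 GPUs; average US home ~1.2 kW.

site_kw = 250_000
gpu_count = 200_000
avg_home_kw = 1.2                           # ~10,500 kWh/year per household

homes_equivalent = site_kw / avg_home_kw    # ~208,000 homes
kw_per_gpu = site_kw / gpu_count            # ~1.25 kW all-in (GPU + cooling + network)

target_gpus = 1_000_000
target_gw = target_gpus * kw_per_gpu / 1e6  # ~1.25 GW

print(f"Equivalent household load: ~{homes_equivalent:,.0f} homes")
print(f"All-in power per GPU: ~{kw_per_gpu:.2f} kW")
print(f"1,000,000 GPUs at this ratio: ~{target_gw:.2f} GW")
```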
The Cost
Nvidia H100s retail for roughly $25,000 to $30,000 each (when you can get them — for most of 2023-2024, the waitlist was longer than the guest list at a royal wedding). At 200,000 units, that’s $5-6 billion in GPUs alone. Add networking equipment, cooling infrastructure, power systems, building renovation, and engineering labor, and conservative estimates put the total Colossus investment north of $10 billion.
For a single facility. In Memphis.
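The GPU line item alone, as a sketch at list prices (everything else, from networking to construction, sits on top of this):

```python
# Rough sketch: GPU acquisition cost at retail list prices.
gpu_count = 200_000
price_low, price_high = 25_000, 30_000      # USD per H100-class GPU

print(f"GPUs alone: ${gpu_count * price_low / 1e9:.0f}B to "
      f"${gpu_count * price_high / 1e9:.0f}B")   # $5B to $6B
```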
Why Memphis?
This is a surprisingly interesting question. Memphis, Tennessee is not Silicon Valley. It’s not even Austin. So why did the world’s largest AI supercomputer end up in a city better known for barbecue and blues?
Several factors converged:
- Available real estate: The former Electrolux factory provided a massive industrial shell — hundreds of thousands of square feet of floor space with industrial-grade foundations capable of supporting heavy equipment. Converting existing industrial buildings is dramatically faster than building from scratch.
- Power availability: Memphis sits on the Tennessee Valley Authority (TVA) grid, one of the largest public power systems in the United States. The TVA operates a mix of hydro, nuclear, natural gas, and renewable generation, with significant available capacity. When you need 250 megawatts fast, you need a utility that can deliver without a three-year interconnection study.
- Tax incentives: Tennessee offered significant economic development incentives. When a billionaire shows up wanting to invest billions in your state, governments tend to find flexibility.
- Speed: Memphis city officials approved the project with remarkable velocity. In the race to build AI infrastructure, regulatory speed is a genuine competitive advantage. Some cities take longer to approve a Starbucks renovation.
The choice highlights a broader trend: AI infrastructure is gravitating not to where the talent lives, but to where the power, space, and regulatory willingness exist. The brains working on AI might live in San Francisco, but the actual thinking happens increasingly in places like Memphis, Abilene, Texas (Stargate’s first site), and rural Oregon.
The Competition: How Colossus Stacks Up
xAI Colossus doesn’t exist in isolation. It’s one entry in what has become the most expensive arms race since the Cold War. Here’s how the major GPU clusters compare:
| Facility | GPUs | Investment | Power | Status |
|---|---|---|---|---|
| xAI Colossus | 200,000+ (target: 1M) | ~$10B+ | 250 MW | Operational |
| Meta’s Clusters | 350,000+ H100s | ~$50B+ | Distributed | Operational |
| Stargate (OpenAI/SoftBank) | 450,000+ GB200s planned | $500B over 4 years | 8+ GW planned | Phase 1 operational |
| Google TPU Clusters | Custom TPUs (v5p+) | ~$75B+ CapEx (2025) | Distributed | Operational |
| Microsoft Azure | 300,000+ GPUs | ~$80B CapEx (2025) | Distributed | Operational |
A few things stand out.
Colossus is the largest single-site cluster. Meta has more total GPUs, but they’re spread across multiple data centers. Stargate will eventually dwarf everything, but it’s a multi-year, multi-site buildout. Colossus concentrated more AI computing power in one building than had ever existed anywhere before — and did it in a timeline that would be impressive for assembling IKEA furniture.
The Stargate Project is the elephant in the room. At $500 billion over four years, with $100 billion deployed immediately, the OpenAI/SoftBank/Oracle venture announced in January 2025 makes Colossus look like a warmup. Five data center sites are already announced, with over 8 gigawatts of planned power capacity — the equivalent of eight nuclear reactors dedicated to AI. When it’s fully built, Stargate will make Colossus look like a laptop.
Google plays a different game. Rather than buying Nvidia’s GPUs, Google designs its own AI chips (TPUs — Tensor Processing Units). This vertical integration means Google’s infrastructure doesn’t show up neatly in “GPU count” comparisons, but its training capacity is formidable. The TPU v5p pods connect 8,960 chips in a single training cluster.
The total investment is historically unprecedented. Combined hyperscaler capital expenditure on AI infrastructure is projected to exceed $600 billion in 2026 alone. The entire Apollo program cost $280 billion in today’s dollars. We’re spending two Apollos per year building the infrastructure of machine intelligence.
The Power Problem: Colossus Is Thirsty
Here’s where the story gets uncomfortable.
Colossus consumes 250 megawatts. Scale it to one million GPUs and you’re looking at over a gigawatt. And Colossus is just one facility among dozens being built or planned worldwide.
The International Energy Agency projects global data center electricity consumption will reach 945 terawatt-hours by 2030 — roughly 3% of all electricity generated on Earth. AI workloads, currently 10-20% of data center energy, are expected to rise to 35-50%.
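A quick sanity check on that 3% figure. The global-generation number below is an assumption for illustration, on the order of 31,000 terawatt-hours per year by 2030:

```python
# Rough sketch: data centers' share of global electricity by 2030.
# Assumption (illustrative): global generation ~31,000 TWh/year by 2030.
datacenter_twh = 945
global_twh = 31_000

print(f"Data-center share: ~{datacenter_twh / global_twh:.1%}")   # ~3.0%
```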
Memphis found this out the hard way. When xAI’s power demands materialized, local residents raised legitimate concerns about grid reliability, electricity prices, and the environmental impact of powering a facility that benefits a private company but draws from shared infrastructure. The TVA had to scramble to ensure that keeping Colossus fed didn’t mean brownouts for everyone else.
This is the compute concentration problem in miniature: one company’s AI ambitions consume resources that belong to a community. The electrons flowing into Colossus are electrons not flowing into homes, hospitals, and schools. The power grid is a shared commons; Colossus is a private enclosure.
And Memphis is just the beginning. As every major tech company races to build bigger clusters, the aggregate power demand threatens to overwhelm electrical grids worldwide. Something has to give — either AI scales back (unlikely), new energy sources come online (fusion, the perennial “20 years away” technology, is now attracting tens of billions in investment), or communities bear the cost of powering private intelligence factories.
The Concentration Problem: When Compute Becomes Power
Let’s zoom out from the technical specs and ask the question that matters: what does it mean when one company can build a facility like Colossus?
Consider the facts:
- The United States controls approximately 75% of the world’s AI supercomputing capacity.
- Within the US, that capacity is concentrated in fewer than a dozen companies.
- Within those companies, the decision of what to train, how to train it, and who gets access is made by a handful of executives.
- Nvidia, which manufactures the GPUs powering nearly all of this, commands 80-95% of the AI chip market.
We have taken the most transformative technology in human history and concentrated its means of production more tightly than oil ever was. At least oil was geographically distributed — you could find it in Texas, Saudi Arabia, Venezuela, Norway. Advanced AI compute exists, meaningfully, in a few buildings owned by a few billionaires.
Colossus exemplifies this. It was built on one man’s timeline, with one man’s capital, to serve one man’s company’s priorities. The 200,000 GPUs inside that Memphis facility represent more raw AI training capacity than most nations possess. xAI uses it to train Grok, its AI assistant. What Grok learns, what it’s good at, what values it embodies — these are decisions made by xAI, not by Memphis, not by Tennessee, not by the American public, and certainly not by humanity as a whole.
This isn’t a critique of Musk specifically. Meta, Google, Microsoft, and OpenAI are all doing the same thing at comparable scales. The critique is structural: we are building the infrastructure of intelligence as private property, and that’s a choice, not an inevitability.
Roads are public. The electrical grid is regulated as a utility. The internet backbone was built with public funding before private companies commercialized it. We made those choices because we understood that certain infrastructure is too important to leave entirely in private hands.
Compute — the infrastructure of intelligence — deserves the same conversation.
The Alternative: What If Colossus Were Public?
This is where the Unscarcity framework enters the picture.
The book argues for Universal Basic Compute — the idea that every citizen should receive a guaranteed allocation of AI processing power, much as every citizen receives access to roads, clean water, and (in most civilized countries) healthcare.
Imagine a world where the 200,000 GPUs in Colossus weren’t xAI’s private arsenal but a shared public utility. Every resident of Memphis — every resident of Tennessee — every American — receives a compute allocation. A slice of that immense processing power, delivered as reliably as electricity, to do with as they see fit.
What would people do with it?
- A single mother in Memphis could run an AI agent that handles her tax filing, legal questions, and job applications — services that currently cost hundreds of dollars in professional fees.
- A small business owner could deploy AI-powered inventory management, customer service, and marketing — capabilities currently available only to companies that can afford enterprise software.
- A student could train a specialized model on their local community’s needs — a hyper-local AI that understands Memphis neighborhoods the way Google understands web queries.
- A cooperative of farmers could pool their allocations to run agricultural AI — optimizing crop rotation, predicting weather impacts, negotiating better prices.
This isn’t science fiction. Research institutions already allocate compute quotas to scientists. National cloud initiatives in Singapore, the UAE, and Saudi Arabia are building sovereign AI infrastructure for public access. The concept of “compute as public utility” is emerging in policy discussions worldwide.
The difference between Colossus-as-private-asset and Colossus-as-public-infrastructure is the difference between feudalism and democracy applied to the Intelligence Age. In one model, a small number of technology lords control the means of intelligence production and everyone else rents access. In the other, intelligence infrastructure is treated like roads — publicly provisioned, universally accessible, and understood as foundational to participation in civilization.
The Musk Paradox
There’s a delicious irony here that deserves acknowledgment.
Elon Musk — the man who built Colossus — is the same man who has repeatedly predicted “universal high income” as a consequence of AI development. He’s told audiences at Davos, at the Riyadh tech summit, and on his own platform that AI will eventually deliver abundance so great that “money won’t matter.”
And yet. The infrastructure that supposedly leads to this universal abundance is being built as aggressively private property. Colossus doesn’t share. Grok doesn’t run on public compute. The 250 megawatts feeding that Memphis facility serve xAI’s shareholders, not Memphis’s residents.
This is the fundamental tension: the people building the technology of abundance are building it using the institutions of scarcity. Private ownership. Proprietary systems. Competitive moats. Winner-take-all dynamics.
The Unscarcity framework doesn’t demand that Musk donate Colossus to the public (he won’t, and forcing him would create other problems). Instead, it proposes a transition architecture — an EXIT Protocol that gradually converts private compute infrastructure into public utility, the same way the telephone network evolved from private monopoly to regulated utility to essential service.
The specifics of how this transition works — the Civic Service requirements, the Impact economy, the Diversity Guard preventing any single faction from capturing the system — are laid out in the book. But the principle is simple: infrastructure this important cannot remain this concentrated.
What Colossus Tells Us About the Future
Colossus is not just a data center. It’s a signal.
It tells us that the means of intelligence production can be built, at world-historical scale, in months rather than years. It tells us that the bottleneck isn’t engineering know-how — it’s capital, power, and political will. It tells us that the future of AI will be shaped not by who writes the best algorithms, but by who controls the physical infrastructure on which those algorithms run.
And it asks us a question: when we build the factories of the Intelligence Age, who should own them?
The answer to that question will determine whether the age of artificial intelligence becomes an age of universal capability or an age of unprecedented concentration. Whether every human gets a compute allocation — a share of the cognitive infrastructure of civilization — or whether a handful of companies run the thinking machines and everyone else gets whatever trickles down.
Colossus sits in Memphis, Tennessee, in a building that used to make refrigerators. It is simultaneously the most impressive engineering achievement of the decade and the most vivid illustration of everything the Unscarcity framework argues must change.
The factories of the Intelligence Age are being built right now. The question isn’t whether they’ll be powerful — they already are. The question is whether they’ll be ours.
Further Reading
- Compute Clusters: The Factories of the Intelligence Age — Full ranking of the world’s largest GPU clusters
- Universal Basic Compute (UBC) — The case for compute as a public utility
- Elon Musk’s Universal High Income — When the richest man says money won’t matter
- Infrastructure Libertarianism — The ideology driving private compute buildouts
- The Foundation — The Unscarcity blueprint for universal infrastructure
xAI built the world’s largest AI supercomputer in a refrigerator factory. The Unscarcity book asks: what if we built it for everyone? Get the book and explore the blueprint.