Note: This is a research note supplementing the book Unscarcity; these notes expand on concepts from the main text.
Compute Clusters: The Factories of the Intelligence Age
How AI training facilities became the oil refineries of the 21st century—and why your grandchildren might receive a compute allocation alongside their birth certificate.
What Is a Compute Cluster, Really?
Let’s start with the basics, because “compute cluster” sounds impressively technical but obscures something remarkably simple. A compute cluster is just a lot of computers working together on the same problem.
That’s it. The magic is in the “working together” part.
When you train a large language model like GPT-4 or Llama 4, the task is far too massive for any single computer—even an obscenely powerful one. You need to split the work across thousands of processors that constantly communicate, share results, and coordinate their efforts. If one machine calculates something, every other machine needs to know about it immediately. The lag between them must be measured in microseconds, not seconds.
Think of it like building a pyramid. One person with a wheelbarrow would take millennia. A thousand people need to coordinate—who’s carrying what, where’s the next block going, don’t drop that on Steve. A compute cluster is that coordination problem, solved at the speed of light.
The processors doing this work are Graphics Processing Units (GPUs), originally designed to render video game graphics but repurposed for AI because they excel at doing many simple calculations simultaneously. If a traditional CPU is a brilliant surgeon performing delicate operations one at a time, a GPU is a factory floor with thousands of workers each doing one small task very quickly.
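To make “working together” concrete, here is the primitive underneath most distributed training: all-reduce, where every GPU contributes a partial result and every GPU walks away with the combined total. A minimal sketch using PyTorch’s `torch.distributed` (an illustrative tool choice, not a claim about any particular lab’s stack):

```python
# Minimal all-reduce sketch: every process computes a partial result,
# then all processes end up holding the sum of everyone's results.
# Launch with: torchrun --nproc_per_node=4 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment
    dist.init_process_group(backend="nccl")  # NCCL rides on NVLink/InfiniBand underneath
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for a locally computed gradient shard
    grad = torch.full((1024,), float(dist.get_rank()), device="cuda")

    # After this call, every GPU holds the identical summed tensor
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Real training runs do this for billions of gradient values, over and over, which is why the interconnect hardware described below matters as much as the chips themselves.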
The Current State of Play: Welcome to the GPU Arms Race
In 2024, we crossed into new territory. The compute clusters training frontier AI models became genuinely staggering in scale:
xAI’s Colossus: 100,000 GPUs in 122 Days
Elon Musk’s AI company, xAI, built what was briefly the world’s largest AI supercomputer in a converted Electrolux factory in Memphis, Tennessee. The timeline was absurd: from decision to operational in 122 days, when similar projects typically take years. By September 2024, it was running 100,000 Nvidia H100 GPUs.
Then they doubled it. In 92 days.
As of mid-2025, Colossus runs 150,000 H100s, 50,000 H200s, and 30,000 GB200s. The ultimate target? One million GPUs. The facility now consumes 250 megawatts of power—enough electricity for 250,000 homes.
This isn’t an R&D project. This is industrialized intelligence production.
Meta’s Infrastructure: 350,000+ H100s
Meta (Facebook’s parent company) announced plans to accumulate 350,000 H100 GPUs by end of 2024, representing an investment exceeding $10 billion in GPUs alone. Including networking and infrastructure, the total exceeds $50 billion. Their Llama 3 model trained on a cluster of 16,384 H100s for 54 days—during which they experienced 148 interruptions from faulty GPUs and 72 from memory failures. This is rocket science that involves replacing engines mid-flight.
Oracle’s Zettascale Vision
Oracle is now taking orders for what they call the first “zettascale” AI supercomputer: up to 131,072 GPUs in a single cluster. For context, that’s more than three times as many GPUs as Frontier, the world’s fastest traditional supercomputer.
The Stargate Project: $500 Billion Over Four Years
Announced in January 2025 with a White House press conference, the Stargate Project represents perhaps the most ambitious AI infrastructure buildout ever conceived. SoftBank, OpenAI, Oracle, and MGX are investing up to $500 billion over four years, with $100 billion deployed immediately.
The flagship site in Abilene, Texas is already operational. By late 2025, five additional data center sites were announced. The project will eventually deploy over 450,000 Nvidia GB200 GPUs across more than 8 gigawatts of planned capacity. That’s the electrical output of about eight nuclear reactors.
What’s Inside These Clusters: The Hardware
The Current King: Nvidia H100
The Nvidia H100, built on the “Hopper” architecture, has been the chip that defined the 2023-2024 AI boom. The specifications that matter:
- 80GB of HBM3 memory (the fastest memory type available)
- 3.35 terabytes per second memory bandwidth (how fast data moves)
- Roughly 4 petaflops of AI compute at FP8 precision (a petaflop is a quadrillion calculations per second)
- ~$25,000-30,000 per chip (when you can get them)
The H100 isn’t just expensive—it’s been genuinely scarce. In 2023-2024, AI companies hoarded them like gold bars. Jensen Huang, Nvidia’s CEO, became the most courted person in technology.
The New Contender: Blackwell (B200 and GB200)
In 2025, Nvidia’s Blackwell architecture began shipping at scale. The B200 GPU represents a generational leap:
- 192GB of HBM3e memory (2.4x the H100)
- 8 terabytes per second bandwidth (2.4x the H100)
- 20 petaflops of AI compute (5x the H100, at the newer low-precision FP4 format)
- ~$30,000-40,000 per chip
The GB200 NVL72 system connects 72 Blackwell GPUs to act as a single massive processor with 1.4 exaflops of AI performance. That’s 1.4 quintillion calculations per second. For perspective, the human brain is estimated to perform about one exaflop—this rack of GPUs matches 1.4 human brains in raw computational throughput.
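That rack-level number is just the per-chip figures multiplied out, which is worth checking once (taking the quoted petaflops at face value):

```python
# Sanity-check the NVL72 headline: 72 GPUs x 20 petaflops each
gpus_per_rack = 72
petaflops_per_gpu = 20                                      # B200 figure from the spec list above
rack_exaflops = gpus_per_rack * petaflops_per_gpu / 1000    # 1,000 petaflops = 1 exaflop
print(f"{rack_exaflops:.2f} exaflops per rack")             # ~1.44, matching the ~1.4 EF claim
```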
The Networking: NVLink and InfiniBand
Here’s where things get interesting. Raw GPU power means nothing if the chips can’t talk to each other fast enough.
NVLink is Nvidia’s proprietary interconnect for GPU-to-GPU communication within a single server or rack:
- 1.8 terabytes per second of bidirectional bandwidth (the latest version)
- Roughly 14x the bandwidth of PCIe Gen 5 (the standard computer connection)
- Enables GPUs to share memory directly, as if they were one chip
InfiniBand connects servers across the data center:
- 400 Gb/s per port (with 800 Gb/s coming)
- Switch latency around 100 nanoseconds, with end-to-end latency still under a microsecond (a nanosecond is a billionth of a second)
- RDMA capability (Remote Direct Memory Access—GPUs can read each other’s memory without involving the CPU)
The typical architecture: NVLink connects GPUs within a node, InfiniBand connects nodes across the cluster. Together, they make thousands of GPUs behave like one giant processor.
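To see why the bandwidth obsession is warranted, consider the classic ring all-reduce, which moves roughly 2 × (N−1)/N of the gradient data through each GPU’s link. A rough sketch with assumed round-number bandwidths, not measured values:

```python
# Rough time to synchronize gradients once, using a ring all-reduce.
# All numbers are illustrative round figures, not measurements.

def ring_allreduce_seconds(num_gpus: int, grad_bytes: float, bw_bytes_per_s: float) -> float:
    """Ring all-reduce moves ~2*(N-1)/N of the data through each GPU's link."""
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes / bw_bytes_per_s

grad_bytes = 16e9        # 8B parameters * 2 bytes (half precision) = 16 GB of gradients
nvlink_bw = 900e9        # ~900 GB/s effective per-GPU NVLink bandwidth (assumed)
ib_bw = 50e9             # 400 Gb/s InfiniBand port = 50 GB/s (assumed, one port per GPU)

print(f"8 GPUs over NVLink:        {ring_allreduce_seconds(8, grad_bytes, nvlink_bw)*1e3:7.1f} ms")
print(f"1024 GPUs over InfiniBand: {ring_allreduce_seconds(1024, grad_bytes, ib_bw)*1e3:7.1f} ms")
```

If a training step takes a second of math, spending hundreds of milliseconds just shipping gradients is crippling; this is why clusters overlap communication with computation, and why interconnects get as much engineering attention as the GPUs themselves.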
The Power Problem: These Things Are Hungry
Here’s where the compute cluster buildout collides with physical reality: these facilities consume obscene amounts of electricity.
According to the International Energy Agency:
- Global data center electricity consumption hit 415 TWh in 2024 (~1.5% of global electricity)
- U.S. data centers alone consumed 183 TWh—about 4% of American electricity
- By 2030, global data center consumption is projected to reach 945 TWh (~3% of global electricity)
AI specifically consumes 10-20% of current data center energy, but that fraction is rising rapidly—potentially to 35-50% by 2030.
A training cluster with 100,000 H100-class GPUs draws on the order of 150 megawatts once servers, networking, and cooling are counted: the continuous consumption of well over 100,000 average homes. The xAI Colossus facility is rated for 250 MW—roughly equivalent to a small city’s power consumption.
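The arithmetic behind figures like these is simple enough to sanity-check yourself. A back-of-envelope sketch, where the 700 W board power is the public H100 SXM spec and everything else is an assumed round figure:

```python
# Back-of-envelope power draw for a 100,000-GPU cluster.
# The 700 W board power is the public H100 SXM spec; the other figures are
# assumed round numbers, not measurements from any real facility.
num_gpus = 100_000
gpu_watts = 700                 # H100 SXM board power
server_overhead = 1.85          # assumed: CPUs, memory, fans, networking per GPU
pue = 1.2                       # assumed data center overhead (cooling, power conversion)

cluster_mw = num_gpus * gpu_watts * server_overhead * pue / 1e6
home_kw = 1.2                   # assumed average US household draw (~10.5 MWh/year)
print(f"~{cluster_mw:.0f} MW, the average draw of ~{cluster_mw * 1000 / home_kw:,.0f} homes")
```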
This isn’t just an engineering problem. It’s a civilizational constraint.
The Fusion Connection
This is why the tech industry’s obsession with fusion energy isn’t just PR. When you’re planning to run data centers that individually consume gigawatts of power, you need energy sources that scale.
Microsoft has signed a deal with Helion to purchase power from a fusion reactor by 2028. Google partnered with Commonwealth Fusion Systems targeting the early 2030s. Fusion industry funding has jumped from $1.7 billion in 2020 to $15 billion as of September 2025.
The timeline is aggressive but not arbitrary. Current grid infrastructure simply cannot support the projected AI data center buildout. Something has to give. Either AI development slows, or we find new energy sources.
This is the fuel component of the Unscarcity tripod (The Brain, The Body, The Fuel). Fusion and AI aren’t just adjacent technologies—they’re symbiotic necessities.
The Hyperscaler Investment: Hundreds of Billions
The capital flowing into AI infrastructure defies historical precedent:
| Company | 2025 CapEx Guidance | Focus |
|---|---|---|
| AWS | ~$100 billion | Cloud AI infrastructure |
| Microsoft | ~$80 billion | Azure, OpenAI partnership |
| Google | $75-85 billion | Cloud, Gemini training |
| Meta | $70-72 billion | Llama training, social AI |
| Oracle | Stargate partner | Major infrastructure expansion |
Combined hyperscaler CapEx is projected to exceed $600 billion in 2026, with roughly 75% ($450 billion) directly tied to AI infrastructure. By 2030, a McKinsey report projects $5.2 trillion in cumulative AI data center capital expenditure.
These numbers are difficult to contextualize. The entire Apollo program cost about $280 billion in today’s dollars. The AI infrastructure buildout will exceed that in a single year.
The Geopolitics: Compute Is the New Oil
Sam Altman has called compute “the currency of the future”—“possibly the most precious commodity in the world.” Jonathan Ross of Groq echoes the sentiment: “Compute is the new oil.”
This isn’t hyperbole. The concentration is staggering:
- The United States controls ~75% of the world’s AI supercomputing capacity
- China holds ~15% (and falling, due to export restrictions)
- Nvidia commands 80-95% of the AI chip market
- TSMC manufactures ~90% of the world’s advanced chips
The U.S.-China semiconductor conflict is, in effect, a war over the means of intelligence production. The CHIPS and Science Act allocated over $52 billion to incentivize domestic chip manufacturing. TSMC is building three fabs in Arizona, a $65 billion investment, with the most advanced targeting 2nm-class production. The explicit goal is to reduce dependence on Taiwan, which sits 100 miles from mainland China.
Export controls have made China a marginal producer of AI chips. DeepSeek founder Liang Wenfeng stated bluntly: “Money has never been the problem for us; bans on shipments of advanced chips are the problem.”
Yet China is responding. SMIC is reportedly readying production lines for 5nm chips. Beijing’s “Big Fund” pours billions into domestic semiconductor development. A technological decoupling is underway—potentially creating two incompatible AI ecosystems, American and Chinese, competing for global influence.
The Taiwan Question
Here’s the uncomfortable truth: if China blockades or invades Taiwan, the global tech economy collapses overnight. Modern militaries would face an immediate semiconductor drought. Every iPhone, every AI model, every advanced weapon system depends on chips that mostly come from one small island.
This is why TSMC is expanding globally—to Japan (with Japanese government subsidies), to Arizona (with American subsidies), to Germany. Geographic diversification is a national security imperative, not just a business strategy.
Why Individual Chips Don’t Matter (And Why Clusters Do)
A single H100 is impressive but useless for frontier AI training. The models are simply too large. GPT-4 is rumored to have around 1.7 trillion parameters. Llama 3’s largest version has 405 billion. Each parameter needs to be stored, updated, and communicated.
The math works like this:
- A 1 trillion parameter model needs ~2 terabytes just to store the weights in half-precision floating point
- During training, you also need to store gradients and optimizer states—roughly 16 bytes per parameter
- That’s 16 terabytes of memory for a 1 trillion parameter model
- A single H100 has 80GB of memory
You need at minimum 200 H100s just to hold the model’s training state, plus more GPUs for activations and data parallelism. In practice, training requires thousands to tens of thousands of GPUs running for weeks or months.
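A minimal sketch of that accounting, using the standard mixed-precision Adam recipe (fp16 weights and gradients, fp32 master weights plus two fp32 optimizer moments):

```python
# Minimum GPUs needed just to hold a model's training state in memory.
import math

def min_gpus_for_training(params: float, gpu_memory_gb: float = 80) -> int:
    # 2-byte weights + 2-byte gradients + 4-byte master weights
    # + two 4-byte Adam moments = 16 bytes per parameter
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    total_bytes = params * bytes_per_param
    return math.ceil(total_bytes / (gpu_memory_gb * 1e9))

# 1 trillion parameters -> 16 TB of training state -> 200 GPUs at 80 GB each,
# before counting activations, which push the real number far higher.
print(min_gpus_for_training(1e12))  # 200
```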
This is why clusters matter. The ability to coordinate tens of thousands of processors on a single training run—with minimal communication latency, maximal bandwidth, and robust fault tolerance—is the core technological achievement. The individual chip is impressive; the orchestra is transformative.
The Coming Democratization (Maybe)
Here’s where the Unscarcity thesis becomes concrete.
Right now, frontier AI training is confined to about a dozen organizations worldwide: OpenAI, Anthropic, Google DeepMind, Meta, xAI, Mistral, a handful of Chinese labs, and some well-funded startups. The capital requirements—billions of dollars in GPUs, power, and engineering talent—create natural barriers.
But three forces could democratize access:
1. Inference Is Cheaper Than Training
Training GPT-4 required months on a massive cluster. Running inference (getting answers from the trained model) is far cheaper. Cloud providers now offer API access to frontier models for pennies per query. The capability is increasingly accessible even as the means of production remains concentrated.
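To appreciate how low the barrier has fallen, here is what tapping a frontier model looks like today. The sketch uses OpenAI’s Python client; the model name is an illustrative placeholder, since offerings and prices change monthly:

```python
# Querying a hosted frontier model: a few lines and a fraction of a cent.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; offerings change quickly
    messages=[{"role": "user", "content": "Why do GPU clusters need fast interconnects?"}],
)
print(response.choices[0].message.content)
```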
2. Smaller Models Are Getting Better
Distillation techniques, efficient architectures, and better training data mean smaller models can approach the performance of larger ones. Llama 3’s 8 billion parameter model outperforms GPT-3.5 (175 billion parameters) on many benchmarks. The floor is rising.
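Distillation in particular is conceptually simple: train the small model to match the big model’s output distribution instead of just the raw labels. A minimal PyTorch sketch of the standard soft-label loss (the temperature value is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```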
3. Decentralized Compute Networks
Emerging networks like Aethir aggregate idle GPU capacity globally. The idea: if a million people each contribute a small amount of compute, you get a distributed supercomputer. The technology is nascent, but the concept is sound.
Connection to the Unscarcity Vision: Universal Basic Compute
This brings us to Universal Basic Compute (UBC)—the idea that every citizen should receive a guaranteed allocation of AI processing capacity.
If compute is becoming the means of production—the modern equivalent of land in an agrarian economy—then distributing compute is distributing economic agency. A citizen with a compute allocation can:
- Run personal AI agents that manage life administration
- Participate in pooled cooperatives for larger projects
- Delegate their allocation to Mission Guilds in exchange for services
- Create, research, or build using the same tools as major corporations
The Foundation layer in the Unscarcity framework could eventually provision compute the way it provisions housing or food—as infrastructure for dignified existence. Not a luxury, but a baseline.
This isn’t fantasy. Research institutions already allocate compute quotas. National cloud initiatives in Saudi Arabia, UAE, and Singapore are building sovereign AI infrastructure. The concept of “compute as public utility” is emerging in policy discussions.
But achieving UBC requires:
- Sufficient total compute—currently a constraint, but Blackwell and future generations will multiply capacity
- Distribution infrastructure—cloud platforms that can allocate and meter compute fairly (a toy sketch of such a ledger follows this list)
- Accessible interfaces—so “using compute” becomes as intuitive as “using electricity”
- Governance frameworks—to prevent concentration, corruption, and hoarding
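None of this machinery exists for UBC today, but the accounting layer is not exotic. Here is a deliberately toy sketch of what a per-citizen compute ledger might track; every name, unit, and number is invented for illustration:

```python
# Toy model of a Universal Basic Compute ledger. Entirely speculative:
# the allocation unit, quota size, and recipient names are invented.
from dataclasses import dataclass, field

@dataclass
class ComputeAccount:
    citizen_id: str
    monthly_quota_gpu_hours: float = 100.0   # invented baseline allocation
    used_gpu_hours: float = 0.0
    delegations: dict = field(default_factory=dict)

    def spend(self, gpu_hours: float) -> bool:
        """Meter usage against the remaining quota; refuse overdrafts."""
        if self.used_gpu_hours + gpu_hours > self.monthly_quota_gpu_hours:
            return False
        self.used_gpu_hours += gpu_hours
        return True

    def delegate(self, recipient: str, gpu_hours: float) -> bool:
        """Assign part of the quota to a pool or guild, as described above."""
        if self.spend(gpu_hours):
            self.delegations[recipient] = self.delegations.get(recipient, 0.0) + gpu_hours
            return True
        return False

account = ComputeAccount("citizen-0001")
account.delegate("mission-guild-solar", 25.0)
print(account.used_gpu_hours)  # 25.0 of the 100-hour monthly quota
```

The hard part is not the bookkeeping but the governance around it, which is what the mechanisms below address.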
The Diversity Guard mechanism addresses governance. The Civic Service program includes AI orchestration training. The EXIT Protocol negotiates with existing power holders for gradual transition.
What to Watch For
2025-2026
- Blackwell GPUs reach full production scale
- Stargate Project operationalizes multiple data centers
- First megawatt-class liquid cooling systems become standard
- Continued export restriction tightening on China
2027-2030
- First commercial fusion reactors potentially come online (Helion/Microsoft deal)
- AI-specific chip architectures mature beyond Nvidia’s dominance
- Decentralized compute networks prove (or fail) viability
- UBC pilot programs in selected jurisdictions
2030+
- Potential crossover point where inference compute exceeds training compute
- Foundation-layer compute infrastructure in early deployments
- Fusion energy meaningfully powering data centers
- The contours of “who controls intelligence production” become clear
The Stakes
Here’s what’s actually being decided in the GPU warehouses of Memphis, Abilene, and undisclosed locations worldwide:
Who will control the means of intelligence production?
If the answer is “a handful of corporations and nation-states,” we get the Star Wars trajectory—technological feudalism with better graphics. The abundance AI creates concentrates at the top. Everyone else becomes economically irrelevant but biologically alive.
If the answer is “distributed infrastructure with universal access,” we get something closer to the Unscarcity vision—where compute is a public utility, AI capability is broadly held, and the Foundation layer provides dignified existence while the Ascent layer rewards contribution.
The race to build compute clusters isn’t just a commercial competition. It’s the construction of the infrastructure that will determine how intelligence is distributed across human civilization.
The factories of the Intelligence Age are being built right now. The question is whether they’ll produce liberation or lock-in.
References
- xAI Colossus Official Page
- Building Meta’s GenAI Infrastructure - Engineering at Meta
- Announcing The Stargate Project - OpenAI
- Oracle’s Zettascale AI Supercomputer
- Nvidia Blackwell Architecture
- GB200 NVL72 - Nvidia
- What Is NVLink? - Nvidia Blog
- Energy Demand from AI - IEA
- AI Datacenters Need Nuclear Fusion - CACM
- Why the AI Industry Is Betting on Fusion Energy - TIME
- Hyperscaler CapEx Projections - IEEE ComSoc
- Compute is the New Oil - Analytics India Magazine
- TSMC and Geopolitical Tensions - The New Global Order
- US Export Controls on China AI - AI Frontiers
- Universal Basic Compute - CoinDesk
- The Cost of AI: Hyperscaler Spending - Fusion Worldwide