Infrastructure intelligence

The hardware story behind frontier AI

Compute clusters, data centers, GPU supply, power deals, and capex — the physical layer that decides which labs can train tomorrow's frontier and at what unit cost. Editorial briefs synthesize the data; signals below are the atomic facts they're built on.

Editorial briefs

2 published

Brief · May 20, 2026

The CoWoS bottleneck: why GPU supply still gates 2026 model roadmaps

Every Blackwell, Ironwood, and Trainium2 chip flows through one TSMC packaging line. Capacity doubled in 2026 — and every frontier lab is still constrained.

supply-chaintsmcpackagingnvidia

Brief · May 15, 2026

Anthropic's two-cloud bet: why Trainium plus TPU changes the math

Custom silicon on two hyperscalers gives Anthropic the lowest unit cost in the frontier — and the biggest single-vendor risk.

anthropicsilicon-economicsawsgoogle

Signals

14 events

All Compute cluster Data center Silicon GPU supply Power Capex Partnership Training run

xAI DeepSeek OpenAI Anthropic Mistral Google Meta Alibaba

Compute clusterreportedApr 22, 2026
DeepSeek operates ~50k H800 GPU fleet despite export controls
DeepSeek's training fleet — believed to be ~50,000 H800 GPUs at peak — was assembled prior to the October 2023 US export controls extension. DeepSeek-V4-Pro's training run reportedly used <8M GPU-hours total, less than 10% of GPT-5's estimated budget, leveraging the hybrid CSA+HCA attention scheme to compress FLOPs.
Accelerators
50k · H800
Location
Hangzhou, China
DeepSeek DeepSeek-V4-Pro SemiAnalysis estimate ↗
GPU supplyverifiedApr 15, 2026
NVIDIA GB300 ships at volume Q2 2026, replacing GB200 in hyperscaler deals
GB300 NVL72 began volume shipments in April 2026, six months ahead of original schedule. Each rack delivers 1.4x the FP8 throughput of GB200 NVL72 with the same power envelope. Microsoft, Oracle, and xAI are the largest Q2 2026 takers. CoreWeave disclosed first GB300 deployment May 7.
Accelerators
GB300
OpenAI xAI NVIDIA GTC + supply chain reports ↗
SiliconverifiedApr 9, 2026
Google TPU v7 "Ironwood": 9,216-chip pods, 42.5 exaflops, dedicated to inference
Announced at Google Cloud Next April 2026, TPU v7 (Ironwood) is the first Google TPU generation purpose-built for inference rather than training. Each pod scales to 9,216 chips delivering 42.5 exaflops of FP8 compute, with 192GB HBM3e per chip. Powers Gemini 3 Pro / 3.5 inference at Google scale. Annual TPU spend ramped to over $40B for FY26.
Accelerators
TPU v7 Ironwood
Google Gemini 3.5 Google Cloud Next 2026 keynote ↗
Compute clusterreportedMar 10, 2026
xAI Colossus 2: targeting 1M GPUs across Memphis + new Mississippi site
Colossus 2 — xAI's expansion target — aims for 550k Blackwell-class GPUs (GB200/GB300) in 2026, scaling toward 1M total accelerators by year-end. A second 2GW campus in DeSoto County, MS, is under construction to host the bulk of the buildout. Power deals announced with the Tennessee Valley Authority and Mississippi Power.
Accelerators
1M · GB200/GB300
Power
2 GW
Location
Memphis, TN + DeSoto County, MS
xAI Grok 4 Bloomberg / company filings ↗
PowerreportedFeb 4, 2026
Google Stone Mountain GA campus expanding to 2GW with on-site solar+gas hybrid
Google's Stone Mountain, Georgia data center cluster — built around Gemini training — disclosed a 2GW expansion plan in early 2026 backed by a 1.6GW solar+gas hybrid generation deal with Southern Company. Site will host the second-largest TPU v7 deployment after Council Bluffs, IA.
Power
2 GW
Location
Stone Mountain, GA
Google Southern Company Q4 2025 earnings ↗
Data centerreportedJan 30, 2026
Microsoft Fairwater Wisconsin: dedicated 1.5GW OpenAI training campus
Microsoft's Mount Pleasant, WI "Fairwater" campus — disclosed as a dedicated OpenAI training facility — reached 1.5GW configuration by Q1 2026 with NVIDIA GB200 NVL72 racks. The site uses closed-loop water cooling to reduce evaporative loss. Total Microsoft AI capex run-rate for FY26 is ~$110B.
Accelerators
GB200
Power
1.50 GW
Location
Mount Pleasant, WI
OpenAI Microsoft FY26 Q2 disclosure ↗
GPU supplyreportedDec 15, 2025
TSMC doubles CoWoS-L capacity to ~80k wafers/month for 2026 AI demand
TSMC's CoWoS-L advanced-packaging capacity — the binding constraint on Blackwell + Ironwood + Trainium2 supply — was doubled to ~80,000 wafers/month for 2026. Even after the expansion, every major customer (NVIDIA, Google, AWS, Marvell) remains capacity-constrained. CoWoS-L allocation has become the single most important variable in vendor compute roadmaps.
DIGITIMES / TSMC investor day ↗
Compute clusterreportedDec 3, 2025
Anthropic's Project Rainier: 400k Trainium2 chips across AWS multi-region
Anthropic's primary training cluster — codenamed Project Rainier — runs on a multi-region Trainium2 fleet AWS built specifically for them. By end of 2025 the configuration was disclosed at roughly 400,000 Trainium2 chips spanning sites in Indiana, Wyoming, and Mississippi. Trainium2's economics are central to Anthropic's ability to sell Sonnet at ~40% the price of equivalent-tier rivals.
Accelerators
400k · Trainium2
Location
St Joseph County, IN + Wyoming + Mississippi
Anthropic Claude 4.7 Opus AWS re:Invent / Anthropic announcement ↗
PartnershipverifiedNov 13, 2025
Anthropic adds up to 1M TPU v7 Ironwood chips via Google Cloud
Anthropic's November 2025 deal with Google Cloud expands its TPU footprint to up to 1M Ironwood (TPU v7) chips, supplementing the Trainium2 fleet on AWS. This makes Anthropic the rare frontier lab running heterogeneous custom silicon across two clouds. Deal value reportedly tens of billions over multiple years.
Accelerators
1M · TPU v7 Ironwood
Anthropic Claude 4.7 Opus Google Anthropic + Google press release ↗
CapexverifiedFeb 20, 2025
Alibaba commits RMB 380B (~$53B) to AI + cloud infrastructure over 3 years
Alibaba's February 2025 commitment of RMB 380B over three years — its largest infrastructure spend ever — funds new ACS (Alibaba Cloud Smart) data centers in Inner Mongolia, Heilongjiang, and Guizhou. Powers training for the Qwen3 / Qwen3.x family including Qwen3.7-Max. Heavy use of Ascend 910C plus a residual H800 fleet.
Capex
$53B
Alibaba Qwen3.7-Max Alibaba earnings call ↗
Compute clusterverifiedFeb 17, 2025
xAI Colossus: 200k H100 GPUs in 122 days at the Memphis ex-Electrolux site
xAI brought up its first Memphis training cluster, "Colossus", in 122 days — a build cadence unprecedented in the industry. The initial buildout was 100k H100s; xAI doubled it to 200k by early 2025 by adding H200s. Power was bridged with on-site mobile gas turbines while Tennessee Valley Authority capacity caught up. Colossus trained Grok 3 and is currently training Grok 4 successors.
Accelerators
200k · H100/H200
Power
150 MW
Location
Memphis, TN
xAI Grok 4 xAI / Nvidia announcement ↗
Data centerverifiedJan 21, 2025
Stargate Abilene: OpenAI + Oracle + SoftBank $500B compute pact, first 1.2GW campus
Stargate is a 4-year, $500B compute commitment announced January 2025, jointly funded by OpenAI, Oracle, SoftBank, and MGX. The flagship 1.2GW Abilene, TX campus came online in phases through 2025–2026, anchored by Oracle Cloud. Additional sites in Wisconsin and New Mexico are under construction. Powers OpenAI's training fleet for GPT-5.5 and beyond.
Power
1.20 GW
Capex
$500B
Location
Abilene, TX
OpenAI GPT-5.5 OpenAI / White House announcement ↗
Data centerverifiedDec 4, 2024
Meta Hyperion: 2GW Louisiana training campus, online phased 2026–2030
Meta's Richland Parish, LA campus — codenamed Hyperion — is the largest single AI training site under construction in North America. Targets 2GW IT load with eventual room to scale toward 5GW. Anchored by Entergy's new 1.5GW gas turbine build. Will train Llama 5+ models and host Meta's GenAI inference fleet. Total committed capex: ~$10B for the site.
Power
2 GW
Capex
$10B
Location
Richland Parish, LA
Meta Meta press release / Entergy filing ↗
CapexverifiedNov 22, 2024
Amazon's $8B Anthropic investment locks in Trainium as primary training chip
Amazon committed an additional $4B to Anthropic in November 2024 on top of an earlier $4B — total $8B — with the condition that Anthropic adopt AWS Trainium2 as the primary training chip for future Claude models. The deal underwrites Project Rainier and makes Anthropic the anchor tenant for AWS's custom silicon roadmap.
Capex
$8B
Anthropic Anthropic Amazon press release ↗

Editorial briefs

The CoWoS bottleneck: why GPU supply still gates 2026 model roadmaps

Anthropic's two-cloud bet: why Trainium plus TPU changes the math

Signals

DeepSeek operates ~50k H800 GPU fleet despite export controls

NVIDIA GB300 ships at volume Q2 2026, replacing GB200 in hyperscaler deals

Google TPU v7 "Ironwood": 9,216-chip pods, 42.5 exaflops, dedicated to inference

xAI Colossus 2: targeting 1M GPUs across Memphis + new Mississippi site

Google Stone Mountain GA campus expanding to 2GW with on-site solar+gas hybrid

Microsoft Fairwater Wisconsin: dedicated 1.5GW OpenAI training campus

TSMC doubles CoWoS-L capacity to ~80k wafers/month for 2026 AI demand

Anthropic's Project Rainier: 400k Trainium2 chips across AWS multi-region

Anthropic adds up to 1M TPU v7 Ironwood chips via Google Cloud

Alibaba commits RMB 380B (~$53B) to AI + cloud infrastructure over 3 years

xAI Colossus: 200k H100 GPUs in 122 days at the Memphis ex-Electrolux site

Stargate Abilene: OpenAI + Oracle + SoftBank $500B compute pact, first 1.2GW campus

Meta Hyperion: 2GW Louisiana training campus, online phased 2026–2030

Amazon's $8B Anthropic investment locks in Trainium as primary training chip