In January 2026, OpenAI signed a $10 billion contract with Cerebras Systems for up to 750 megawatts of computing power over three years through 2028. On February 12, they shipped the first product built on that infrastructure: GPT-5.3-Codex-Spark, running on Cerebras' Wafer Scale Engine 3. No NVIDIA GPUs involved in the inference layer. For the first time in OpenAI's history, a production model ran without touching NVIDIA silicon.
The reporting was accurate: this is an important chip-supplier diversification move. What the coverage missed is why NVIDIA's monopoly existed in the first place, and why that explanation matters for understanding how durable the crack in its wall actually is.
Why NVIDIA's Monopoly Was Always About Software
Jensen Huang has said repeatedly that NVIDIA is a software company that happens to make chips. This isn't marketing; it's an accurate description of the competitive moat. CUDA, NVIDIA's proprietary parallel computing platform, is what actually sustains the lock-in. Every major AI framework (PyTorch, TensorFlow, JAX) was built to run on CUDA. Every AI engineer trained on CUDA-compatible hardware. The 18-year head start in software ecosystem development is the moat, not the hardware itself.
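To see what that lock-in looks like in practice, consider a minimal PyTorch sketch (my illustration, not code from the article or from any vendor). CUDA is the built-in, first-class backend that framework code reaches for by default; targeting another accelerator typically means a vendor-specific plugin or a different device string, which is exactly the switching cost the moat is made of.

```python
# A minimal sketch of how CUDA-first code paths appear in everyday
# framework code. Illustrative only; not from OpenAI or NVIDIA.
import torch

# The idiomatic device-selection line found in countless PyTorch codebases:
# CUDA is a first-class built-in backend; other accelerators usually need
# vendor-specific plugins or custom device strings.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

with torch.no_grad():
    y = model(x)  # on "cuda", this dispatches to NVIDIA's cuBLAS kernels

print(y.shape, device)
```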
The pattern is recognizable: Intel's x86 monopoly was sustained by software compatibility, not hardware superiority. IBM's mainframe dominance was sustained by ecosystem lock-in. Microsoft's Windows grip was sustained by application compatibility. The hardware is the vehicle; the software ecosystem is the moat. Any competitor who wants to displace NVIDIA has to replicate 18 years of software ecosystem development or find a workload where CUDA compatibility doesn't matter.
Cerebras found the second path.

Inference Is the Crack in the Wall
Training and inference are fundamentally different workloads. Training a frontier model requires thousands of GPUs running in parallel for weeks or months, with massive communication overhead between chips. CUDA and NVIDIA's NVLink interconnect technology are optimized for exactly this: large-scale distributed training where all-to-all communication between chips is continuous.
Inference is different. Inference is serving a trained model to users: every API call, every completion, every suggestion. It's a variable-cost workload that scales with demand and is optimized for latency and throughput per request. Training optimization and inference optimization have different engineering requirements.
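A toy back-of-envelope model makes the economic split concrete. Training is a one-time fixed cost; serving cost grows linearly with demand. All numbers below are made-up placeholders for illustration, not figures from the article or from OpenAI.

```python
# Toy cost model contrasting training (one-time, fixed) with inference
# (recurring, scales with demand). All constants are illustrative
# assumptions, not figures from the article or from OpenAI.

TRAINING_COST = 500e6          # one-time cost of a training run, USD (assumed)
COST_PER_1K_TOKENS = 0.002     # marginal serving cost, USD (assumed)
TOKENS_PER_REQUEST = 1_000     # average completion length (assumed)

def inference_cost(requests_per_day: float, days: float) -> float:
    """Cumulative serving cost: grows linearly with demand."""
    tokens = requests_per_day * days * TOKENS_PER_REQUEST
    return tokens / 1_000 * COST_PER_1K_TOKENS

for rpd in (1e6, 10e6, 100e6):
    yearly = inference_cost(rpd, 365)
    print(f"{rpd:>12,.0f} req/day -> ${yearly:,.0f}/yr serving "
          f"({yearly / TRAINING_COST:.1%} of the training run)")
```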
Cerebras' Wafer Scale Engine is extremely fast at inference-style workloads: single-chip, low-latency, high-throughput token generation. The trade-off is that it's not designed for the distributed training workloads where NVIDIA's multi-chip interconnect is essential. OpenAI isn't training GPT-5.3 on Cerebras hardware; they're serving it. That distinction is what makes the deal viable: Cerebras competes on the workload where CUDA lock-in matters least.
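Why does staying on one chip help latency? A crude toy model: in autoregressive decoding, each generated token passes through every layer, and on a multi-chip deployment, layers that cross a chip boundary pay an interconnect penalty that a single wafer avoids. The numbers below are assumptions chosen for illustration, not vendor benchmarks, and the model ignores real-world factors like memory bandwidth and batching.

```python
# Toy per-token latency model for autoregressive decoding under a
# pipeline-parallel layout: latency = total layer compute plus one
# interconnect hop per chip boundary. All numbers are illustrative
# assumptions, not vendor benchmarks.

LAYERS = 80                  # transformer layers (assumed)
COMPUTE_US_PER_LAYER = 8     # per-layer compute time, microseconds (assumed)
HOP_US = 20                  # per-boundary interconnect latency, microseconds (assumed)

def us_per_token(num_chips: int) -> float:
    compute = LAYERS * COMPUTE_US_PER_LAYER
    interconnect = (num_chips - 1) * HOP_US  # boundaries crossed per token
    return compute + interconnect

for chips in (1, 8, 64):
    t = us_per_token(chips)
    print(f"{chips:>3} chip(s): {t:7.1f} us/token -> {1e6 / t:,.0f} tokens/s")
```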
What Diversification Actually Means
Coverage in both Tom's Hardware and The Register noted that OpenAI continues to add AMD GPUs and other accelerators alongside the Cerebras deal. This is supplier diversification, not NVIDIA replacement. OpenAI is managing dependency risk by ensuring that no single chip supplier is on the critical path for all of its compute needs.
For NVIDIA, the threat from Cerebras is real but bounded: inference represents roughly 30-40% of AI compute spend (training dominates). If Cerebras, AMD, Intel, and emerging players can capture inference market share across the industry, NVIDIA's total addressable market compresses, but its position in training, which requires NVIDIA's interconnect technology and software ecosystem, remains structurally difficult to displace in the near term.
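The arithmetic of that compression is straightforward. Using the article's rough 30-40% inference share, and hypothetical competitor capture rates (scenario inputs, not forecasts), the bound on NVIDIA's exposure looks like this:

```python
# Back-of-envelope TAM compression. The 30-40% inference share comes from
# the article's rough estimate; the competitor capture rates are
# hypothetical scenario inputs, not forecasts.

for inference_share in (0.30, 0.40):
    for captured in (0.25, 0.50, 0.75):
        # Fraction of total AI compute spend NVIDIA no longer addresses.
        lost = inference_share * captured
        print(f"inference={inference_share:.0%}, rivals capture {captured:.0%} "
              f"of it -> NVIDIA TAM shrinks by {lost:.1%}")
```

Even the aggressive scenario caps the near-term loss at around 30% of total spend, which is why the training moat matters so much to the overall picture.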

The Longer-Term Signal
The significant fact isn't GPT-5.3-Codex-Spark specifically; it's that a production model at OpenAI shipped on non-NVIDIA inference infrastructure and worked. This is proof of concept at production scale. The next deal is easier to justify internally; the pattern normalizes.
For the broader AI ecosystem, the message is about negotiating leverage. As long as NVIDIA was the only viable option for production AI infrastructure, pricing power and terms were theirs to set. Every successful production deployment on alternative hardware improves every AI company's position in NVIDIA pricing conversations. That dynamic has begun, regardless of which chips ultimately win the inference market.
Sources
- The Register, "OpenAI unveils first model running on Cerebras silicon," February 12, 2026
- Bloomberg, "OpenAI Debuts First Model Using Chips From Nvidia Rival Cerebras," February 12, 2026
- Tom's Hardware, "OpenAI launches GPT-5.3-Codex-Spark on Cerebras chips," February 2026
- Techzine, "OpenAI swaps Nvidia for Cerebras with GPT-5.3-Codex-Spark," February 2026
- ExtremeTech, GPT-5.3-Codex-Spark coverage, February 2026
- Reuters, OpenAI-Cerebras $10 billion contract for 750 MW of compute through 2028, January 2026
- OpenAI, GPT-5.3-Codex-Spark on Cerebras WSE-3, February 12, 2026
- Cerebras Systems, Wafer Scale Engine 3 specifications
- NVIDIA, CUDA platform (18-year software ecosystem head start)
- Futurum Research, Big Tech AI capex of $660-690 billion for 2026, February 2026
- Jensen Huang, "NVIDIA is a software company" (multiple conference appearances)