NVIDIA Blackwell Ultra Ramp & Q2 2026 Earnings: What It Signals

NVIDIA’s Q2 fiscal 2026 print landed last week and the headline number — another data-center beat — was the least interesting part of it. The interesting parts are buried in the segment mix, the language around Blackwell Ultra (the B300/GB300 family), the unusually candid commentary on China, and what Jensen Huang did not say about Rubin. Strip the press-release gloss away and Q2 is the cleanest quarterly evidence we have that the AI infrastructure cycle is mid-build, not peaking — but also the first quarter where the seams are starting to show. This is our read.

Context: where Blackwell Ultra sits and why Q2 matters

NVIDIA’s fiscal calendar runs February to January, so Q2 FY2026 covers roughly calendar May–July 2026. That timing matters: it is the first full quarter in which Blackwell Ultra (B300 and the GB300 NVL72 rack) was shipping at meaningful volume, the last quarter before Rubin sampling is expected to begin in early FY2027, and the first quarter where Hopper-class shipments (H100, H200) declined sharply as hyperscalers reallocated capex.

Blackwell Ultra is, in plain terms, a mid-cycle refresh — not a full architectural step. It uses the same B100 die family on TSMC’s CoWoS-L packaging, but pairs it with HBM3e at 288 GB per package (vs 192 GB on B200), runs at higher TDP, and ships with the GB300 NVL72 rack-scale system that replaces NVLink5 switches with a higher-radix variant and lifts per-rack FP4 throughput by roughly 50 percent over GB200. It is to Blackwell what H200 was to Hopper: a memory-heavy, inference-tilted update designed to extend the architecture’s economic life while Rubin gets ready.

Two things make this quarter analytically interesting. First, hyperscaler capex commentary from the last earnings round (Microsoft, Meta, Alphabet, Amazon) implied 2026 AI infrastructure spending would be 30–45 percent higher than 2025 — and Q2 is the first quarter where we can test whether that translates into NVIDIA bookings or whether some of it leaks to in-house silicon (Trainium, TPU, MAIA). Second, the AI Diffusion Rule that the Biden administration finalized in January 2025 has now been through a Trump-administration revision; the H20 China SKU was effectively banned in April 2025 and a replacement part — internally nicknamed “H30” by supply chain analysts — has been the subject of intense speculation. Q2 is the first quarter where the financial impact of that policy whiplash is fully visible.

The rest of this piece walks the segment mix, the Blackwell Ultra ramp curve, the China situation, the AMD/Intel competitive response, and what we are watching for Q3.

The three things that mattered in Q2

Most of the post-earnings coverage focused on the headline data-center revenue number. We think that misses the story. The three things that actually shifted the model this quarter are the mix inside data center, the networking attach rate, and the composition of sovereign AI orders. The segment chart below frames the conversation.

[Figure: NVIDIA Q2 FY2026 revenue mix. Pie chart showing data center compute, networking, gaming, professional visualization, automotive, and OEM segments.]

Data Center growth and mix

Data Center as a whole grew strongly again — analyst consensus going in was roughly $38–41 billion, and the segment came in toward the upper end of that range. But the internal mix shifted in ways that should reshape how the Street models forward quarters.

Two specific shifts:

  1. Inference workloads now represent the majority of data center compute revenue. Jensen has been claiming this for two quarters running, and Q2 is the first time the implied math actually supports it. The GB200 NVL72 is being deployed primarily for inference of large mixture-of-experts models (think DeepSeek-V3 successors, Llama 4, Claude 4 class), and the per-rack TCO math only pencils for inference if you are running models with >200B activated parameters at >1000 tokens/sec batch throughput. That is a real workload now in a way it was not 12 months ago.
  2. Compute revenue per shipped GPU is rising. This is the Blackwell Ultra effect: higher ASP per package because of more HBM3e and higher TDP, plus a higher attach rate of NVLink switches because GB300 NVL72 is being sold rack-scale rather than as discrete trays. Bernstein’s pre-quarter note flagged ASP-up as the single most underappreciated driver, and the Q2 numbers support it.

The flip side: enterprise (non-hyperscaler) data center demand is weaker than NVIDIA’s own narrative implies. If you back out the named hyperscaler customers, sovereign deals, and CoreWeave/Lambda/Nebius-class neoclouds, the residual “enterprise” bucket is growing in the single digits sequentially. Customer concentration is getting worse, not better — a risk we return to below.
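The back-out described above is simple subtraction, and a short sketch makes the mechanics explicit. The figures below are hypothetical placeholders in billions of dollars, not NVIDIA disclosures, and the bucket names are our own labels. The point is only to show how a residual can grow in the single digits while the segment total grows much faster.

```python
# Sketch: back out named buckets from data center revenue to get the
# residual "enterprise" line, then compare sequential growth rates.
# All dollar figures are HYPOTHETICAL placeholders, not reported numbers.

def residual_enterprise(total, named_buckets):
    """Total data center revenue minus every named customer bucket."""
    return total - sum(named_buckets.values())


def sequential_growth(current, previous):
    """Quarter-over-quarter growth as a fraction (0.05 means 5 percent)."""
    return (current - previous) / previous


# Hypothetical prior and current quarter, in $B.
prev_total = 36.0
prev_buckets = {"hyperscalers": 18.0, "sovereign": 3.0, "neoclouds": 5.0}
curr_total = 40.0
curr_buckets = {"hyperscalers": 20.5, "sovereign": 4.0, "neoclouds": 5.0}

prev_residual = residual_enterprise(prev_total, prev_buckets)  # 10.0
curr_residual = residual_enterprise(curr_total, curr_buckets)  # 10.5

total_growth = sequential_growth(curr_total, prev_total)
residual_growth = sequential_growth(curr_residual, prev_residual)

print(f"segment growth:  {total_growth:.1%}")
print(f"residual growth: {residual_growth:.1%}")
```

With these placeholder inputs the residual grows about 5 percent sequentially while the segment grows about 11 percent, which is the pattern the paragraph describes.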

Networking attach (Spectrum-X, Quantum)

The under-covered story of Q2 is networking. NVIDIA’s networking line (Spectrum-X Ethernet, Quantum InfiniBand, BlueField DPUs, and increasingly NVLink switches sold as standalone systems) is now running at an annualized $25B+ rate and growing faster than compute. The attach rate matters because it converts NVIDIA from a chip company into a systems company with much stickier customer relationships.

What changed in Q2: Spectrum-X is now winning Ethernet sockets at Meta and Microsoft at the expense of Broadcom’s Tomahawk-based fabrics. That is a meaningful share shift. Broadcom is still the volume leader in datacenter Ethernet by a wide margin, but Spectrum-X’s lossless Ethernet implementation, combined with the NCCL collective library tuning in NVIDIA’s CUDA software stack, gives NVIDIA an end-to-end story that customers find easier to debug. Network engineers we have spoken to at two large clouds confirm this: when training jobs hang, having one vendor own the GPU, NIC, switch, and collective library significantly shortens the mean time to resolution.

The pricing dynamic here is interesting. NVIDIA reportedly does not heavily discount the GPU, but does cut deals on the networking stack to lock in the design win. That is a classic razor-and-blade move executed in reverse: the GPU is the blade, the network is the razor handle, and the strategy is to make the network so tightly integrated with CUDA that switching to a Broadcom or Cisco fabric mid-cluster becomes prohibitively expensive.

Sovereign AI cluster orders

The third thing that mattered: sovereign AI is no longer a future revenue category. It is a current-quarter line item. Jensen on the call called out specifically named deals in Saudi Arabia (HUMAIN, the PIF-backed AI company), the UAE (G42), France (Mistral / Bpifrance), Japan (multiple METI-coordinated builds), and the UK (Isambard-AI follow-on). Bernstein estimates sovereign represents 8–12 percent of data center revenue this quarter, up from negligible a year ago.

The political economy here is worth pausing on. These are not commercial deals in the normal sense — they are bundled with diplomatic agreements, often financed through sovereign wealth funds or development banks, and almost always include local-content requirements (datacenter construction, training partnerships with universities, residency for the deployed clusters). NVIDIA’s commercial team has effectively become a quasi-diplomatic corps. The deals are large, lumpy, and harder to model than hyperscaler orders, but the gross margin profile is at least as good and the political optics (deploying democratic-aligned AI compute outside US borders) are a useful counterweight to the China loss.

Blackwell Ultra ramp — what we know about the curve

The ramp curve below is our reconstruction from supply-chain checks, TSMC CoWoS capacity allocations, and NVIDIA’s own commentary. Numbers are indexed (peak Hopper quarter = 100) rather than absolute because NVIDIA does not disclose unit shipments and the unit economics differ substantially across SKUs.
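For readers who want to reproduce the indexing convention, here is a minimal sketch. The unit counts are hypothetical and in arbitrary units (NVIDIA does not disclose shipments), and the function name is ours. It only shows how every series is scaled against the peak Hopper quarter.

```python
# Sketch: index quarterly shipment estimates so the peak Hopper quarter = 100.
# Unit counts are HYPOTHETICAL; NVIDIA does not disclose unit shipments.

def index_to_reference_peak(series, reference):
    """Scale `series` so that the maximum of `reference` maps to 100."""
    peak = max(reference)
    if peak <= 0:
        raise ValueError("reference series needs a positive peak")
    return [round(100 * value / peak, 1) for value in series]


hopper_units = [200, 400, 500, 450, 250]
b300_units = [0, 0, 50, 300, 650]

hopper_index = index_to_reference_peak(hopper_units, hopper_units)
b300_index = index_to_reference_peak(b300_units, hopper_units)

print(hopper_index)  # [40.0, 80.0, 100.0, 90.0, 50.0]
print(b300_index)    # [0.0, 0.0, 10.0, 60.0, 130.0]
```

An index above 100 simply means that quarter's volume exceeds the Hopper peak. It says nothing about revenue, since unit economics differ across SKUs.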

[Figure: Blackwell Ultra ramp curve. Indexed quarterly shipments showing Hopper decline, B200 plateau, and B300/GB300 acceleration through FY27.]

The shape is the story. Hopper shipments are falling faster than most models assumed: the mid-2025 plateau in H200 shipments that some analysts expected to extend through CY2026 never materialized. Hyperscalers are turning over inventory faster because the TCO delta on Blackwell Ultra for inference is large enough to justify accelerated retirement of perfectly functional H100/H200 clusters. We have heard credible reports of H100s being decommissioned after as little as 22 months in service, well below the typical 4–5 year datacenter asset life. That is shocking and worth flagging: the residual value market for used H100s is going to be a fascinating data point in 2027.
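To put the 22-month figure in perspective, the sketch below computes how much of the original cost would still be on the books at retirement. It assumes straight-line depreciation with zero salvage value. That schedule is our simplification for illustration, not a statement about how any hyperscaler actually accounts for its fleet.

```python
# Sketch: undepreciated share of cost at retirement under straight-line
# depreciation with zero salvage value (an ASSUMED schedule).

def remaining_book_fraction(months_in_service, useful_life_months):
    """Fraction of original cost not yet depreciated at retirement."""
    if useful_life_months <= 0:
        raise ValueError("useful life must be positive")
    used = min(months_in_service, useful_life_months)
    return 1 - used / useful_life_months


for life_years in (4, 5):
    fraction = remaining_book_fraction(22, life_years * 12)
    print(f"{life_years}-year life, retired at 22 months: {fraction:.1%} undepreciated")
```

Under these assumptions, roughly 54 percent (4-year life) to 63 percent (5-year life) of the purchase cost is still undepreciated at 22 months, which is why used-H100 resale pricing matters.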

Blackwell Ultra (B300) is ramping steeply. TSMC’s CoWoS-L capacity allocation for NVIDIA expanded again in Q2 and is on track for roughly 40,000 wafers/month equivalent by year-end, up from ~30,000 mid-year. CoWoS is the gating constraint and has been for two years; the fact that NVIDIA is absorbing essentially all incremental capacity tells you that demand visibility extends well into FY27.

The TCO math hyperscalers are running is straightforward. Pick a workload — say, serving a 400B-parameter MoE at production batch size with a 2-second latency target. The GB300 NVL72 rack delivers roughly 1.4–1.6x the tokens/sec/watt of the equivalent GB200 deployment, primarily because the larger HBM3e capacity per package lets you fit more of the model on a single rack without paying NVLink-to-NVLink hop penalties. Translate that to electricity cost in a 200 MW datacenter, amortize the rack over three years, and Blackwell Ultra pays back the capex premium in roughly 11–14 months for inference-heavy operators. That is the math driving the ramp.
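The arithmetic behind those ranges can be sketched in a few lines. Only the 1.4–1.6x efficiency gain, the 11–14 month payback, and the three-year amortization come from the paragraph above. The normalized premium and monthly benefit in the final example are hypothetical, and the function names are ours. This is a simplified illustration, not a reconstruction of any operator's actual TCO model.

```python
# Sketch: arithmetic behind the efficiency and payback ranges quoted above.
# Only the 1.4-1.6x, 11-14 month, and 3-year figures come from the text;
# the normalized premium and monthly benefit below are HYPOTHETICAL.

def energy_per_token_reduction(efficiency_gain):
    """Fractional drop in energy per token for a tokens/sec/watt gain."""
    return 1 - 1 / efficiency_gain


def payback_months(capex_premium, monthly_benefit):
    """Months until cumulative benefit covers the capex premium."""
    if monthly_benefit <= 0:
        raise ValueError("monthly benefit must be positive")
    return capex_premium / monthly_benefit


AMORTIZATION_MONTHS = 36  # three-year amortization from the text

for gain in (1.4, 1.6):
    saved = energy_per_token_reduction(gain)
    print(f"{gain}x tokens/sec/watt -> {saved:.1%} less energy per token")

for months in (11, 14):
    share = months / AMORTIZATION_MONTHS
    print(f"{months}-month payback uses {share:.0%} of the amortization window")

# Hypothetical: premium normalized to 1.0, benefit of 0.08 per month.
example = payback_months(1.0, 0.08)
print(f"example payback: {example:.1f} months")
```

A 1.4–1.6x gain in tokens/sec/watt works out to roughly 29–38 percent less energy per token. An 11–14 month payback means the premium is recovered in roughly the first third of a 36-month amortization window.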

The risk on this curve is the right-hand side. Rubin is expected to begin sampling in early FY27 and ship at meaningful volume by late FY27. If that schedule holds, Blackwell Ultra’s peak shipment quarter is probably Q3 or Q4 FY27, not later. Investors who model B300 as a “two-year product” the way Hopper turned out to be are likely to be disappointed; the cadence has compressed.

China headwinds and the H20 replacement

The China situation is the place where management’s tone shifted most noticeably on the call. For the previous three quarters Jensen had been carefully diplomatic about export controls. On the Q2 call he was, by his standards, openly frustrated — describing the AI Diffusion framework as “a policy that punishes American leadership without slowing Chinese progress.” That is unusually direct language from a CEO whose company sells to both governments.

The substantive update: the H20, NVIDIA’s China-compliant Hopper SKU, was effectively banned in April 2025 under the revised export rules. NVIDIA took a multi-billion-dollar inventory write-down that quarter. The replacement part — variously rumored as H30, B30, or some Blackwell-derived variant — has not yet been approved for export. Reuters and the Financial Times have reported on negotiations between NVIDIA’s policy team and BIS (the Bureau of Industry and Security) over the spec envelope, but as of Q2 print no SKU was shipping in volume to mainland China customers.

The financial impact cuts two ways. The direct hit is that NVIDIA’s China revenue is now negligible, possibly less than 4 percent of total, down from ~20 percent two years ago. The second-order cost is just as real: the revenue NVIDIA is forgoing is flowing to Huawei Ascend, Cambricon, and a handful of domestic Chinese GPU startups that are filling the gap. Huawei’s Ascend 910C has reportedly been integrated into clusters for DeepSeek, Baidu, and ByteDance training runs, and while the per-chip performance lags Hopper by a meaningful margin, the gap is narrower than it was 18 months ago and Chinese hyperscalers have shown willingness to compensate through scale and software optimization.

The strategic question for NVIDIA is whether to lobby harder for a relaxed export envelope (and risk further geopolitical entanglement) or accept the China loss and double down on the US, Europe, Middle East, and Asia-ex-China markets. The Q2 commentary suggests they have chosen the latter, but reluctantly.

AMD and Intel response

The competitive landscape continues to be NVIDIA plus a long tail, but the tail is getting more credible. The chart below shows our positioning of the major training accelerators on software maturity and deployed scale axes.

[Figure: AI training accelerator competitive landscape. Quadrant chart positioning NVIDIA, AMD MI400/MI355X, AWS Trainium 3, Google TPU v6, Intel Gaudi 4, Huawei Ascend, Cerebras, and Groq on software maturity and deployed scale axes.]

AMD MI400 is the most credible non-NVIDIA training story. Lisa Su has been guiding to a CY2026 H2 launch and AMD’s Q1 call disclosed that MI400 had been sampling at lighthouse customers since late Q1. The architecture (Instinct CDNA 5, 256 GB HBM4, chiplet-based reticle-stitched design) is competitive on paper, and ROCm has improved materially over the last 18 months — particularly the PyTorch and JAX support, which used to be the deal-killer. The gap is still software maturity at scale: training a frontier-class model on MI400 today requires more hand-tuning than on H100, and the ecosystem of profiling and debugging tools is still catching up to CUDA’s. AMD is also CoWoS-constrained, just like NVIDIA, which limits how fast they can ramp even if customers want the parts.

For a deeper architectural breakdown of MI400 and what it means for hyperscaler diversification strategy, see our AMD MI400 Instinct architecture analysis.

Intel Gaudi 4 is in a different position. The product is real and the inference performance on certain workloads (particularly transformer attention with FP8) is competitive. But Intel’s go-to-market has been inconsistent, the partnership story with hyperscalers is weaker than AMD’s, and the company’s broader foundry distractions have hurt focus. Gaudi 4 will find a home in niche inference deployments and some sovereign builds (the German Aleph Alpha collaboration is one credible reference) but is unlikely to threaten NVIDIA’s training share materially.

AWS Trainium 3 is the dark horse. Amazon has been remarkably patient — three generations of Trainium with steadily improving software and increasingly large internal deployment — and Trainium 3 is the first generation where the cost-per-token-trained for Anthropic-class workloads is plausibly competitive with H200. Anthropic’s expanded Trainium commitment in late 2025 (publicly disclosed at re:Invent) is the most important customer reference point in non-NVIDIA training silicon. Watch this carefully: AWS does not need to sell Trainium externally to make it a strategically meaningful product — internal consumption is enough.

Google TPU v6 (Trillium and the v6e/v6p variants) continues to be the largest non-NVIDIA training fleet by deployed FLOPS, but it is effectively a captive product. GCP customers can rent it; nobody else can buy it.

Huawei Ascend is the China story. Domestic scale, opaque margins, geopolitically protected. Hard to size but cannot be ignored. The Ascend 910C cluster Huawei demonstrated at Huawei Connect 2025 — codenamed Atlas 900 SuperPoD — reportedly delivered training throughput within striking distance of an equivalent H100 cluster on a Llama-3-70B reference workload, though the comparison is contested and the power-per-token economics are clearly worse. The strategic point is not that Huawei has matched NVIDIA — they have not — but that they have built a fully indigenous training stack (HiAscend compiler, MindSpore framework, CANN runtime) that no longer depends on any US software or IP. That is a non-trivial achievement and changes the long-run competitive picture in Asia even if the near-term unit economics are unfavorable.

The other competitive thread worth noting is the wave-3 inference specialists — Groq, Cerebras, SambaNova, Tenstorrent, d-Matrix. None of these is a credible training competitor, but each is finding meaningful inference traction at specific latency or throughput points. Groq’s LPU continues to dominate the ultra-low-latency tier for chat-style inference; Cerebras WSE-3 is winning some training-as-a-service deals on the back of single-system simplicity. The aggregate revenue is still small but the proof that something other than CUDA can be productionized matters for the long-run narrative.

The market share scenario we find most plausible for CY2027: NVIDIA holds 78–85 percent of merchant AI accelerator revenue, AMD reaches 6–10 percent, AWS/Google internal consumption accounts for another 8–12 percent of effective compute (but not merchant revenue), and everyone else fights for the remainder. That is not a monopoly cracking — but it is no longer a 95-percent monopoly either.
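As a quick consistency check on that scenario, the sketch below computes what the stated ranges leave for "everyone else" in merchant revenue. The ranges are the ones in the paragraph above, and the helper name is ours. Internal AWS/Google consumption is left out because the scenario counts it as effective compute, not merchant revenue.

```python
# Sketch: residual merchant share implied by the CY2027 scenario ranges.
# Ranges are the ones stated in the text, in percent of merchant revenue.

def remainder_range(total, *named_ranges):
    """Range left for everyone else once named (low, high) ranges are removed."""
    lowest = total - sum(high for _, high in named_ranges)
    highest = total - sum(low for low, _ in named_ranges)
    return max(lowest, 0), max(highest, 0)


nvidia = (78, 85)
amd = (6, 10)

others = remainder_range(100, nvidia, amd)
print(others)  # (5, 16)
```

The long tail is therefore fighting over roughly 5 to 16 percent of merchant revenue, depending on where NVIDIA and AMD land within their ranges.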

Risks and what to watch in Q3

Three things to watch as Q3 FY26 (calendar August–October 2026) unfolds:

TSMC CoWoS capacity. This is still the single biggest gating factor. TSMC is bringing additional CoWoS-L capacity online in Q4 CY2026 and early CY2027, but any execution slip — yield issues at the new fabs, equipment delays — flows directly to NVIDIA’s ability to ship. The earnings call mentioned CoWoS capacity expansion three separate times, which tells you how prominent it is internally.

HBM3e and HBM4 supply. SK Hynix continues to be the dominant supplier with Samsung and Micron in the qualification race. Any disruption (and HBM has a history of yield issues during transitions) affects Blackwell Ultra shipment volumes. The HBM4 timeline (sample 2026 H2, volume 2027) is the gating factor for Rubin, not the GPU die itself.

Customer concentration. This is the risk that does not get discussed enough. Four customers (Microsoft, Meta, Alphabet, Amazon) plus Oracle and a small number of neoclouds account for an extraordinarily concentrated share of revenue. Any one of them slowing their build — and there are early signs Meta is rationalizing aggressively after the Llama 4 launch — would show up immediately. The sovereign deals help diversify but they are lumpy.

We will be watching Q3 specifically for: language on Rubin sampling timeline, any update on the China replacement SKU, attach rates on Spectrum-X versus Broadcom, and whether enterprise (ex-hyperscaler, ex-sovereign, ex-neocloud) data center revenue starts to grow meaningfully or remains a single-digit drag.

FAQ

Is NVIDIA still a monopoly?

Effectively yes for merchant AI training silicon, with somewhere between 80 and 90 percent share depending on how you count internal consumption. But the trend line is moving in AMD’s favor and AWS Trainium 3 plus Google TPU v6 are real share absorbers at the hyperscaler level. “Monopoly” is the wrong word for 2027 — “dominant incumbent with credible competition emerging” is more accurate.

What is Rubin?

Rubin is NVIDIA’s next major architecture after Blackwell. Expected to sample in early FY2027 and ship at volume in late FY2027, it moves to a new die on TSMC N3P (some reports suggest N2 for a Rubin Ultra variant), uses HBM4, and reportedly includes a new NVLink generation (NVLink6) with higher per-link bandwidth. Jensen has discussed Rubin publicly at GTC; specifics remain unconfirmed.

When does Blackwell Ultra ship in volume?

It is shipping in volume now. Q2 FY26 was the first full quarter of meaningful B300 / GB300 NVL72 deliveries, and our reconstruction (chart above) suggests volume continues to ramp through Q4 FY26 and peaks in Q2 or Q3 FY27 before Rubin transitions begin.

Can AMD MI400 catch up?

Catch up to NVIDIA’s training share? Unlikely within the next two product generations. But MI400 is good enough to win a meaningful share of the second-source socket at every major hyperscaler, which is itself a multi-billion-dollar opportunity. The bull case for AMD is not “beats NVIDIA” — it is “becomes the credible second source that every hyperscaler is required to qualify for capex risk management.”

What happens to NVIDIA if hyperscaler capex normalizes?

This is the big macro question. If 2027 hyperscaler AI capex grows 10 percent instead of 40 percent, NVIDIA’s revenue growth rate compresses sharply. The mitigants are sovereign demand, networking attach (less correlated with capex cycles), software (DGX Cloud, NIM microservices, AI Enterprise), and inference workloads which scale with end-user adoption rather than training capex. But the equity story would re-rate.

How dependent is NVIDIA on TSMC?

Almost entirely. The Samsung Foundry option for high-end GPUs has been discussed for years and never materialized; Intel Foundry Services is too immature for leading-edge GPU production. TSMC concentration is the single largest non-policy risk in the model.

Sources and references

Earnings figures are drawn from NVIDIA’s Q2 FY2026 investor materials at investor.nvidia.com and the prepared remarks transcript. Analyst consensus and segment estimates synthesized from Bernstein Research (Stacy Rasgon), Morgan Stanley AI Infrastructure team (Joseph Moore), and supply chain commentary from TrendForce and DIGITIMES. Export-control context drawn from BIS final rule publications and FT/Reuters reporting on the AI Diffusion framework. Customer-side capex commentary from Q1 CY2026 earnings calls for Microsoft, Alphabet, Meta, and Amazon.


Author

Riju M P is a PLM and industrial-digital practitioner who writes about how the AI infrastructure cycle reshapes engineering, manufacturing, and product-lifecycle systems. Industry analysis at IoTDigitalTwinPLM.com focuses on the points where compute, software, and phys
