NVIDIA GTC 2026: The Groq 3 LPU, Vera Rubin, and Why the Inference Economy Changes Everything for AI Builders


The Lead


Jensen Huang took the stage at GTC yesterday and did what he does best: made the future feel inevitable. But the headline number wasn't a benchmark score or a chip spec. It was $1 trillion. 


That's NVIDIA's updated projection for combined Blackwell and Vera Rubin purchase orders through 2027, double last year's $500 billion estimate.


If you're wondering whether the AI infrastructure boom is real or a bubble, NVIDIA just told you what their order book says.

The Deep Dive: NVIDIA Just Showed You the Agent Stack

Yesterday's column laid out what we expected from GTC: Vera Rubin, NemoClaw, a glimpse of Feynman, and a clear signal that NVIDIA sees the future in inference. What we got was all of that, plus a surprise that reframes the whole picture.


Let's start with the surprise: the Groq 3 LPU.


Three months after NVIDIA's $20 billion acqui-hire of Groq, the first product is already here. The Groq 3 is a dedicated inference co-processor, purpose-built for the decode phase of model inference, the bandwidth-intensive part where tokens actually get generated. Each chip packs 500 MB of SRAM delivering 150 terabytes per second of memory bandwidth. That's nearly 7x faster than the HBM4 on NVIDIA's own Rubin GPUs. The tradeoff is capacity (you need 256 of them in a rack to get to 128 GB), but for pure token generation speed, nothing else comes close. NVIDIA is claiming 1,500 tokens per second for agentic workloads and 35x higher inference throughput per megawatt.
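
To make the capacity-versus-bandwidth tradeoff concrete, here's the back-of-the-envelope math using only the figures above. (The per-GPU HBM4 bandwidth is derived from the "nearly 7x" claim, so treat it as an estimate.)

```python
# Back-of-the-envelope math on the Groq 3 rack, using only the keynote figures above.
SRAM_PER_LPU_GB = 0.5         # 500 MB of on-chip SRAM per LPU
BANDWIDTH_PER_LPU_TBPS = 150  # 150 TB/s memory bandwidth per LPU
LPUS_PER_RACK = 256           # chips per rack, per the capacity figure above

rack_capacity_gb = SRAM_PER_LPU_GB * LPUS_PER_RACK  # 128 GB of SRAM per rack
hbm4_estimate_tbps = BANDWIDTH_PER_LPU_TBPS / 7     # ~21 TB/s, implied by the "nearly 7x" claim

print(f"Rack SRAM capacity: {rack_capacity_gb:.0f} GB")
print(f"Per-chip bandwidth: {BANDWIDTH_PER_LPU_TBPS} TB/s LPU vs ~{hbm4_estimate_tbps:.0f} TB/s HBM4 (estimate)")
```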


Here's why this matters for anyone building agent systems: the Groq 3 isn't replacing GPUs. It's sitting next to them. The Rubin GPU handles the compute-heavy prompt processing (prefill), and the LPU handles the bandwidth-heavy response generation (decode). It's a disaggregated inference architecture, and it's exactly the kind of specialization that the agentic era demands. When you're running multi-step reasoning chains where an agent makes dozens of tool calls in sequence, the decode latency on each call is what determines whether your workflow feels instant or painful.
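
If it helps to picture what that split looks like from the orchestration side, here's a minimal toy sketch. Every class and method name is hypothetical, not NVIDIA's or Groq's actual software:

```python
from dataclasses import dataclass

# Toy sketch of disaggregated inference as described above: compute-heavy
# prefill on a GPU tier, bandwidth-heavy decode on an LPU tier.
# All names here are illustrative, not any vendor's real API.

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

class GpuPrefillTier:
    def prefill(self, prompt: str) -> list[str]:
        # Stand-in for building the KV cache over the full prompt in parallel.
        return prompt.split()

class LpuDecodeTier:
    def decode(self, kv_cache: list[str], max_new_tokens: int) -> str:
        # Stand-in for sequential, bandwidth-bound token generation.
        return " ".join(f"tok{i}" for i in range(max_new_tokens))

class DisaggregatedRouter:
    def __init__(self, prefill_tier: GpuPrefillTier, decode_tier: LpuDecodeTier):
        self.prefill_tier = prefill_tier
        self.decode_tier = decode_tier

    def generate(self, req: Request) -> str:
        kv_cache = self.prefill_tier.prefill(req.prompt)              # GPU: parallel prefill
        return self.decode_tier.decode(kv_cache, req.max_new_tokens)  # LPU: sequential decode

router = DisaggregatedRouter(GpuPrefillTier(), LpuDecodeTier())
print(router.generate(Request(prompt="summarize the GTC keynote", max_new_tokens=8)))
```

The point of the split: the prefill tier is sized for compute, the decode tier for memory bandwidth, and the KV-cache handoff between them is the hop the hardware has to make cheap.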


I've been living this problem with Scorpiox. Every orchestration cycle, every handoff between agent sessions, every tool call burns tokens and waits for decode. When I wrote yesterday that the bottleneck isn't model intelligence but the economics of keeping agents running persistently, this is the hardware answer. A purpose-built decode accelerator that sits alongside the GPU and just spews tokens. It's not elegant. It's practical. And practical is what ships.


Vera Rubin itself arrived as expected: a full-stack platform comprising seven chips, five rack-scale systems, and one supercomputer architecture. The 288 GB of HBM4 per Rubin GPU and the promised 10x cost-per-token reduction over Blackwell confirm the inference-first thesis. But the real story is that Vera Rubin isn't a chip. It's a vertically integrated system, from silicon to software, designed to be deployed as a unit. NVIDIA is selling AI factories, not components.

Then there's NemoClaw and OpenShell, NVIDIA's play to own the agent orchestration layer. Here's where the picture gets interesting. OpenClaw, the open-source agent platform that's become the fastest-growing GitHub project in history, was all over the keynote. Jensen called it "the most popular open source project in the history of humanity" and said every company needs an OpenClaw strategy. NemoClaw is NVIDIA's enterprise wrapper: policy enforcement, network guardrails, and privacy routing that makes OpenClaw safe for Fortune 500 deployment. OpenShell provides the runtime.

This is a smart wedge. OpenClaw has exploded because it lets anyone stand up a persistent, always-on AI agent that can actually do things: manage email, browse the web, execute multi-step workflows through messaging apps. But enterprise IT can't deploy something with that much surface area without governance. NemoClaw is the governance layer, and by providing it, NVIDIA positions itself as the enterprise on-ramp for the entire agent ecosystem. If you're building agent infrastructure (as I am), pay attention to what NemoClaw's policy engine looks like in practice. That's the integration surface your tools will need to speak.
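
NemoClaw's actual interface hasn't shipped publicly, so take this as a generic illustration rather than its real API: a governance layer usually boils down to a declarative policy evaluated before every tool call an agent makes.

```python
# Hypothetical sketch of an agent-governance policy check. This is NOT
# NemoClaw's API; it's just the general shape such a layer tends to take.

POLICY = {
    "allowed_tools": {"search_web", "read_email", "draft_reply"},
    "blocked_domains": {"internal-hr.example.com"},
    "max_calls_per_task": 50,
}

def check_tool_call(tool: str, target_domain: str | None, calls_so_far: int) -> bool:
    """Return True if the agent's next tool call is allowed under POLICY."""
    if tool not in POLICY["allowed_tools"]:
        return False
    if target_domain is not None and target_domain in POLICY["blocked_domains"]:
        return False
    if calls_so_far >= POLICY["max_calls_per_task"]:
        return False
    return True

# An orchestrator gates each call on the check and logs the decision.
print(check_tool_call("search_web", "example.com", calls_so_far=3))  # True
print(check_tool_call("send_payment", None, calls_so_far=3))         # False: not on the allowlist
```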


Feynman, the next-generation architecture after Vera Rubin, got a brief but revealing preview. It introduces a new CPU called Rosa (named for Rosalind Franklin), pairs with the next-gen LPU (LP40), and is built for sustained, long-context, multi-step reasoning. Yesterday I described it as an "inference-first chip on TSMC's 1.6nm process." The reality is broader: Feynman is a full platform redesign that assumes inference and agentic compute are the primary workload, not an afterthought. It's a 2028 bet, but it tells you where NVIDIA's architects are spending their time.


One more number worth sitting with: NVIDIA reported $193.5 billion in data center revenue for fiscal 2026, up from $116.2 billion the prior year. With hyperscalers planning $650 billion in AI capex this year, the infrastructure buildout isn't slowing down. It's accelerating.


Also Worth Knowing

IBM closed its $11 billion acquisition of Confluent today. The deal, announced in December, gives IBM a real-time data streaming platform used by over 6,500 enterprises. The timing with GTC is no accident: IBM also announced an expanded collaboration with NVIDIA at the conference. The thesis is that AI agents need continuous, governed, real-time data, not batch reports from yesterday. If you're running an enterprise that's serious about agent deployment, the data plumbing layer just became someone else's problem. Whether that's a good thing depends on how you feel about IBM's track record with acquisitions.


Autonomous driving had a big moment. Jensen declared the "ChatGPT moment of self-driving cars has arrived" and announced that BYD, Hyundai, Nissan, and Geely are building Level 4 autonomous vehicles on NVIDIA's Drive Hyperion platform. Uber will deploy NVIDIA-powered robotaxis across 28 cities on four continents by 2028, starting with LA and San Francisco next year. The physical AI thesis is moving from demos to deployment timelines.


The Nemotron Coalition launched. NVIDIA rallied partners including LangChain, Perplexity, and Mistral AI around six open frontier model families covering language, vision, robotics, autonomous driving, biotech, and climate. NVIDIA is positioning itself not just as a chip company but as the open-model ecosystem anchor for the entire AI stack.


The Builder's Take

I spent yesterday previewing GTC with a thesis: the constraint is shifting from capability to deployment economics and integration complexity. Twenty-four hours later, I feel even more confident about that read, but I want to sharpen it.

What NVIDIA showed yesterday wasn't just faster chips. It was a full-stack opinion about how AI work gets done. GPUs for training and prefill. LPUs for decode. CPUs purpose-built for agentic workloads. NemoClaw for governance. OpenShell for runtime. Vera Rubin as the integrated factory. This is NVIDIA saying: we don't just sell you the engine anymore, we sell you the car, the road, and the traffic system.


For builders, that's both an opportunity and a warning. The opportunity is that the cost floor for running agent systems is about to drop dramatically. The workflows you shelved because the unit economics didn't pencil out at $0.03 per call? They pencil at $0.003. That changes everything for persistent, always-on agent architectures.
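
For a rough sense of scale (the call volume here is invented; only the per-call prices come from the paragraph above):

```python
# Rough agent economics. The 10x per-call price drop is the claim above;
# the call volume is a made-up example for a persistent, always-on agent.
calls_per_day = 5_000
days_per_month = 30

old_monthly = calls_per_day * days_per_month * 0.03   # $0.03 per call
new_monthly = calls_per_day * days_per_month * 0.003  # $0.003 per call

print(f"Old monthly cost: ${old_monthly:,.0f}")  # $4,500
print(f"New monthly cost: ${new_monthly:,.0f}")  # $450
```

Same workload, an order of magnitude cheaper.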


The warning is platform dependency. When one company provides the GPU, the LPU, the CPU, the networking, the storage, the agent runtime, and the governance layer, you're building on a monoculture. That's efficient right up until it's not. Smart builders will take advantage of the cost reduction while keeping their orchestration layer portable. Build on the economics, not on the proprietary APIs.


Jensen envisions every engineer getting an annual "token budget" the way they get a laptop budget today. I think he's right about that. And the companies that figure out how to spend that budget effectively, how to decompose work, wire agents into real systems, and govern the output, will be the ones that win.


The chips are here. The plumbing is here. The question, as always, is whether your organization can actually use them.



Keep building,

— JW