DPUs and SmartNICs in 2026: How NVIDIA BlueField, AMD Pensando, Intel IPU, and AWS Nitro Are Offloading the Data Center to a New Class of Networking Silicon
- Internet Pros Team
- May 16, 2026
- AI & Technology
Inside every modern AI data center, a quiet but consequential reorganization is finishing. The networking, storage, security, and virtualization work that used to consume 20 to 40 percent of a server's CPU cycles has been lifted off the host processor entirely and shifted onto a new class of accelerator card sitting in the PCIe slot. Data Processing Units (DPUs) and the broader category of SmartNICs — NVIDIA BlueField-4 and ConnectX-8 SuperNIC, AMD Pensando Salina, Intel IPU E2200 (Mount Morgan), AWS Nitro v6, Microsoft Azure Boost, Google IPU, and Marvell OCTEON 10 — are the third pillar of compute in 2026, joining CPUs and GPUs as a permanent fixture in any serious AI or hyperscale architecture. They are how a 400 Gbps RDMA link becomes a useful network, how a multi-tenant cloud isolates one customer from another at line rate, and how a 100,000-GPU AI cluster does an all-reduce without melting its servers.
Why Servers Could No Longer Do Their Own Networking
The original network interface card moved bytes onto and off a wire. That was enough when networks ran at 1 Gbps and a single x86 core could comfortably keep up. By the time data centers crossed 100 Gbps, the cost of running the kernel TCP/IP stack — every packet copy, every interrupt, every context switch — was eating entire cores. At 400 and 800 Gbps, the math became absurd: a software stack running on a host CPU simply cannot process line-rate traffic without dedicating a stunning fraction of the machine to the network itself. Hyperscalers call this the "datacenter tax", and AWS, Microsoft, and Google have publicly stated it consumed 25 to 40 percent of every paid VM's cycles before they offloaded it.
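To put numbers on that, here is a back-of-envelope sketch of how much time a single host core gets per packet at line rate. It assumes standard Ethernet wire overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame), a 3 GHz core, and minimum (64-byte) versus standard maximum (1,518-byte) frames. It ignores batching, multi-queue spreading, and offloads such as TSO/GRO, so read it as an illustration of the per-packet budget rather than a benchmark.

```python
# Back-of-envelope time budget per packet at Ethernet line rate.
# Assumption: each frame also occupies 20 bytes of wire time for the
# preamble/SFD (8 B) and inter-frame gap (12 B); frame_bytes includes
# the Ethernet header and FCS (64 B minimum, 1518 B standard maximum).

ETH_WIRE_OVERHEAD_BYTES = 8 + 12


def packets_per_second(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second the link can carry at this frame size."""
    bits_on_wire = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def ns_per_packet(link_gbps: float, frame_bytes: int) -> float:
    """Time between back-to-back frames, in nanoseconds."""
    return 1e9 / packets_per_second(link_gbps, frame_bytes)


def cycles_per_packet(link_gbps: float, frame_bytes: int, cpu_ghz: float) -> float:
    """Clock cycles one core has per frame if it must keep up on its own."""
    return ns_per_packet(link_gbps, frame_bytes) * cpu_ghz


if __name__ == "__main__":
    for gbps in (1, 100, 400, 800):
        for frame in (64, 1518):
            mpps = packets_per_second(gbps, frame) / 1e6
            cycles = cycles_per_packet(gbps, frame, cpu_ghz=3.0)
            print(f"{gbps:>4} Gbps, {frame:>4} B frames: "
                  f"{mpps:8.2f} Mpps, {cycles:8.1f} cycles/packet at 3 GHz")
```

At 400 Gbps this model leaves roughly 92 cycles per full-size frame and about 5 cycles per minimum-size frame, far below what a per-packet kernel path with copies, interrupts, and context switches typically needs.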
DPUs solve the problem architecturally. The host CPU stops running the network, storage, and security stack altogether. A separate SoC on the NIC (typically built around Arm Neoverse cores, a P4-programmable packet pipeline, and a tall stack of dedicated crypto, compression, and RDMA engines) takes over the entire infrastructure plane. Every CPU cycle reclaimed from networking is now a cycle sold as compute.
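How much compute does reclaiming those cycles actually buy? A minimal sketch, assuming sellable capacity is simply proportional to the host cycles left after infrastructure work (a simplification; memory, I/O, and scheduling also cap VM density): if a fraction t of cycles went to the infrastructure stack, moving it off the CPU raises usable capacity by t / (1 - t).

```python
# Idealized model: sellable capacity is proportional to the host cycles
# left over after infrastructure work (networking, storage, security).


def capacity_gain(infra_tax: float) -> float:
    """Fractional increase in usable cycles when an overhead that consumed
    `infra_tax` of the host moves entirely onto a DPU."""
    if not 0.0 <= infra_tax < 1.0:
        raise ValueError("infra_tax must be in [0, 1)")
    # Before: (1 - t) usable. After: 1 usable. Gain = 1 / (1 - t) - 1.
    return infra_tax / (1.0 - infra_tax)


def implied_tax(gain: float) -> float:
    """Inverse of capacity_gain: the overhead a measured gain implies."""
    return gain / (1.0 + gain)


if __name__ == "__main__":
    for tax in (0.20, 0.25, 0.40):
        print(f"{tax:.0%} tax -> {capacity_gain(tax):.1%} more usable capacity")
    print(f"a 30% density gain implies a {implied_tax(0.30):.1%} tax")
```

Under these assumptions a 20 to 40 percent tax translates into 25 to 67 percent more sellable compute, and a 30 percent density improvement corresponds to reclaiming about 23 percent of host cycles.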
"The first time we measured a 32-core server after moving OVS, the encryption, and the storage path onto a BlueField, we got 30 percent more usable VM density at the same SKU. There is no software optimization on Earth that gives you that number."
What a DPU Actually Does
A modern DPU is best understood as a tiny, dedicated server welded to a network port. It runs its own Linux kernel, its own management plane, and its own attestation root of trust — completely isolated from the host. The functions it absorbs fall into four buckets.
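Vendors implement that root of trust differently, and the details are not covered here, so the following is only a generic sketch of the measured-boot idea that attestation builds on, modeled on TPM PCR "extend" semantics. Each boot stage is hashed and folded into a running register, so the final value commits to every stage and their order. The stage names are hypothetical placeholders, not real DPU firmware images.

```python
from __future__ import annotations

import hashlib


def extend(register: bytes, component: bytes) -> bytes:
    """TPM-PCR-style extend: new = SHA-256(old || SHA-256(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()


def measure_boot_chain(components: list[bytes]) -> bytes:
    """Fold each boot stage into one register, starting from all zeros."""
    register = bytes(32)
    for blob in components:
        register = extend(register, blob)
    return register


# Hypothetical boot stages of a DPU's Arm complex (placeholders, not real images).
golden = [b"boot-rom", b"firmware", b"linux-kernel", b"dpu-services"]
expected = measure_boot_chain(golden)

tampered = [b"boot-rom", b"firmware", b"linux-kernel-patched", b"dpu-services"]
print(measure_boot_chain(golden) == expected)    # True
print(measure_boot_chain(tampered) == expected)  # False
```

A verifier that knows the expected final value can detect a change to any stage. In real hardware the register lives where the host cannot write it, and the value is signed with a device key before it is reported; both steps are outside this sketch.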
Networking Offload
OVS, VXLAN/Geneve overlays, EVPN, microsegmentation policy, load balancing, congestion control, and in-network telemetry all execute on the DPU's P4 pipeline at full 400-800 Gbps line rate, with zero host CPU involvement.
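Overlay encapsulation is one of the easiest of these offloads to picture. The sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348 (flags, a 24-bit VXLAN Network Identifier, reserved bits), which is the per-packet work a DPU pipeline performs in hardware. It deliberately omits the outer Ethernet, IPv4, and UDP headers a real datapath also writes, and the function names are illustrative, not a DOCA, P4, or OVS API.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned VXLAN destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid

# Outer headers added per packet over IPv4: Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8
VXLAN_IPV4_OVERHEAD = 14 + 20 + 8 + 8


def vxlan_header(vni: int) -> bytes:
    """flags (8 bits) | reserved (24) | VNI (24) | reserved (8), network order."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(packet: bytes) -> int:
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    # A real datapath would prepend outer Ethernet/IP/UDP headers here,
    # with UDP destination port VXLAN_UDP_PORT.
    return vxlan_header(vni) + inner_frame


pkt = encapsulate(b"\x00" * 64, vni=5001)
print(pkt[:8].hex())   # 0800000000138900
print(parse_vni(pkt))  # 5001
```

The 50 bytes of outer headers per packet are one reason overlay deployments usually raise the underlay MTU.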
Storage Offload
NVMe-oF initiators and targets, AES-XTS encryption, compression, and erasure coding move into the DPU. The host sees a local NVMe device that is actually backed by remote storage hundreds of microseconds away, and the host CPU spends no cycles on the translation.
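Erasure coding is the least intuitive item in that list. Production storage systems generally use Reed-Solomon-style codes that survive several simultaneous failures; the sketch below shows only the simplest single-parity (RAID-5-style XOR) case to illustrate the principle an erasure-coding engine accelerates. It is not any vendor's implementation.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_blocks: list[bytes]) -> bytes:
    """Single-parity code: parity = XOR of all equal-length data blocks."""
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("blocks must be the same length")
    return reduce(xor_blocks, data_blocks)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block: parity XOR every survivor."""
    return reduce(xor_blocks, surviving, parity)


blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(blocks)
rebuilt = recover([blocks[0], blocks[2]], parity)  # pretend blocks[1] was lost
print(rebuilt == blocks[1])  # True
```

Because XOR is its own inverse, XORing the parity with every surviving block cancels them out and leaves exactly the missing one.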
Security Offload
Stateful firewalls, TLS 1.3 termination, IPsec, MACsec, deep packet inspection, and zero-trust microsegmentation policy run on dedicated crypto engines. The DPU enforces tenant isolation that the host CPU is no longer trusted — or fast enough — to police.
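"Stateful" means the firewall remembers which flows a tenant opened and admits only the matching return traffic. The toy below keeps that state in a Python set keyed by the 5-tuple; a DPU keeps an equivalent flow table in its own memory and evaluates it per packet. The policy format and class names are hypothetical, and real connection tracking also handles TCP state transitions, timeouts, and NAT, all omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int

    def reply(self) -> FiveTuple:
        """The 5-tuple a response packet on this flow would carry."""
        return FiveTuple(self.dst_ip, self.src_ip, self.proto,
                         self.dst_port, self.src_port)


class StatefulFirewall:
    """Toy connection tracker: outbound flows allowed by policy create a
    state entry; inbound packets pass only if they answer a tracked flow."""

    def __init__(self, allowed_outbound: set[tuple[str, int]]):
        # Hypothetical policy: (protocol, destination port) pairs a tenant may open.
        self.allowed_outbound = allowed_outbound
        self.conntrack: set[FiveTuple] = set()

    def outbound(self, pkt: FiveTuple) -> bool:
        if (pkt.proto, pkt.dst_port) not in self.allowed_outbound:
            return False
        self.conntrack.add(pkt)
        return True

    def inbound(self, pkt: FiveTuple) -> bool:
        return pkt.reply() in self.conntrack


fw = StatefulFirewall({("tcp", 443)})
req = FiveTuple("10.0.0.5", "203.0.113.10", "tcp", 40000, 443)
print(fw.outbound(req))         # True: policy allows HTTPS out
print(fw.inbound(req.reply()))  # True: reply to a tracked flow
print(fw.inbound(FiveTuple("203.0.113.10", "10.0.0.5", "tcp", 443, 40001)))  # False
```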
Virtualization Offload
vhost, virtio-blk, virtio-net, and the hypervisor data plane move to the DPU. AWS Nitro pioneered this in 2017; in 2026 every major cloud — Azure Boost, GCP IPU, Alibaba CIPU — has copied the pattern. The host runs only customer workloads.
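Offloading virtio means the DPU, rather than a hypervisor thread on the host, plays the device side of the virtio protocol. The sketch below models only the split-virtqueue descriptor table from the virtio 1.x specification: the guest driver chains 16-byte little-endian descriptors, and the device walks the chain to learn which guest buffers to read and which to fill. The available and used rings, notifications, and DMA are omitted, and the guest addresses are made up.

```python
from __future__ import annotations

import struct

VIRTQ_DESC_F_NEXT = 1    # chain continues at the `next` index
VIRTQ_DESC_F_WRITE = 2   # device writes this buffer (otherwise device reads it)
DESC_FORMAT = "<QIHH"    # le64 addr, le32 len, le16 flags, le16 next = 16 bytes


def pack_desc(addr: int, length: int, flags: int = 0, next_idx: int = 0) -> bytes:
    return struct.pack(DESC_FORMAT, addr, length, flags, next_idx)


def walk_chain(table: bytes, head: int) -> list[tuple[int, int, bool]]:
    """Follow a descriptor chain as a device-side backend would.
    Returns (guest_addr, length, device_writable) for each buffer."""
    size = struct.calcsize(DESC_FORMAT)
    buffers = []
    idx = head
    for _ in range(len(table) // size):  # bound the walk so a loop cannot hang us
        addr, length, flags, nxt = struct.unpack_from(DESC_FORMAT, table, idx * size)
        buffers.append((addr, length, bool(flags & VIRTQ_DESC_F_WRITE)))
        if not flags & VIRTQ_DESC_F_NEXT:
            return buffers
        idx = nxt
    raise ValueError("descriptor chain longer than table (loop?)")


# Shaped like a virtio-blk read: 16-byte request header (device reads),
# 4 KiB data buffer and 1-byte status (device writes).
table = (pack_desc(0x1000, 16, VIRTQ_DESC_F_NEXT, 1)
         + pack_desc(0x2000, 4096, VIRTQ_DESC_F_NEXT | VIRTQ_DESC_F_WRITE, 2)
         + pack_desc(0x3000, 1, VIRTQ_DESC_F_WRITE))
print(walk_chain(table, head=0))
# [(4096, 16, False), (8192, 4096, True), (12288, 1, True)]
```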
The Vendor Landscape in 2026
| Vendor / Product | Headline Spec | Where It Wins |
|---|---|---|
| NVIDIA BlueField-4 + ConnectX-8 SuperNIC | 800 Gbps Ethernet/InfiniBand, 64 Arm Neoverse cores, DOCA 3.0 software stack | The default AI-factory DPU. Paired with Spectrum-X800 and Quantum-X800 switches, BlueField-4 is the east-west fabric controller for every NVIDIA-reference AI training cluster from GB200 NVL72 upward. |
| AMD Pensando Salina | 400 Gbps, P4-programmable, 16 Arm N1 cores, Pollara 400 Ultra Ethernet variant | Sells through HPE ProLiant, Dell PowerEdge, and the Aruba CX 10000 smart switch. Pensando is the leading non-NVIDIA option for enterprise zero-trust microsegmentation and is anchoring AMD's entry into AI cluster networking via the Ultra Ethernet Consortium. |
| Intel IPU E2200 (Mount Morgan) | 400 Gbps, co-developed with Google for hyperscale offload | Powering Google's next-generation IPU-based servers and shipping to enterprise OEMs under the IPU brand. Intel's answer to BlueField, focused on hyperscalers willing to write directly against an open IPU SDK rather than DOCA. |
| AWS Nitro v6 | Custom Annapurna Labs ASIC, 800 Gbps EFAv4 networking, full hypervisor offload | The original production DPU. Nitro is why an AWS bare-metal instance has zero hypervisor overhead and why Outposts can ship a piece of AWS into a customer's rack with the same security model. Every EC2 instance in 2026 sits behind Nitro. |
| Microsoft Azure Boost + MANA NIC | 200-400 Gbps, FPGA + custom ASIC, Catapult lineage | Azure's answer to Nitro. Boost offloads storage, networking, and host management onto a dedicated card, freeing Azure VMs from the historical Hyper-V tax and enabling consistent 200 Gbps Accelerated Networking across the fleet. |
| Marvell OCTEON 10 / Tahoe | 400 Gbps, 24-core Neoverse N2, deep telco and security feature set | The merchant DPU of choice for 5G UPF offload, vRAN, security appliances, and white-box networking. Marvell is the workhorse silicon inside many telco DPUs that never carry a brand name on the front panel. |
DPUs and the AI Factory
Every AI training cluster of meaningful scale in 2026 (NVIDIA GB200 NVL72, AMD MI300X pods, Google TPU v5p superpods) is bottlenecked not by FLOPS but by the network between accelerators. A single training step on a frontier model involves terabytes of gradient data flowing through all-reduce collectives. The DPU is what makes that fabric work. NVIDIA's SHARP protocol pushes the reduction operation itself into the network: Quantum switch silicon sums the data as packets traverse the fabric, eliminating a full round-trip and shaving 30 to 40 percent off all-reduce latency. AMD's Pollara and the broader Ultra Ethernet Consortium are racing to replicate this on standards-based Ethernet. Without DPU-class hardware in every server, an AI cluster does not scale past a few thousand GPUs; with it, six-figure GPU counts become routine.
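A rough way to see why in-network reduction helps: in a ring all-reduce, a common algorithm in collective libraries, each of N ranks sends about 2(N-1)/N times its buffer size over 2(N-1) sequential steps, while an idealized switch-side reduction has each rank send its buffer once and receive the result once. The sketch below simulates both on plain Python lists. It counts elements sent per rank, not wall-clock time, and the switch model is a simplification, not a description of how SHARP is implemented.

```python
from __future__ import annotations


def ring_allreduce(buffers: list[list[float]]) -> tuple[list[list[float]], list[int]]:
    """Simulate ring all-reduce (reduce-scatter, then all-gather).
    Returns every rank's final buffer and the elements each rank sent."""
    n, size = len(buffers), len(buffers[0])
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]
    data = [list(b) for b in buffers]
    sent = [0] * n

    def run_step(chunk_for_rank, combine):
        # Snapshot all outgoing chunks first so every rank "sends" simultaneously.
        msgs = []
        for r in range(n):
            lo, hi = bounds[chunk_for_rank(r)]
            msgs.append(((r + 1) % n, lo, data[r][lo:hi]))
            sent[r] += hi - lo
        for dst, lo, payload in msgs:
            combine(data[dst], lo, payload)

    def add(buf, lo, payload):
        for i, v in enumerate(payload):
            buf[lo + i] += v

    def overwrite(buf, lo, payload):
        buf[lo:lo + len(payload)] = payload

    for step in range(n - 1):  # reduce-scatter: rank r ends owning chunk (r + 1) % n
        run_step(lambda r: (r - step) % n, add)
    for step in range(n - 1):  # all-gather: circulate the finished chunks
        run_step(lambda r: (r + 1 - step) % n, overwrite)
    return data, sent


def in_network_allreduce(buffers: list[list[float]]) -> tuple[list[list[float]], list[int]]:
    """Idealized switch-side reduction: each rank sends its buffer once,
    the switch sums element-wise, and the result goes back to every rank."""
    total = [sum(column) for column in zip(*buffers)]
    return [list(total) for _ in buffers], [len(b) for b in buffers]


grads = [[float(10 * rank + i) for i in range(8)] for rank in range(4)]
ring_out, ring_sent = ring_allreduce(grads)
net_out, net_sent = in_network_allreduce(grads)
print(ring_out == net_out)  # True
print(ring_sent, net_sent)  # [12, 12, 12, 12] [8, 8, 8, 8]
```

With four ranks each one sends 12 elements in the ring versus 8 with in-network reduction; as N grows the ring figure approaches twice the buffer size, and its 2(N-1) sequential steps are where the latency of large rings comes from.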
What DPUs Mean for Infrastructure Buyers in 2026
- You are buying a DPU whether you know it or not. Every modern AWS, Azure, GCP, and Oracle instance already runs behind one. The decision is whether your private infrastructure follows the same model.
- Zero-trust microsegmentation finally runs at line rate. Enforcing per-workload policy on a 400 Gbps link is impractical for a host-based firewall without surrendering many cores to it. A DPU does it in P4 at wire speed, with no host CPU tax and no measurable jitter.
- Storage and compute disaggregate cleanly. With NVMe-oF offload on the DPU, remote NVMe arrays look local to the host. The era of stranded direct-attached SSDs is ending.
- AI cluster economics depend on the fabric. A 100,000-GPU cluster without SHARP-class in-network compute will spend most of its time waiting on all-reduce. The DPU choice is now an AI cluster design decision, not a networking afterthought.
- Watch the power budget. Modern DPUs draw 75-150W each — a non-trivial line item when multiplied across 50,000 servers. Liquid-cooled rack designs increasingly assume DPU thermals from day one.
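The fleet-level figure follows directly from the per-card range above. A minimal sketch, assuming one DPU per server and every card drawing its rated power around the clock (an upper bound; real average draw is lower, and cooling overhead is not included):

```python
HOURS_PER_YEAR = 24 * 365


def fleet_dpu_power_mw(servers: int, watts_per_dpu: float, dpus_per_server: int = 1) -> float:
    """Aggregate DPU draw in megawatts."""
    return servers * dpus_per_server * watts_per_dpu / 1e6


def annual_energy_mwh(power_mw: float, avg_fraction_of_peak: float = 1.0) -> float:
    """Energy over one year if the fleet averages this fraction of peak draw."""
    return power_mw * HOURS_PER_YEAR * avg_fraction_of_peak


for watts in (75, 150):
    mw = fleet_dpu_power_mw(50_000, watts)
    print(f"{watts} W x 50,000 servers: {mw:.2f} MW, "
          f"up to {annual_energy_mwh(mw):,.0f} MWh/year")
```

At 50,000 servers that is 3.75 to 7.5 MW of DPU draw, or up to about 66 GWh per year before cooling.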
Where the DPU Goes Next
The 2027 horizon points in three directions. First, 1.6 Tbps SmartNICs arrive as 224 Gbps PAM4 SerDes ship in volume, paired with co-packaged optics to keep power per bit flat. Second, the DPU becomes the natural home for confidential computing — attestation, key management, and tenant isolation enforced by a card that the host CPU cannot tamper with, complementing GPU TEEs from NVIDIA H100/B200 and AMD MI300X. Third, expect in-NIC inference: small transformer and embedding models running on the DPU's Arm cores for use cases like packet-level threat classification, RAG retrieval acceleration, and AI-augmented load balancing — a category NVIDIA, Marvell, and several startups are openly building toward.
For thirty years, the CPU did everything. The 2020s split the bill: GPUs took the compute, and DPUs took the infrastructure. By 2026, any new server design that does not budget for both is an outlier — and any AI cluster without a programmable, line-rate DPU in every rack will simply not perform. The network became an accelerator, and the accelerator became part of the network.