Data processing units (DPUs) dominated the silicon conversation in 2021 as countless chipmakers embraced the three-letter acronym. However, the fledgling technology remains mired in misconceptions.

For the uninitiated, DPUs are a broad category of hardware accelerators designed to offload input/output-intensive workloads, such as those associated with networking, storage, and security.

But while the form factor, underlying technology, and role these devices serve vary widely from vendor to vendor, most share a common promise: more efficient resource utilization and, therefore, reduced operating costs throughout the data center.

SDxCentral spoke with Nvidia and Fungible, two pioneers in the emerging DPU market, about some of the biggest misconceptions and challenges facing the technology.

The True Cost of DPUs

Cost is one of the biggest misconceptions about these devices, according to Kevin Deierling, SVP of Nvidia’s Networking business unit.

“The GPU is really good at running accelerated applications that are massively parallel; the CPU is really good at running traditional, single-threaded applications; and the DPU is really, really good at running encryption, and data packet processing, and switching, and networking, and storage applications,” he told SDxCentral.

It comes down to using the best processor for the job, Deierling added. “You can run everything on the x86 [the CPU], but if you do that you will have massive inefficiencies and it will cost you a ton of money.”

So despite DPUs’ higher cost relative to traditional NICs, the added expense is more than offset by the efficiencies they enable, he claimed.
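
As a rough back-of-the-envelope sketch of that trade-off, consider a server that loses a quarter of its host cores to networking, storage, and security housekeeping. Every figure below is a hypothetical placeholder, not vendor pricing:

```python
# Back-of-the-envelope model of the DPU cost argument.
# All numbers are hypothetical placeholders, not vendor pricing.

CORES_PER_SERVER = 32
CORES_LOST_TO_IO = 8     # host cores consumed by networking/storage/security
SERVER_COST = 10_000     # fully loaded cost per server, USD
DPU_PREMIUM = 1_500      # extra cost of a DPU over a standard NIC, USD

# Cost per revenue-generating core when I/O runs on the host CPU:
cpu_only = SERVER_COST / (CORES_PER_SERVER - CORES_LOST_TO_IO)

# With a DPU, the I/O work moves off the host, freeing every core:
with_dpu = (SERVER_COST + DPU_PREMIUM) / CORES_PER_SERVER

print(f"cost per usable core, CPU only: ${cpu_only:.0f}")  # $417
print(f"cost per usable core, with DPU: ${with_dpu:.0f}")  # $359
```

The point is not the specific numbers but the direction: recovering stranded host cores can more than pay for the pricier card.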

Because those workloads are offloaded to the DPU, the CPU is freed to run revenue-generating workloads. Or, in the case of Fungible’s storage appliances, these capabilities open the door to eliminating costly CPUs entirely.

“A server in our view really should be CPU and RAM and everything else can be composed into that server,” Toby Owen, VP of Product at Fungible, told SDxCentral. “We’re really trying to bring that cloud experience back into the data center.”

Custom Silicon? Arm Cores? FPGAs?

Another point of confusion surrounding DPUs is their underlying technology. Some DPUs are built around a set of Arm processor cores that are supplemented by application-specific accelerators, while others are based on FPGAs, which can be configured for a specific use case.

While certain workloads may favor FPGAs, Deierling argued that “they are really different beasts” and that configurability is not the same as programmability.

“There’s a lot of people who can write Rust, and Go, and C, and C++, and Erlang, [and Python,] and all of the languages that run on a BlueField DPU running on a programmable Arm processor,” he said, adding that configuring an FPGA is an “arcane skill” by comparison.

Nvidia’s BlueField-series DPUs pair dedicated accelerator blocks for networking, encryption, storage, and security workloads with general-purpose Arm cores.

This approach isn’t without its compromises, Owen noted. “If you’re trying to do something it was designed for, it’s going to work really fast,” he explained. “But if you’re trying to do an offload that wasn’t in the pipeline originally, you’re going to have a lot of latency as you go back and forth with the general-purpose Arm core.”

To alleviate this challenge, Fungible’s DPUs keep the accelerator blocks but swap the Arm cores for a custom architecture that the company claims melds the performance of an FPGA with the flexibility of a CPU.

“Our architecture is such that if we want to do a storage data flow, we can program [our DPUs] to pipeline the entire operation and hand off seamlessly from core to accelerator,” Owen said.
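
Owen’s latency argument can be made concrete with a toy model. In the sketch below, each stage of a packet-processing flow either bounces back through a general-purpose core between accelerators or hands off directly down a pipeline; the stage names and timing constants are invented for illustration and measure no real hardware:

```python
# Toy latency model of the two offload styles described above.
# Stage names and timings are invented; they measure no real DPU.

STAGES = ["parse", "checksum", "encrypt", "compress"]
ACCEL_TIME_US = 2.0   # time per stage on a dedicated accelerator, microseconds
HANDOFF_US = 5.0      # cost of bouncing data through a general-purpose core

def ping_pong_latency(stages: list[str]) -> float:
    """Every accelerator stage returns to the general-purpose core."""
    return sum(ACCEL_TIME_US + HANDOFF_US for _ in stages)

def pipelined_latency(stages: list[str]) -> float:
    """Stages hand off accelerator-to-accelerator; one entry into the pipeline."""
    return HANDOFF_US + sum(ACCEL_TIME_US for _ in stages)

print(ping_pong_latency(STAGES))  # 28.0 µs
print(pipelined_latency(STAGES))  # 13.0 µs
```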

Software Challenges Persist

On the topic of programmability, one of the biggest barriers to DPU adoption remains software compatibility. Software has to be optimized to take advantage of these specialized accelerators, Deierling said.

To address this, most DPU vendors have introduced software development kits (SDKs) targeting independent software vendors. Nvidia’s DOCA environment is one example.

These SDKs enable developers to bake DPU support into their software the same way vendors added support for GPU-accelerated compute in the past using Nvidia’s CUDA SDK. “DOCA is to DPUs what CUDA is to GPUs,” Deierling said.
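
To make the analogy concrete, the adoption pattern an SDK like DOCA enables looks roughly like the sketch below: select an accelerator-backed fast path when the hardware is present and fall back to the host CPU when it isn’t. This is an illustrative Python sketch; the class names and placeholder transform are invented and do not reflect the actual DOCA (or any vendor) API:

```python
# Conceptual sketch of the SDK adoption pattern: pick a DPU-backed fast
# path when the hardware is present, fall back to the host CPU otherwise.
# Names and the placeholder transform are invented, not the DOCA API.

from typing import Protocol

class CryptoBackend(Protocol):
    def encrypt(self, data: bytes) -> bytes: ...

class CpuBackend:
    def encrypt(self, data: bytes) -> bytes:
        # Software path; stands in for a host-side crypto library.
        return bytes(b ^ 0xFF for b in data)  # placeholder, not real crypto

class DpuBackend:
    def encrypt(self, data: bytes) -> bytes:
        # A real SDK call would hand this buffer to the DPU's inline
        # crypto engine and leave the host CPU free for other work.
        return bytes(b ^ 0xFF for b in data)  # same placeholder transform

def select_backend(dpu_present: bool) -> CryptoBackend:
    return DpuBackend() if dpu_present else CpuBackend()

backend = select_backend(dpu_present=False)  # flip once hardware is detected
print(backend.encrypt(b"hello"))
```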

Nvidia has seen early success attracting developer support to its platform. A collaboration with VMware, dubbed Project Monterey, enabled customers to offload the ESXi hypervisor to the company’s BlueField-2 DPUs. Similarly, Palo Alto Networks recently optimized its virtual firewalls to run on Nvidia’s DPUs.

Fortinet, Juniper, F5, Red Hat, and Canonical have also announced support for Nvidia’s BlueField DPUs and the DOCA SDK.

Fungible, in contrast, has taken a narrower, workload-specific approach that targets high-performance storage and composable infrastructure use cases.

“In terms of the ecosystem, we’re thinking about aligning to workloads because that’s more consumable and it’s much better to solve the whole problem,” Owen said.

“I think the value proposition of being able to offload general purpose work from a CPU is certainly interesting in some use cases, but that’s not our goal,” he added. “Our goal is to use the servers you already have and just make them work better.”