The adoption of artificial intelligence (AI) accelerators in data centers could be stymied if existing hardware bottlenecks aren't addressed, warned Dell'Oro analyst Baron Fung in a blog post this week.

"AI is driving the need for specialized solutions at the chip and system level in the form of accelerated compute servers optimized for training and inference workloads at the data center and at the edge," he wrote.

This specialized AI hardware is helping to accelerate workloads like image and speech recognition, security, and predictive analytics.

And while Fung admits that AI accelerators are only being used in a fraction of global data centers today, the market is expected to grow at a "double-digit compound annual growth rate over the next five years."

However, capturing that growth isn't as simple as deploying more AI accelerators, he wrote; existing bottlenecks have to be addressed first. He identified four areas where development is needed to realize the accelerators' full potential: rack architectures, networking, CPUs, and memory density.

Denser Accelerator Arrays Mean More Heat

Fung notes that AI accelerators are increasingly being deployed in a centralized rack architecture where multiple GPUs or other AI accelerators are housed within a single server. Nvidia's DGX A100 servers, which were announced in April, are one such example. Each system houses eight A100 GPUs.

According to Fung, pooling large amounts of resources in a single server and virtualizing the hardware enables multiple users to run workloads at the same time. "Nvidia's recently launched A100 Ampere takes virtualization to the next step with the ability to allow up to seven GPU instances with a single A100 GPU," he wrote.
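
As a rough sketch of how that kind of partitioning gets consumed in practice (the example is ours, not from Fung's post), the snippet below pins two hypothetical training jobs to separate MIG instances by restricting CUDA_VISIBLE_DEVICES. The instance identifiers and script names are placeholders; on a real system the identifiers would come from nvidia-smi after the A100 has been partitioned.

```python
import os
import subprocess

# Hypothetical MIG instance identifiers; on a real system they would come
# from `nvidia-smi -L` once the A100 has been partitioned into instances.
mig_instances = [
    "MIG-GPU-aaaaaaaa-1111-2222-3333-444444444444/1/0",
    "MIG-GPU-aaaaaaaa-1111-2222-3333-444444444444/2/0",
]
jobs = ["train_user_a.py", "train_user_b.py"]   # placeholder workloads

# Each job sees only its own slice of the GPU because CUDA_VISIBLE_DEVICES
# is restricted to a single MIG instance.
for instance, script in zip(mig_instances, jobs):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=instance)
    subprocess.Popen(["python", script], env=env)
```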

While this increases compute density by a wide margin over more traditional distributed architectures, where there is one GPU per server, it also introduces considerable thermal-dissipation challenges. Getting rid of all that heat may mean changing the form factor, power distribution, or cooling regime, Fung wrote, adding that some vendors have opted for liquid cooling, as is the case with Google's tensor processing unit accelerator.

Faster Networking Fabrics

Fung explained that as data centers shift to centralized compute architectures, high-speed network fabrics become essential. Nvidia's NVLink and InfiniBand, which Nvidia gained through its $6.9 billion acquisition of Mellanox last year, are two such fabrics currently being deployed to cope with the mountains of unstructured data generated and processed in AI data centers.

However, Fung argues that 400 Gb/s Ethernet is now the ideal choice for bridging storage and compute nodes with the network fabric.

"I believe that these accelerated compute servers will be the most bandwidth-hungry nodes within the data center, and will drive the implementation of next-generation Ethernet," he wrote.

Beyond Ethernet, Fung also believes that smartNICs could be used to minimize packet loss, optimize traffic, and scale storage devices within networks using NVMe over Fabrics. To that end, Nvidia earlier this month unveiled its BlueField-2 and BlueField-2X data processing units (DPUs), the latter of which combines a smartNIC with an Ampere-based GPU for accelerating AI workloads.

Don't Forget About the CPU

It's important to remember that these AI accelerators don't replace the CPU, Fung wrote. "The CPU can be viewed as the taskmaster for the entire system, managing a wide range of general-purpose computing tasks, with the GPU and other accelerated processors performing a narrower range of more specialized tasks."

Because of this, Fung argues that it's imperative to balance the number of CPU cores with the number of GPUs for a given workload.
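
One common rule of thumb, sketched below as an illustration rather than a recommendation from the post, is to leave each GPU enough host cores to feed it preprocessed data; the GPU count and the number of cores reserved for the system are assumptions.

```python
import os

# Rough heuristic (ours, not Fung's): divide the host cores left over after
# reserving some for the OS and networking across the GPUs in the node, and
# use the result as the number of data-loading workers per GPU.
cpu_cores = os.cpu_count() or 1
num_gpus = 8                      # assumed: one eight-GPU node, e.g. a DGX A100
cores_reserved_for_system = 4     # assumed reservation for OS and I/O

workers_per_gpu = max(1, (cpu_cores - cores_reserved_for_system) // num_gpus)
print(f"{cpu_cores} CPU cores, {num_gpus} GPUs -> {workers_per_gpu} workers per GPU")
```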

Advancements from Intel and AMD are helping to better address AI workloads as well. Intel's third-generation Xeon Scalable processors are the chipmaker's second generation to feature integrated AI acceleration.

AI the Memory Hog

AI workloads are also pushing the boundaries of memory density. Citing a panel with Google, Marvell, and Rambus at the AI Hardware Summit late last month, Fung wrote that memory capacity and performance are becoming a limiting factor for scaling AI training models.

"More memory capacity means more modules and interfaces, which ultimately degrades chip-to-chip latencies," Fung wrote. "There needs to be an optimal balance between memory bandwidth and capacity within the system, while adequately addressing thermal dissipation challenges."

Two potential remedies are three-dimensional stacking, which packages memory chips closer together, and high-bandwidth memory (HBM), which is already being used to minimize latencies, albeit at a cost premium.
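
To see why capacity becomes the limiter, a rough calculation (ours, not from the panel or Fung's post) of the weight and optimizer state for an assumed 1.5-billion-parameter model trained in FP32 with Adam already lands around 24 GB, before activations or batch data are counted:

```python
# Illustrative arithmetic (not from Fung's post or the panel): memory needed
# just for weights, gradients, and Adam optimizer state during training.
params = 1.5e9                    # assumed model size: 1.5B parameters
bytes_per_value = 4               # FP32

weights    = params * bytes_per_value
gradients  = params * bytes_per_value
adam_state = params * bytes_per_value * 2   # first and second moments

total_gb = (weights + gradients + adam_state) / 1e9
print(f"~{total_gb:.0f} GB before activations and batch data")  # ~24 GB
```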

Regulatory Bottlenecks Ahead?

AI adoption may face new bottlenecks, according to The Wall Street Journal, which reported Wednesday that the Trump administration is close to completing guidance on how to regulate the technology.

The regulation seeks to help government agencies take advantage of AI while limiting its potential misuse, in what Michael Kratsios, senior technology official for the Trump administration, calls a "light touch" approach. Kratsios argues that stiffer regulation, like that proposed by the European Union, could stifle AI development in the U.S.