SDxCentral CEO Matt Palmer speaks with AMD CVP Soni Jiandani about networking challenges, AI and the role of DPUs.

What’s Next is a biweekly conversation between SDxCentral CEO Matt Palmer and a senior-level executive from the technology industry. In each video, Matt has an informal but in-depth video chat with a fellow thought leader to uncover what the future holds for the enterprise IT and telecom markets — the hook is each guest is a long-term acquaintance of Matt’s, so expect a lively conversation.

This time out, Palmer spoke with Soni Jiandani, CVP of AMD's Networking Technology and Solutions Group. As the cofounder of Pensando Systems and with a long background at Cisco, among others, Jiandani has a storied history in networking. In other words, she's been disrupting markets and driving industry transformation for more than 25 years!

Editor’s note: The following is a summary of what Jiandani shared in their conversation, edited for length. To hear the full conversation, be sure to watch the video.

Network challenges today

Soni Jiandani: It's a very exciting time in networking, particularly with a lot of innovation in the data center around sustainability, performance and efficiency. Today's data center operators are facing multiple performance- and efficiency-related challenges. For example, demand for new age applications is driving increased capacity demands. Energy consumption and cost are rising, there are physical power and real estate constraints, and carbon emission regulations are being put in place. So how do you balance bringing in those technology trends with sustainability and drive your data center to be ready for the new age applications?

Scale is another important factor: the modern data center has to accommodate the company's increasing demand for scalability at varying magnitudes. Data centers have to scale up or down vertically and scale out horizontally to manage the increased demands and daring workloads.

Security is now a board-level mandate for many customers. It's moved from being one of your Top 10 requirements to a Top 3. It's no longer just a government, banking or health care need. It's ubiquitous across all industries.

And finally, while AI is new, it's here to stay...and things are moving very fast in that sector. Over the years we have had new technologies that have driven demand for one or two technology areas. AI is pushing the boundaries of every aspect of the data center today, whether it's in the cloud, for the enterprise or for the service providers.

AI for infrastructure

Jiandani: AI is going to have a dramatic impact on networking technology. It is ultimately changing everything. Your success with AI will depend on whether your infrastructure is able to support such powerful applications and the demands they make. And while the cloud is emerging as a major resource, many enterprises also have to rely on on-premises environments to accommodate things like big data storage. Do I have enough? Can it scale with my models and AI-generated data from a networking infrastructure point of view? Can I, or my network itself through automation, anticipate what the demands of networking are going to be, including security threats, and react in real time? On compute, which applications are relying on my compute resources versus genAI-based applications, which will require the right amount of GPU resources that I now need to apply?

And it's not just about that, but also, how am I using those assets efficiently? As I move to this genAI type of environment and security in general, compliance is also going to play a very big role. Do I have the right data management and governance controls in place? Not just for the data that I'm analyzing, but also for the data that I'm generating with these large language models (LLMs) in the cloud. How do I guarantee that there is no data leakage when I go to the cloud to process these LLMs? So from an AI infrastructure standpoint, companies have to look at this multi-dimensionally across their networking, storage, data analytics and security platforms to make sure they're able to effectively deal with the growth of the business and the AI ecosystem, including the data generated internally as well as the data generated through partnerships and the supply chain.

Needles in the haystack

Jiandani: If your audience is a hyperscaler or a Tier 2 cloud customer, they want to have the ability to utilize domain-specific architectures today in DPUs, to accelerate their networking capabilities and have the ability to build a network architecture that can accommodate the new AI workloads simultaneously, as opposed to being locked into proprietary networking technologies like InfiniBand. Nobody wants to go back in time.

Anyone in the networking industry who bet against Ethernet and IP would lose. Or ATM, even before that. But I think if you were to take a step back, the biggest problems we are solving today for our cloud customers with DPUs in production happen to be: How do I scale my networks? How do I simultaneously turn on security policy? How do I simultaneously turn on encryption in that environment?

And while I'm turning on encryption and software-defined networking and accelerating it and doing it at scale, how do I increase the number of connections per second without compromising on the scale of my network? Because I'm a cloud service provider, I need to do everything at scale. Everything is in tens of millions of flows. How do I have the ability to troubleshoot in real time without having to bring the cloud down, because it's a 7 x 24 thing, right? So I need visibility in order to understand where the problem lies because I'm trying to find a needle in a haystack.

And having the ability to impose more distributed architectures or scale-out architectures with NVMe over transmission control protocol (TCP), because now, as I distribute my workloads, I want the ability for the storage also to be a distributed asset. I cannot be focused on it being centralized. How do I run these services all in the data processing unit (DPU)? Do I bring up my CPU resources? So those become monetizable elements, and I am assured of a secure fabric that I'm building natively through my DPU.