Spirent Communications recently launched a high-density test solution for simulating realistic artificial intelligence (AI) workloads over Ethernet. Built on Spirent’s A1 400 Gb/s platform, the offering can emulate 400G xPU workloads and assess Ethernet infrastructures in AI environments.
Ethernet serves as the backbone of cloud technology, so its ability to handle the extreme demands of AI traffic – characterized by high workload volumes and sensitivity to latency and congestion – is vital. Traditional data centers might tolerate some packet loss or latency, but in AI data centers these issues can halt training runs outright, which is why such environments require more robust testing mechanisms.
The unique networking challenges of AI have often been overshadowed by GPUs, which are just one part of the equation. Successful AI operations require fast processors, fast data storage, and efficient networking working in concert. Spirent’s offering aims to address the networking piece by supporting both AI and traditional routing/switching tests using the Remote Direct Memory Access over Converged Ethernet version 2 (RoCEv2) protocol.
Over the past 10 months, Spirent has broadened its focus from testing alone to deeper engagement with market trends, specifically in Ethernet technologies, according to Aniket Khosla, VP of wireline product management at Spirent. Major tech players have rapidly expanded their backend data center operations, creating additional complexities. One of them is the extensive use of GPUs, which sit idle for significant stretches when network bottlenecks cause congestion.
Spirent’s offering incorporates simulation techniques based on collective communication libraries (CCLs), breaking large data chunks into smaller, manageable packets to replicate real-world AI traffic. This approach tests both the network’s capacity to handle AI workloads and its ability to manage AI data transmission patterns.
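To make the traffic pattern concrete, here is a minimal illustrative sketch (not Spirent’s implementation, and the function names and parameters are hypothetical) of how a collective communication library chunks a large gradient buffer, and why a single all-reduce across a GPU cluster translates into a flood of smaller messages on the fabric:

```python
def chunk_payload(total_bytes: int, chunk_bytes: int) -> list[int]:
    """Split a payload into fixed-size chunks; the last chunk may be smaller."""
    full, rem = divmod(total_bytes, chunk_bytes)
    chunks = [chunk_bytes] * full
    if rem:
        chunks.append(rem)
    return chunks


def ring_allreduce_messages(payload_bytes: int, num_gpus: int,
                            chunk_bytes: int) -> int:
    """Count the messages a ring all-reduce puts on the wire.

    In a standard ring all-reduce, each of the num_gpus ranks forwards
    each chunk 2 * (num_gpus - 1) times: once per step of the
    reduce-scatter phase and once per step of the all-gather phase.
    """
    n_chunks = len(chunk_payload(payload_bytes, chunk_bytes))
    return n_chunks * num_gpus * 2 * (num_gpus - 1)


# Example: a 1 GiB gradient buffer synchronized across 8 GPUs
# with 1 MiB chunks produces over 100,000 messages per iteration.
messages = ring_allreduce_messages(1 << 30, num_gpus=8, chunk_bytes=1 << 20)
print(messages)
```

Even this toy arithmetic shows why the fabric, not the GPU, can become the bottleneck: one synchronization step generates tens of thousands of latency-sensitive transfers, and any congestion or loss stalls every rank in the ring.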
“We’re not testing the GPU or other components. We test the Ethernet fabric,” Khosla said. “Our proposition is lower cost. We take the complexity out of the use cases and we provide test repeatability.”
Khosla highlighted the discrepancy in performance between lab tests using competitors’ equipment and real-world deployments. While other tests might look successful in controlled environments, they often fail to replicate the actual demands of live AI operations. Spirent’s platform emulates congestion, packet delivery issues, and latency scenarios, which can more accurately reflect a network’s performance under AI workloads.
Khosla also noted the economic and operational challenges of maintaining GPU clusters, with high costs and GPU scarcity as major hurdles. Spirent’s solution mitigates these issues by providing a testing environment that simulates various aspects of AI operations. Khosla said this reduces the need for expensive, real-world setups and allows organizations to reallocate their resources more efficiently.
Spirent is looking to extend the platform to support traffic rates of up to 800G to help enterprises meet AI networking demands. Those demands include maintaining large-scale AI data centers; measuring network responses to new protocols and workloads for AI applications; and training networks for successful AI implementation.
“This is our first foray into the AI testing world,” Khosla said. “The product that we’re putting out right now is regular Ethernet. There is still work happening in telemetry, but it will be some time before this comes to fruition both from the standards bodies and the players around it. From a product introduction perspective, we’re pretty complete. Six months from now, our roadmap is probably going to change. We will make shifts as the market demands it.”