LAS VEGAS – Nvidia and Amazon Web Services (AWS) are now offering joint supercomputing infrastructure, software and services optimized for generative artificial intelligence (genAI) use cases. The hyperscaler claims this will be the first cloud artificial intelligence (AI) supercomputing platform to pair Nvidia Grace Hopper Superchips with the scalability of the AWS cloud.
The stars of this supercomputing offering are Nvidia's GH200 Grace Hopper Superchips. Each GH200 combines an Arm-based Grace CPU and a Hopper-architecture GPU on the same module, and Nvidia's multi-node GH200 NVL32 platform lets IT teams connect 32 of these Superchips into a single instance. The platform will be available on Amazon Elastic Compute Cloud (EC2) instances and supported by the hyperscaler's virtualization and hyperscale clustering capabilities.
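For a sense of what that looks like from the customer side, here is a hypothetical sketch of requesting such an instance through the standard EC2 API via boto3. AWS had not published an instance type name for the GH200 offering at announcement time, so the instance type and AMI ID below are placeholders, not real identifiers.

```python
# Hypothetical sketch: requesting a GH200-backed EC2 instance with boto3.
# The InstanceType and ImageId values are placeholders; AWS had not named
# the GH200 NVL32 instance types when this was announced.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder Deep Learning AMI ID
    InstanceType="p5g.48xlarge",      # placeholder GH200 instance type
    MinCount=1,
    MaxCount=1,
    # A cluster placement group keeps nodes on the same low-latency fabric,
    # the usual pattern for multi-node EC2 cluster workloads.
    Placement={"GroupName": "gh200-cluster"},
)
print(response["Instances"][0]["InstanceId"])
```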
Specifically, the platform offers up to 400 Gbps of networking throughput per GH200 Superchip, a speed that helps customers scale to thousands of Nvidia chips on Amazon EC2. The offering also boasts 4.5 TB of memory, seven times more than the current generation of H100-based EC2 instances. In addition, this generation's CPU-to-GPU memory interconnect provides seven times more bandwidth, which speeds chip-to-chip communication and expands the total memory available to applications.
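Those figures are internally consistent: a quick back-of-the-envelope check (assuming the memory is spread evenly across the 32 Superchips in an instance) lands close to the 141 GB of HBM3e that Nvidia quotes for each GH200.

```python
# Back-of-the-envelope check of the quoted 4.5 TB figure, assuming memory
# is distributed evenly across the 32 Superchips in a single instance.
total_memory_tb = 4.5
superchips = 32
per_chip_gb = total_memory_tb * 1024 / superchips
print(f"~{per_chip_gb:.0f} GB per GH200")  # ~144 GB, close to the 141 GB
                                           # of HBM3e per GH200
```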
This collaboration also marks the first AI infrastructure on AWS to be liquid-cooled. According to the vendors, the sustainability-minded cooling technology keeps densely packed server racks from throttling performance under high temperatures.
Nvidia L40S GPUs on AWS cloud accelerate reliable AI robotics

Also at AWS re:Invent 2023, Nvidia and AWS announced the availability of Nvidia L40S GPUs on AWS cloud infrastructure – or in AWS data centers, depending on how you think about it. This partnership is designed to accelerate the process of building and deploying robotics applications on AWS, especially when used in conjunction with Nvidia's Isaac Sim platform, Nvidia VP of Hyperscale and High-Performance Computing (HPC) Ian Buck told reporters on Monday.
Buck cited Amazon Robotics, along with Soft Robotics and Theory Studios, as the first organizations "to benefit from the new L40S GPU instances." Amazon Robotics, for example, has deployed more than 750,000 robots across its warehouses and fulfillment centers.
"That's a lot of robots," Buck said. "Training, building and designing these robots to be reliable is a heavy lift." With the availability of Nvidia L40S GPUs, Amazon Robotics aims to speed up the development of its Proteus autonomous mobile robot designed to manage fulfillment on the Amazon side of the house.
And by combining the GPU technology with Nvidia's Isaac Sim platform, which is built on the Nvidia Omniverse development platform, roboticists can simulate realistic sensor data in a virtual environment to test and validate thousands of scenarios before a robot moves an inch. Combined with the L40S GPUs – a versatile "GPU workhorse" optimized for artificial intelligence (AI) computing and generative AI (genAI) capabilities – the Isaac Sim platform delivers double the simulation performance and throughput of the previous-generation A40 GPU, according to Buck.
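For a rough illustration of that workflow, the sketch below starts a headless Isaac Sim session, builds a trivial scene and steps the physics loop – the basic pattern under which large batches of scenario tests run on cloud GPUs. It follows Isaac Sim's documented Python API (circa 2023), though module paths vary between releases, so treat it as a sketch rather than a drop-in script.

```python
# Minimal headless Isaac Sim run: build a trivial scene and step physics.
from omni.isaac.kit import SimulationApp

# Headless mode suits batch scenario testing on cloud GPUs such as the L40S.
simulation_app = SimulationApp({"headless": True})

from omni.isaac.core import World  # must be imported after SimulationApp

world = World()
world.scene.add_default_ground_plane()
world.reset()

# Step the simulation; a real pipeline would also capture synthetic sensor
# data (cameras, lidar) at each step for testing and validation.
for _ in range(1000):
    world.step(render=False)

simulation_app.close()
```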
"Simulation technology plays a critical role in how we develop, test and deploy our robots," according to Amazon Robotics Head of Virtual Systems Brian Basile. "We continue to increase the scale and complexity of our simulations. With the new AWS L40S offering, we will push the boundaries of simulation, rendering and model training even further," he said.
In addition to robotics applications, Nvidia L40S GPUs on AWS can help IT teams tackle the genAI wave by fine-tuning large language models (LLMs) in just a few hours and performing real-time inference for text-to-image and chatbot applications.
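As one concrete example of that fine-tuning workflow – with the model, dataset and hyperparameters chosen purely for illustration rather than taken from the announcement – a single GPU-backed instance can run a standard Hugging Face training loop along these lines:

```python
# Illustrative fine-tuning sketch using Hugging Face Transformers on a
# single GPU-backed EC2 instance. Model, dataset and hyperparameters are
# assumptions for demonstration, not part of the Nvidia/AWS announcement.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # any causal LM follows the same flow
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a small slice of a public dataset to keep the demo fast.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        fp16=True,  # mixed precision to exploit the GPU's tensor cores
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```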