Raw speed and low latency aren’t the only things that networks can contribute to machine learning.
Mellanox, which has done a lot of work in high-performance computing (HPC), has been adding some processor-aiding features to its switches. Some of those advances turn out to be useful in machine learning, too.
Specifically, a technology called Scalable Hierarchical Aggregation and Reduction Protocol (Sharp) lets a switch do some of the data handling that would otherwise be a server CPU’s job. Switches can already handle some processing tasks; they can manipulate packet headers, for instance. Sharp goes a step further by letting the switch manipulate the data itself.
In machine learning, this can come in handy for training the neural network. Machine learning relies on giving the processor lots of examples to learn from. Sometimes, this training involves assigning multiple GPUs to a task and combining the results.
True to its name, Sharp lets the switch do the aggregation. “As the data moves within the fabric, the network devices are acting as the CPU,” says Gilad Shainer, Mellanox’s vice president of marketing.
In this way, the compute function can be distributed around the network, “and that’s critical for running the training exercise faster,” Shainer says.
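The aggregation step described above can be pictured with a small simulation. This is only a sketch of the general idea of summing partial results level by level in a tree; it is not Mellanox’s implementation, and the function names, the `radix` parameter, and the four-GPU example data are all assumptions made up for illustration.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Reduce several equal-length vectors into one by summing each position."""
    return [sum(values) for values in zip(*vectors)]


def aggregate_in_tree(leaf_vectors: List[Vector], radix: int = 2) -> Vector:
    """Sum vectors level by level, as a tree of switches might.

    Each group of up to `radix` vectors stands in for the inputs arriving
    at one hypothetical switch, which forwards a single partial sum upward.
    """
    if not leaf_vectors:
        raise ValueError("need at least one vector to aggregate")
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = leaf_vectors
    while len(level) > 1:
        # One pass of this loop models one layer of switches.
        level = [
            elementwise_sum(level[i:i + radix])
            for i in range(0, len(level), radix)
        ]
    return level[0]


# Four hypothetical GPUs, each holding a partial gradient (made-up values).
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
total = aggregate_in_tree(gradients)
average = [value / len(gradients) for value in total]
```

In this toy model, each server receives one combined result instead of having to collect and add every other server’s contribution itself, which is the kind of work the article describes being moved into the fabric.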
Sharp is available only for InfiniBand deployments; it’s a feature of the Mellanox Switch-IB 2 family of systems. So Sharp isn’t a factor in Mellanox’s recent contract with Baidu, which involves building a 100-Gb/s network to support machine learning research.
Ethernet dominates most networks, but InfiniBand is still useful in HPC, and a lot of HPC techniques are proving useful to machine learning. “Most of the needs we see in HPC are being copied in machine learning,” Shainer says.
Nvidia’s GPU Helpers
Sharp is relatively new — Shainer wrote an introductory blog entry about it in late 2015 — but Mellanox and other vendors have been implementing other GPU-helping technologies that chipmaker Nvidia has championed over the past few years.
Remote direct memory access (RDMA), for instance, lets the network move data directly to an application, bypassing the CPU. It’s useful for avoiding some of the chatter required by TCP.
Similar problems crop up with GPUs, where data going from one GPU to another has to pass through a CPU. To skip that step and connect the GPUs directly, Nvidia offers a technology of its own called GPUDirect RDMA.
Another tool, rCUDA (remote CUDA), extends Nvidia’s CUDA parallel-computing platform, which runs on GPUs. Developed at Spain’s Universitat Politècnica de València, it forwards CUDA commands to a GPU that resides in another system. This way, multiple GPUs can combine forces on one problem without the physical limitations of building a single server with that many processors.
With rCUDA, a normal CPU could also tap the processing power of GPUs elsewhere. “You can transform the GPU to become a service,” Shainer says.
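The idea of forwarding work to a GPU in another machine can be sketched in a few lines. What follows is a toy model of call forwarding, not rCUDA’s actual API or wire protocol: the class, the function names, the JSON message format, and the two sample operations are all hypothetical, and the “network” is just a local function call so the example runs anywhere.

```python
import json
from typing import Callable, Dict, List

# Hypothetical operations the remote side knows how to run. On a real
# system these would be GPU kernels; here they are plain Python functions.
REMOTE_OPERATIONS: Dict[str, Callable[[List[float]], List[float]]] = {
    "scale_by_two": lambda data: [2.0 * x for x in data],
    "square": lambda data: [x * x for x in data],
}


def remote_gpu_server(request_bytes: bytes) -> bytes:
    """Stand-in for the machine that owns the GPU: decode, run, reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    operation = REMOTE_OPERATIONS[request["op"]]
    result = operation(request["data"])
    return json.dumps({"result": result}).encode("utf-8")


class RemoteGpuClient:
    """Stand-in for the client library on a machine without a GPU."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        # `transport` carries bytes to the server and returns its reply.
        # A real deployment would use the network; this sketch passes a
        # local function instead.
        self._transport = transport

    def run(self, op: str, data: List[float]) -> List[float]:
        """Package the call, send it, and unpack the result."""
        request = json.dumps({"op": op, "data": data}).encode("utf-8")
        reply = self._transport(request)
        return json.loads(reply.decode("utf-8"))["result"]


client = RemoteGpuClient(transport=remote_gpu_server)
doubled = client.run("scale_by_two", [1.0, 2.0, 3.0])
```

The caller only sees `client.run(...)`; where the work actually executes is hidden behind the transport, which is the sense in which a GPU can be offered as a service.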