Consensus is building that network functions virtualization (NFV) will be a popular model for running services and applications in the largest networks in the world, including global service provider networks. But virtualization can carry a performance penalty. That may be why the industry is working through the best options for deploying NFV without sacrificing performance, and the best route to carrier-class service on commodity hardware.
As we discussed in our first article on this topic, NFV evolved out of the cloud world, where “webscale” providers such as Amazon and Google took commodity hardware, that is, commercial off-the-shelf (COTS) components, and built their own computing and networking platforms by developing custom software to control them. This approach is migrating to the service provider market, but there is a growing sense that COTS alone may not be sufficient: some network operators may want to boost the performance of their hardware for demanding applications such as mobile traffic.
Seeking NFV Performance & Programmability
The main performance challenge for NFV is that if you are deploying NFV infrastructure (NFVI) on COTS hardware, the virtualization layer is likely to add a performance penalty. Many service provider applications must pass traffic through multiple virtual machines (VMs), and even different servers, to perform all the functions a service requires (think service chaining, normal gateway processing, etc.). Think of an NFV cloud-type platform as a series of tollbooths: at each junction with a server VM or hypervisor, a toll is collected in packet-processing overhead, slowing down the flow of traffic.
Emerging technologies focus on tuning performance by adapting the virtualization layer to offload certain calculations and CPU functions from the servers, or by providing faster ways for the servers to access the NICs and bypass virtualization bottlenecks. The techniques fall into two general approaches, hardware-centric and software-centric, and the two can be combined.
One of the more fundamental methods is PCI pass-through, essentially dedicating a NIC port to a VM. PCI pass-through allows a physical NIC to be assigned to a specific virtual machine in an NFV service, creating a fast lane that bypasses the overhead of the hypervisor playing traffic cop for the different packets coming in (a minimal host-side sketch follows below). However, this has the limitation of allowing only as many VMs as there are NIC ports to benefit from the performance boost, creating additional port-management complexity on top of VM placement issues (i.e., which server and rack is the best home for a newly created VM).
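To make the mechanics concrete, here is a minimal sketch of how a Linux host might hand a NIC over to the vfio-pci driver, which QEMU/KVM then uses to pass the device through to a VM. The PCI address is a hypothetical placeholder (a real deployment would find it with lspci, and would typically let a tool such as libvirt handle this step); the sketch assumes the vfio-pci module is loaded and the IOMMU is enabled.

```c
/* Sketch: rebind a NIC from its kernel driver to vfio-pci so a VM
 * can claim it via PCI pass-through. The PCI address below is a
 * placeholder; discover the real one with lspci. */
#include <stdio.h>
#include <stdlib.h>

static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    fclose(f);
    return rc;
}

int main(void)
{
    const char *pci_addr = "0000:03:00.0";   /* placeholder NIC address */
    char path[256];

    /* Detach the NIC from whatever kernel driver currently owns it
     * (ignore failure: the device may already be unbound). */
    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/driver/unbind", pci_addr);
    write_sysfs(path, pci_addr);

    /* Steer this device to vfio-pci, then bind it. */
    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/driver_override", pci_addr);
    if (write_sysfs(path, "vfio-pci") != 0)
        return EXIT_FAILURE;
    if (write_sysfs("/sys/bus/pci/drivers/vfio-pci/bind", pci_addr) != 0)
        return EXIT_FAILURE;

    /* QEMU can now take the whole device, e.g.:
     *   qemu-system-x86_64 ... -device vfio-pci,host=03:00.0 */
    printf("%s bound to vfio-pci\n", pci_addr);
    return 0;
}
```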
A step up from this is to use single root I/O virtualization (SR-IOV), when the NIC supports it. This is an extension to the PCI Express (PCIe) specification for NICs and adapters. SR-IOV allows an adapter or NIC to present a physical port as multiple virtual ports, called virtual functions (VFs), each of which can be tied to a VM. This method uses the NIC hardware to offload the traffic-cop functions from the server CPU, saving cycles and improving latency and overall performance. Even with this method there are still limits: while the theoretical ceilings are much higher, in practice most NICs can effectively support about 8-16 VFs for 1GbE ports and around 40-64 for 10GbE ports before seeing performance degradation.
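As a rough illustration, on Linux the host typically carves out VFs by writing the desired count to the physical function's sriov_numvfs attribute in sysfs; each VF then shows up as its own PCI device that can be assigned to a VM. The interface name and VF count below are assumptions for the sketch.

```c
/* Sketch: ask an SR-IOV capable NIC to expose virtual functions (VFs).
 * Assumes a physical function named "enp3s0f0" (placeholder) whose
 * driver supports SR-IOV; each VF appears as a new PCI device. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *pf = "enp3s0f0";  /* placeholder physical function */
    int num_vfs = 8;              /* stay under the NIC's practical limit */
    char path[256];

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/device/sriov_numvfs", pf);

    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return EXIT_FAILURE; }

    /* Writing N creates N VFs; to resize later, write 0 first. */
    fprintf(f, "%d\n", num_vfs);
    fclose(f);

    printf("Requested %d VFs on %s; verify with 'lspci'\n", num_vfs, pf);
    return 0;
}
```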
Another hardware approach with much larger scale is the use of specialized NICs that offload the virtual switch functions entirely from the server CPU. These cards tend to carry NPUs or FPGAs that run as a high-speed switch with programmable capabilities. While more expensive than the regular NICs used in PCI pass-through or SR-IOV modes, they scale up better and provide much higher performance. In many I/O-centric deployments, filling a 40GbE pipe can require up to around 28 million packets per second, which taxes server CPUs and makes the case for specialized NICs that free up those expensive CPU cores and return them to application processing.
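The packet-rate arithmetic behind figures like that is straightforward: each Ethernet frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start delimiter, 12-byte inter-frame gap), so the rate is line speed divided by 8 × (frame size + 20). The sketch below runs the numbers; roughly 28M pps on a 40GbE link corresponds to frames of about 160 bytes, while worst-case 64-byte frames push the requirement toward 60M pps.

```c
/* Sketch: packets per second needed to saturate a link at a given
 * frame size, counting the 20-byte per-frame Ethernet wire overhead. */
#include <stdio.h>

static double pps(double link_bps, double frame_bytes)
{
    return link_bps / ((frame_bytes + 20.0) * 8.0);
}

int main(void)
{
    double link = 40e9;  /* 40GbE */
    int sizes[] = { 64, 160, 512, 1500 };

    for (int i = 0; i < 4; i++)
        printf("%4d-byte frames: %6.1f Mpps\n",
               sizes[i], pps(link, sizes[i]) / 1e6);
    return 0;
}
```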
And yet there are other approaches. Another tack is to use top-of-rack (ToR) switches to handle packet processing on behalf of the server CPU, but this raises issues of latency and round-trip time to the ToR switch, plus the complexity of partitioning the switch across the numerous (24, 48) servers tied to it.
What about software? One of the pitfalls of NFV is that virtual switches such as Open vSwitch (OVS) come with their own performance hits. Special software libraries can be used to tune the performance of communication between the network processors on a NIC and the server CPU. One of these approaches uses the Data Plane Development Kit (DPDK): a set of libraries and NIC drivers that speed up packet processing by bypassing the server OS kernel. Other methods use smarter ways to batch incoming packets across CPU cores, such as VPP (Vector Packet Processing) from the FD.io project (originally Cisco IP that was donated to open source).
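To give a flavor of why DPDK is fast, here is a stripped-down sketch in the style of DPDK's own "skeleton" sample application: a poll-mode receive/transmit loop that talks to the NIC from user space, in bursts, without ever entering the kernel. The port number and ring sizes are illustrative; a real application would also check link status, configure multiple queues, and pin worker lcores.

```c
/* Sketch: minimal DPDK poll-mode RX/TX loop on one port, one queue. */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_debug.h>

#define RING_SIZE 1024
#define NUM_MBUFS 8191
#define MBUF_CACHE 250
#define BURST_SIZE 32

int main(int argc, char *argv[])
{
    /* Take over hugepages and DPDK-bound NICs from the kernel. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Pool of packet buffers shared with the poll-mode driver. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("MBUF_POOL",
        NUM_MBUFS, MBUF_CACHE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool failed\n");

    uint16_t port = 0;  /* first DPDK-bound port */
    struct rte_eth_conf conf = {0};
    if (rte_eth_dev_configure(port, 1, 1, &conf) < 0 ||
        rte_eth_rx_queue_setup(port, 0, RING_SIZE,
            rte_eth_dev_socket_id(port), NULL, pool) < 0 ||
        rte_eth_tx_queue_setup(port, 0, RING_SIZE,
            rte_eth_dev_socket_id(port), NULL) < 0 ||
        rte_eth_dev_start(port) < 0)
        rte_exit(EXIT_FAILURE, "port init failed\n");

    /* Busy-poll the NIC: no interrupts, no kernel, no copies. */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        /* ... inspect or rewrite packets here ... */
        uint16_t sent = rte_eth_tx_burst(port, 0, bufs, n);
        while (sent < n)
            rte_pktmbuf_free(bufs[sent++]);  /* drop what didn't fit */
    }
    return 0;
}
```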
For service providers looking into this, the biggest consideration is the balance among competing priorities: price/performance and flexibility. Hardware-centric methods require understanding where certain workloads are placed (unless you ensure there are enough ports for PCI pass-through, or specialized NICs, everywhere in the data center) and managing specialized resources. However, they can provide better overall price/performance for high I/O needs.
Another consideration is security. In many virtualized systems, careful thought must be given to the security implications of the technique you choose. For example, in a PCI pass-through situation you are giving the virtual machine direct access to the NIC hardware, which is a no-no in some security camps.
One point to note on specialized clusters in NFV data centers: while some operators abhor having specialized groups of I/O-centric machines, many public clouds (AWS, Azure) rent out a variety of VM flavors, some with more CPU cores, some with more memory, and others with faster disks and faster I/O. It stands to reason that NFV deployments may have to take a similar approach, with orchestration simply becoming more advanced.
Regardless, it’s clear that a wide range of NFV performance improvement techniques is evolving, so service providers are going to have to weigh the pros and cons of each and decide which is best for their particular infrastructure and service profile, or figure out how to combine several of them for the best possible solution. Many of these approaches will find their way into operator installations, depending on the network topology and the needs of the applications.