This post explains how increasing the performance of the virtual switch in an NFV platform can deliver significant improvements in service provider opex. We'll walk through an example that shows a 6.7x increase in the number of subscribers supported per server, and explain how even greater improvements are possible in other use cases.
When the concepts for network functions virtualization (NFV) were first outlined a couple of years ago, service providers generally cited two main business objectives as their motivation for moving from proprietary, fixed-function networking equipment to virtualized applications.
The first goal was to accelerate the deployment of new, value-added services as a way to raise the average revenue per user (ARPU) and drive top-line revenue growth. Network services would be instantiated as virtualized software deployed as needed on standard server platforms, rather than as fixed-function equipment. This would allow service providers to quickly roll out new services to a target group of subscribers on a trial basis. If the trial was successful, the deployment could rapidly be scaled up and the service introduced to a wider customer base. If the results of the trial were disappointing, the new service could be discontinued with no need to worry about what to do with custom, purpose-built equipment that had been purchased and deployed specifically to support that one application.
The second goal, and the one that has received most attention so far, was to reduce operating expenses (opex) through improved resource utilization as well as through increased automation in the management of the network. Much of the detailed work of the ETSI Industry Specification Group (ISG) has been focused on these two aspects of opex reduction, and they're being analyzed in many of the proof-of-concepts (PoCs) that are currently underway. But there's one specific element of the NFV architecture that has a major effect on opex yet seems to have received less industry attention: the virtual switch, or vSwitch.
As part of the NFV infrastructure platform (“NFVI” in ETSI terminology), the vSwitch is responsible for switching network traffic between the core network and the virtualized applications or virtual network functions (VNFs) that are running in virtual machines (VMs). The VMs execute under the control of a hypervisor, such as KVM, and the VNF management is typically performed by OpenStack (which needs to be hardened to provide carrier grade reliability, as we discussed in another post).
The vSwitch runs on the same server platform as the VNFs. Obviously, processor cores that are required for running the vSwitch are not available for running VNFs and this can have a significant effect on the number of subscribers that can be supported on a single server blade. This in turn impacts the overall operational cost-per-subscriber and has a major influence on the opex improvements that can be achieved by a move to NFV.
Let’s look at an example to illustrate this concept.
To keep the analysis simple, assume we'll need to instantiate a function such as a media gateway as a VNF and that it requires a bandwidth of 2 million packets per second from the vSwitch. For a further level of simplification, assume we're going to instantiate a single VM, running this VNF, on each processor core. So we need to calculate how many VMs we can actually instantiate on the server blade, given that some of the available cores will be required for the vSwitch function.
As the reference platform for this analysis, we'll use a dual-socket Intel® Xeon® Processor E5-2600 v2 series platform ("Ivy Bridge") running at 2.9 GHz, with a total of 24 cores available across the two sockets.
All the performance measurements will be based on bidirectional network traffic running from the network interface controller (NIC) to the vSwitch, through a virtual machine (VM) and back through the vSwitch to the NIC. This represents a real-world NFV configuration, rather than a simplified configuration in which traffic runs only from the NIC to the vSwitch and back to the NIC, bypassing the VM so that no useful work is performed.
In the first scenario, the Open vSwitch (OVS) software, originally developed for IT applications, switches the traffic to the VMs on the platform. Measurements show that each core running OVS can switch approximately 0.3 million packets per second of traffic to a VM (64-byte packets). The optimum configuration for our 24-core platform is to use 20 cores for the vSwitch, delivering a total of 6 million packets per second. This traffic is consumed by three cores running VMs (at 2 million packets per second each), leaving one core unused. We can't run VMs on more than three cores because OVS can't deliver the additional bandwidth they would require. So the resource utilization is three VMs per blade.
What if OVS were replaced with an accelerated vSwitch capable of higher performance, say 12 million packets per second per core, again assuming 64-byte packets? Now the 24-core platform can be configured with four cores running the vSwitch. These can deliver up to 48 million packets per second, more than enough to meet the 40 million packets per second required by 20 VMs running on the remaining 20 cores. The resource utilization is now 20 VMs per blade, thanks to the use of vSwitch software optimized for NFV infrastructure.
From a business perspective, increasing the number of VMs per blade by a factor of 6.7 (20 divided by 3) allows us to serve the same number of customers using only 15 percent as many blades as when OVS was used, or to serve 6.7 times as many customers using the same server rack. In either case, this represents a very significant reduction in opex and it can be achieved with no changes required to the VNFs themselves.
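The arithmetic behind this comparison can be checked in a few lines. Here's a minimal Python sketch using the figures quoted above (the variable names are ours):

```python
# Per-VNF bandwidth requirement from the vSwitch, in Mpps
vm_mpps = 2.0

# Scenario 1: standard OVS at 0.3 Mpps per core, with 20 of 24 cores
# dedicated to the vSwitch
ovs_mpps = 20 * 0.3                  # 6 Mpps total switching capacity
ovs_vms = int(ovs_mpps // vm_mpps)   # 3 VMs (one core left unused)

# Scenario 2: accelerated vSwitch at 12 Mpps per core, using 4 cores
acc_vms = 24 - 4                     # 20 VMs on the remaining cores
assert 4 * 12 >= acc_vms * vm_mpps   # 48 Mpps covers the 40 Mpps needed

print(acc_vms / ovs_vms)   # ≈ 6.7x more VMs per blade
print(ovs_vms / acc_vms)   # 0.15, i.e. 15 percent as many blades
```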
Conversations with service providers confirm that accelerated vSwitch performance can make a significant contribution to their opex savings in applications where the VNFs handle substantial traffic.
Of course, to deliver these opex benefits the vSwitch not only has to provide very high switching performance; it also needs to migrate VMs rapidly under failure conditions, with minimal packet impact, in order to achieve the 99.9999 percent availability that NFVI platforms require. The accelerated vSwitch in the Carrier Grade Communications Server provides all these capabilities.
The scenario above (2 million packets per second per VM, one VM per core, dual-socket platform) probably isn't representative of your specific application's needs. But it's straightforward to recalculate the savings for any given set of requirements: you just need to find the optimum balance of vSwitch cores and VM cores for the bandwidth needed, minimizing the number of unused cores.
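That search is easy to automate. Below is a minimal Python sketch (the function name and the single-VM-per-core assumption are ours) that brute-forces the core split maximizing VMs per blade:

```python
def max_vms(total_cores, sw_mpps_per_core, vm_mpps):
    """Return (vswitch_cores, vm_cores) giving the most VMs per blade,
    assuming one VM per core and vm_mpps of vSwitch traffic per VM."""
    best = (0, 0)
    for sw in range(1, total_cores):
        # VMs supportable by this many vSwitch cores...
        by_bandwidth = int(sw * sw_mpps_per_core // vm_mpps)
        # ...capped by the cores left over to run VMs on
        vms = min(by_bandwidth, total_cores - sw)
        if vms > best[1]:
            best = (sw, vms)
    return best

print(max_vms(24, 0.3, 2.0))   # (20, 3): the standard OVS scenario
print(max_vms(24, 12.0, 2.0))  # (4, 20): the accelerated scenario
```

The same function reproduces both scenarios from the example, and plugging in your own per-core switching rate and per-VNF bandwidth gives the equivalent figure for your deployment.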
Finally, it’s worth noting that this discussion focused only on traffic between the network and VMs. In service chaining applications, however, the bandwidth of east-west traffic between VMs is equally important to system-level performance. A similar analysis can easily be performed to show how increasing vSwitch performance for VM-to-VM traffic brings the same level of improvement in core utilization and customers-per-blade.
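One way to sketch that east-west analysis is to add the VM-to-VM bandwidth to each VM's demand on the vSwitch. This is an illustrative model of our own, assuming north-south and east-west traffic consume vSwitch capacity equally:

```python
def max_vms_with_chaining(total_cores, sw_mpps_per_core,
                          ns_mpps, ew_mpps):
    """Best (vswitch_cores, vm_cores) split when each VM needs both
    north-south (ns) and east-west (ew) bandwidth from the vSwitch."""
    per_vm = ns_mpps + ew_mpps  # total vSwitch load per VM
    best = (0, 0)
    for sw in range(1, total_cores):
        vms = min(int(sw * sw_mpps_per_core // per_vm),
                  total_cores - sw)
        if vms > best[1]:
            best = (sw, vms)
    return best

# e.g. 2 Mpps north-south plus 2 Mpps of service-chain traffic per VNF
print(max_vms_with_chaining(24, 12.0, 2.0, 2.0))  # (6, 18)
```

As expected, adding east-west load shifts the optimum toward more vSwitch cores, which is why per-core vSwitch performance matters even more in service chaining deployments.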