SDxCentral talked to Charlie Ashton, director of business development at Wind River, about how carrier-grade NFV is becoming a reality for the service provider market. Ashton leads Wind River initiatives in networking and telecommunications markets and has held leadership roles in both engineering and marketing at software, semiconductor, and systems companies.
SDxCentral: With the rise of NFV in the service provider market, Wind River has made significant investments in the technology and the NFV ecosystem. Can you tell us about Wind River’s NFV initiatives?
Ashton: Our initiatives are based around our NFV infrastructure (NFVI) software platform called Titanium Server. This platform runs virtual network functions (VNFs) with the same level of reliability that telecom networks have typically delivered using traditional physical equipment. Service providers can now maintain the expected level of uptime for the customer services they deliver as they deploy NFV, despite the additional risk factors introduced by the complexity of NFV architectures.
To ensure full compatibility and interoperability with other hardware and software elements, we launched the Titanium Cloud ecosystem program. We work closely with our partners in this program to ensure their solutions are validated to work correctly and optimally with Titanium Server. Our current partners include Artesyn, ASTRI, Brocade, ConteXtream, Genband, HP, Intel Security, Ixia, Kontron, Metaswitch, Nakina, and Overture.
What do you think is most lacking in NFV infrastructure today?
Ashton: The most critical feature missing from NFVI platforms based on enterprise-class software designed for IT applications is the carrier-grade reliability necessary to ensure service uptime. Enterprise-class platforms only achieve three-nines (99.9 percent) uptime at best, while telecom infrastructure requires six-nines (99.9999 percent). Wind River fills that gap with Titanium Server, which is guaranteed to deliver six-nines uptime.
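To put those figures in context, a quick back-of-the-envelope calculation (a minimal sketch, not tied to any particular platform) shows how much downtime per year each availability level actually permits:

```python
# Annual downtime allowed by an availability level:
# downtime = (1 - availability) * minutes_per_year
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for label, availability in [
    ("three nines", 0.999),
    ("five nines", 0.99999),
    ("six nines", 0.999999),
]:
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label:>12}: {downtime_min:8.2f} min/year ({downtime_min * 60:,.0f} seconds)")
```

Three nines allows roughly 8.8 hours of downtime a year; six nines allows about 32 seconds.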
Is five-nines or even six-nines really feasible with software? Can we expect software-based systems to achieve the same carrier-grade performance and availability as hardware systems?
Ashton: Absolutely. We’ve demonstrated six-nines and had it confirmed by an independent report called “Titanium Server: Reliability, Availability, Maintainability (RAM) Modeling Analysis.”
What are the key elements that have to be present in any carrier-grade NFVI?
Ashton: Any carrier-grade NFV infrastructure must provide four things at a carrier-grade level: network availability, network security, performance, and network management.
To guarantee network availability for virtualized applications, you need an optimized hypervisor that minimizes the duration of outages during the live migration of virtual machines (VMs). The management software must be able to detect failed controllers, hosts, or VMs very quickly and implement hot data synchronization so no calls are dropped or data lost when failovers occur. The system must automatically act to recover failed components and restore sparing capability if that has been degraded.
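The fast-detection requirement can be illustrated with a minimal heartbeat-based failure detector in Python. This is only a sketch; the interval and miss threshold are hypothetical values chosen for the example, not Titanium Server parameters:

```python
import time

# Each monitored component refreshes its timestamp; the supervisor declares
# a component failed after MISS_LIMIT missed heartbeats and starts recovery.
HEARTBEAT_INTERVAL = 0.05  # expected seconds between heartbeats (hypothetical)
MISS_LIMIT = 3             # consecutive misses tolerated before failover

last_seen: dict[str, float] = {}  # component name -> time of last heartbeat

def record_heartbeat(component: str) -> None:
    last_seen[component] = time.monotonic()

def failed_components(now: float) -> list[str]:
    deadline = HEARTBEAT_INTERVAL * MISS_LIMIT
    return [c for c, t in last_seen.items() if now - t > deadline]

def supervise() -> None:
    for component in failed_components(time.monotonic()):
        # A real platform would fail over to a hot-synchronized standby
        # (so no calls or data are lost) and then restore sparing capacity.
        print(f"{component}: heartbeat lost, initiating failover")
        del last_seen[component]
```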
Network security requirements present major challenges for telecom infrastructure. Carrier-grade security must be designed in from the start as a set of coordinated, fully embedded features. These features include full protection for the program store and hypervisor; AAA (authentication, authorization and accounting) security for the configuration and control point; rate limiting, overload and denial-of-service (DoS) protection to secure critical network and inter-VM connectivity; encryption and localization of tenant data; secure, isolated VM networks; secure password management; and the prevention of OpenStack component spoofing.
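Of the mechanisms Ashton lists, rate limiting is the easiest to illustrate. Below is a minimal token-bucket limiter, one common building block behind rate-limiting and overload protection; the rate and burst figures are arbitrary examples, not platform defaults:

```python
import time

class TokenBucket:
    """Classic token bucket: tokens accrue at a fixed rate up to a cap."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens added per second
        self.capacity = burst     # maximum bucket depth
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # drop or queue the request rather than overload

# Example: cap a control-plane endpoint at 100 requests/s with bursts of 20.
limiter = TokenBucket(rate=100.0, burst=20.0)
```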
In terms of software systems’ performance, what latency and fault-detection times are realistically achievable?
Ashton: Any carrier-grade network has stringent performance requirements in terms of both throughput and latency. The host vSwitch must deliver high bandwidth to the guest VMs over secure tunnels. At the same time, the processor resources used by the vSwitch must be minimized because service providers derive revenue from resources used to run services and applications, not those consumed by switching.
In terms of latency constraints, the platform must ensure a deterministic interrupt latency of 10 microseconds or less in order for virtualization to be feasible for the most demanding CPE and access functions. Finally, live migration of VMs must occur with an outage time of less than 150 ms.
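Qualifying numbers like these is normally done with real-time tools such as cyclictest on a tuned host, but the shape of the measurement can be sketched in a few lines: request a short sleep and record how far past the deadline the wakeup actually lands. This is illustrative only; a general-purpose OS will show far worse jitter than a carrier-grade platform:

```python
import time

REQUESTED_NS = 100_000  # ask for a 100 µs sleep

overshoots = []
for _ in range(10_000):
    start = time.monotonic_ns()
    time.sleep(REQUESTED_NS / 1e9)
    overshoots.append(time.monotonic_ns() - start - REQUESTED_NS)

overshoots.sort()
print(f"median overshoot: {overshoots[len(overshoots) // 2] / 1_000:.1f} µs")
print(f"worst overshoot:  {overshoots[-1] / 1_000:.1f} µs")
```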
What are the key capabilities needed for network management?
Ashton: To eliminate the need for planned maintenance downtime windows, the system must support hitless software upgrades and hitless patches, and the backup and recovery system must be fully integrated with the platform software. The system must also support northbound APIs that interface the infrastructure platform to the OSS/BSS and NFV orchestration software, including SNMP, NETCONF, XML, REST APIs, OpenStack plug-ins, and ACPI.
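As a sketch of what a northbound REST interaction might look like, here is a purely hypothetical example; the host, path, and payload fields are invented for illustration and are not Titanium Server’s actual API. The point is the pattern: OSS/BSS or orchestration software drives platform operations, such as applying a hitless patch, over a documented REST interface:

```python
import requests

PATCH_ENDPOINT = "https://nfvi.example.net/api/v1/patches"  # hypothetical URL

resp = requests.post(
    PATCH_ENDPOINT,
    json={"patch_id": "example-patch-001", "strategy": "hitless"},  # hypothetical fields
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```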
How has NFV server performance evolved? What is the state-of-the-art that Wind River has seen on commercial off-the-shelf servers (COTS)?
Ashton: Let’s look at a specific use case to illustrate the performance of the Accelerated vSwitch (AVS) in Titanium Server, running on COTS hardware.
We’ll assume we need to instantiate a function such as a media gateway as a VNF and that it requires a bandwidth of 2 million packets per second (2 Mpps) from the vSwitch. For simplicity we’ll instantiate a single VM, running this VNF on each processor core. As the reference platform for our analysis, we’ll use a dual-socket Intel Xeon Processor E5-2600 series platform (“Ivy Bridge”) running at 2.9 GHz, with a total of 24 cores available across the two sockets.
All our performance measurements will be based on bidirectional network traffic running from the network interface card (NIC) to the vSwitch, through a VM and back through the vSwitch to the NIC. This represents a real-world NFV configuration, rather than a simplified configuration in which traffic runs only from the NIC to the vSwitch and back to the NIC, bypassing the VM so that no useful work is performed.
In the first scenario, we use Open vSwitch (OVS) to switch the traffic to the VMs on the platform. Measurements show that each core running OVS can switch approximately 0.3 Mpps of traffic to a VM (64-byte packets). The optimum configuration for our 24-core platform will be to use 20 cores for the vSwitch, delivering a total of 6 Mpps of traffic. This traffic will be consumed by three cores running VMs, and one core will be unused because OVS can’t deliver the bandwidth required to run VMs on more than three cores.
When we replace OVS with Titanium Server’s AVS, we can now switch 12 Mpps per core, again assuming 64-byte packets. So our 24-core platform can be configured with four cores running the vSwitch, which together can switch up to 48 Mpps, more than covering the 40 Mpps required by 20 VMs running on the remaining 20 cores. Our resource utilization is now 20 VMs per blade.
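As a sanity check on that arithmetic, the short script below (a sketch assuming the figures above: 64-byte packets and one 2 Mpps VNF VM per core on a 24-core blade) reproduces both densities and the consolidation factor discussed next:

```python
TOTAL_CORES = 24
VNF_DEMAND_MPPS = 2.0  # each VM (one per core) needs 2 Mpps from the vSwitch

def vms_per_blade(switch_mpps_per_core: float) -> int:
    # Try every split of cores between vSwitch and VMs, and keep the split
    # that maximizes the number of VMs whose 2 Mpps demand can be supplied.
    best = 0
    for vswitch_cores in range(1, TOTAL_CORES):
        supply_mpps = vswitch_cores * switch_mpps_per_core
        vm_cores = TOTAL_CORES - vswitch_cores
        best = max(best, min(vm_cores, int(supply_mpps // VNF_DEMAND_MPPS)))
    return best

ovs = vms_per_blade(0.3)   # OVS: 0.3 Mpps per core -> 3 VMs per blade
avs = vms_per_blade(12.0)  # AVS: 12 Mpps per core  -> 20 VMs per blade
print(f"OVS: {ovs} VMs, AVS: {avs} VMs, consolidation factor {avs / ovs:.1f}x")
```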
What does that mean from a business perspective?
Ashton: Increasing the number of VMs per blade by a factor of 6.7 (20 divided by 3) allows us to serve the same number of subscribers using only 15 percent as many blades as when OVS was used. Another way to look at it is we can now serve 6.7 times as many subscribers using the same server rack. In either case, this represents a very significant reduction in opex, and it can be achieved without changing the VNFs themselves.
Will we still achieve the cost savings with COTS if the software has to evolve to be sophisticated enough to achieve all these capabilities you are describing in an NFV server? Are we just trading one cost for another?
Ashton: The most critical software elements in any NFVI platform are 1) the operating system, 2) the hypervisor, 3) the virtual switch, 4) the platform management, and 5) the telecom middleware. The overhead for running the carrier-grade management and telco middleware functions is minimal, especially on a high-performance server with a large number of cores, which is the typical hardware platform for NFV deployments.
Wind River is a subsidiary of Intel. What kind of synergies are there between the two companies?
Ashton: The synergies are very strong. Because we are a subsidiary, our engineers are able to work closely with Intel’s to not only understand how we can best leverage the micro-architecture and features of the latest server-class platforms, but also to provide their experts with feedback on the requirements of carrier-grade software. It’s a very positive situation for our mutual customers. They benefit from best-in-class telco software running on industry-leading server platforms.
We also have a strong collaboration in terms of code that we submit to the various open-source communities, which is a key part of our strategy to ensure that key functions required for NFV are implemented in open source as soon as it makes sense.
What is still missing in the NFV ecosystem? How do you see these holes getting plugged in the next six to 12 months?
Ashton: Interoperability is a challenge. Phase 1 of the ETSI NFV Industry Specification Group (ISG) work defined the software architecture for NFV, subject to some outstanding questions around management and orchestration (MANO). In phase 2, we expect consensus on the architecture, including MANO, as well as the right set of open standards so that vendors can develop products with the confidence that they will achieve interoperability.
What other developments can we expect from Wind River on the NFV front?
Ashton: Over the next few months, you’ll see some significant expansion of the Titanium Cloud partner ecosystem as we continue to validate that solutions from industry-leading NFV companies run correctly and optimally on Titanium Server.
On the product side, we’ll be adding additional features and capabilities to Titanium Server this year. We’re looking forward to seeing the first fruits of our collaboration with HP that was announced in November. HP will incorporate carrier-grade technology from Titanium Server into its HP Helion OpenStack solutions for NFV.