As an operator, you know you can reduce both capex and opex with network functions virtualization (NFV). You therefore want to implement NFV quickly so that you do not miss the bandwagon or delay the big opportunities NFV promises. But have you taken time to consider all the pros and cons of the commercial solutions being offered to you?
ETSI has done a great job, so far, in laying the building blocks for NFV. But it has left it to other standards bodies to define the low-level details and protocols of those building blocks. Although there are a number of organizations working to define those standards, the fact is that NFV is still a work in progress.
No matter what vendors may claim, NFV is still in its infancy. Yet you need a carrier-grade telco system with carrier-grade performance, scalability, and reliability (99.999 percent availability), as well as carrier-grade manageability and security.
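To make the 99.999 percent figure concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name is my own, and the calculation assumes a 365-day year and treats all unavailability as counting against the budget.

```python
# Convert an availability target into a yearly downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} minutes/year")
```

At "five nines" the budget comes to a little over five minutes per year, which is the yardstick to hold any NFV platform against.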
This article describes some of the carrier-grade features that are still missing in the commercial NFV solutions. It will equip you, as an operator, with some questions that you can ask your vendors and, I hope, help you evaluate and benchmark the commercial NFV solutions. At the very least, it should enlighten you about the risks associated with being an early adopter of NFV.
Let’s consider the gaps in the current NFV systems and explore what we mean by the “non-ideal” NFV system.
OpenStack Is Not Fully Carrier-Grade
OpenStack has emerged as a leading virtual infrastructure manager (VIM) – therefore I am focusing on OpenStack here.
For reference, see below the position of OpenStack in the NFV blocks defined by ETSI. In fact, the whole NFV infrastructure and its management layer, as encircled below, are under active standardization focus. (For example, the OPNFV body is actively working on standards in this area.)
Unfortunately, OpenStack is not inherently carrier-grade. It was designed primarily for clouds, not for carriers. For example, a standard OpenStack implementation takes more than a minute to detect a failed virtual machine (VM). A delay of that size makes standard OpenStack unsuitable for telco applications.
The OpenStack Foundation has embarked on efforts to make OpenStack more robust, reliable, and manageable. The recent release of the Kilo version is a step toward making OpenStack more carrier-grade. However, there is still a lot of development needed, which the Foundation is working on.
Nevertheless, vendors, in order to roll out commercial products, have started doing their own enhancements to OpenStack to make it more telco-suitable. So until you have the standard carrier-grade version of OpenStack, look carefully at what the vendor is providing.
Some questions that might help you benchmark the carrier-grade features of a vendor’s OpenStack NFV:
- Is your system without any single point of failure?
- What is switchover time if a compute node / virtual machine / virtual link fails?
- If the switchover time is more than 50 milliseconds, how does it affect latency-sensitive services like voice and video?
- How many VMs can you run at one time (the bigger the better – aim for thousands)?
- How are you hardening the security of OpenStack?
- What enhancements have you made to OpenStack, and what are you doing to standardize them in standards bodies?
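When you compare vendors' answers on switchover time, it can help to see how quickly failures eat into a five-nines budget. The sketch below is a rough illustration only. It assumes every failure makes the service unavailable for the full detection-plus-switchover time, and it uses 60 seconds as a stand-in for the "more than a minute" detection time mentioned earlier. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years


def failures_within_budget(availability_percent: float, outage_seconds: float) -> int:
    """Number of outages of the given length that fit in one year's downtime budget."""
    budget_seconds = SECONDS_PER_YEAR * (1.0 - availability_percent / 100.0)
    return int(budget_seconds // outage_seconds)


# 60 s: assumed stand-in for minute-scale failure detection.
# 0.05 s: the 50-millisecond switchover target from the questions above.
for outage in (60.0, 0.05):
    count = failures_within_budget(99.999, outage)
    print(f"{outage} s per failure -> {count} failures/year within budget")
```

Under these assumptions, a platform with minute-scale detection can afford only a handful of failures per year, while one that recovers in 50 milliseconds can absorb thousands.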
There Is No Standard MANO Model
Management and orchestration (MANO) is your most precious investment in the whole NFV process. Virtual network functions (VNFs) may come and go, but you may be stuck with your MANO if you choose the wrong management layer.
As mentioned earlier, ETSI has defined the high-level framework for NFV, but the actual protocols needed for the blocks to interwork (see the following diagram) have been left to other standards bodies.
In the absence of clear MANO standards, you will see a lot of proprietary implementations and interpretations of MANO – with “open APIs,” as they call them.
There are a few things you can do to mitigate your risks here:
- Make sure that the management layer is as modular as possible and follows the MANO blocks, instead of an all-in-one approach where the vendor combines the blocks (for example, orchestration, VIM, and VNFM as one block). Vendors that take the all-in-one approach present a tightly integrated MANO, which will be difficult, if not impossible, to integrate with third-party MANO blocks.
- Take a note of the reference points in the MANO block in the above diagram. Make sure that the vendor has developed and demonstrated open reference points for its MANO to interwork with other third-party MANO modules and third-party VNFs/NFVIs.
- Ensure that the vendor has developed and demonstrated an ecosystem of its MANO working with third-party NFV components.
NFV Is Weak in Fault/Performance Management
Welcome to the world of NFV. You are dealing with COTS servers, which are prone to faults. This may impair the VNFs running on them. There may also be software faults related to OpenStack, hypervisor, or VM.
The essence of NFV is being open and running cross-vendor systems under one umbrella. With such heterogeneous systems, fault management and performance management across multiple layers and multiple vendors become very important.
This is again a pre-standard field. ETSI MANO has described some high-level requirements for fault management, but that’s it. OPNFV is working on standardizing this area for the NFV infrastructure, but it’s at a very early stage.
Take the example of service chaining in which an operator uses a firewall from one vendor, Web server from another, and video server from a third. A service provider that is facing latency or jitter would need to question which vendor’s component is contributing to this performance degradation.
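As a rough illustration of what answering that question requires, the sketch below attributes latency to each vendor in a service chain. It assumes you can obtain comparable ingress and egress timestamps from every VNF with synchronized clocks, which is exactly what is hard to get without standards. All names and numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HopSample:
    vendor: str        # hypothetical vendor label
    function: str      # e.g. "firewall"
    ingress_ms: float  # time the packet entered the VNF
    egress_ms: float   # time the packet left the VNF


def latency_share_by_vendor(samples):
    """Return {vendor: fraction of the total time spent inside VNFs}."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample.vendor] += sample.egress_ms - sample.ingress_ms
    overall = sum(totals.values())
    if overall == 0:
        return {vendor: 0.0 for vendor in totals}
    return {vendor: total / overall for vendor, total in totals.items()}


# Made-up measurements for one packet crossing the chain.
samples = [
    HopSample("vendor-a", "firewall", 0.0, 2.0),
    HopSample("vendor-b", "web server", 2.5, 4.5),
    HopSample("vendor-c", "video server", 5.0, 21.0),
]
shares = latency_share_by_vendor(samples)
worst = max(shares, key=shares.get)
print(f"Largest contributor: {worst} ({shares[worst]:.0%} of in-VNF latency)")
```

The arithmetic is trivial. The hard part is getting each vendor to expose measurements in a form that can be compared at all.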
In the absence of clear standards on how faults are identified, communicated, and reported, how do we make sure that faults are isolated clearly without any ambiguity and reported so that MANO can take an integrated action? Today there are no standards for fault correlation.
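To show what fault correlation could look like in principle, here is a minimal sketch. It is not taken from any standard or product. It assumes every layer reports alarms in one common format and that MANO knows which resource runs on which. The resource names, the time window, and the "deepest alarmed layer wins" rule are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical dependency map: each resource points at what it runs on.
RUNS_ON = {
    "vnf-firewall": "vm-1",
    "vm-1": "host-a",
    "vnf-video": "vm-2",
    "vm-2": "host-a",
}


@dataclass(frozen=True)
class Alarm:
    resource: str
    timestamp_s: float
    message: str


def underlying_resources(resource):
    """Return the resource followed by everything beneath it, topmost first."""
    chain = [resource]
    while chain[-1] in RUNS_ON:
        chain.append(RUNS_ON[chain[-1]])
    return chain


def correlate(alarms, window_s=5.0):
    """Group alarms under the deepest alarmed resource they depend on."""
    groups = {}
    for alarm in alarms:
        root = alarm.resource
        for candidate in underlying_resources(alarm.resource):
            alarmed_nearby = any(
                other.resource == candidate
                and abs(other.timestamp_s - alarm.timestamp_s) <= window_s
                for other in alarms
            )
            if alarmed_nearby:
                root = candidate  # later candidates sit deeper in the stack
        groups.setdefault(root, []).append(alarm)
    return groups


alarms = [
    Alarm("host-a", 100.0, "NIC link down"),
    Alarm("vnf-firewall", 101.0, "packet loss"),
    Alarm("vnf-video", 102.0, "jitter above threshold"),
]
for root, related in correlate(alarms).items():
    print(root, "<-", [a.message for a in related])
```

In this made-up case all three alarms collapse onto the host, so MANO could take one integrated action instead of three. Without agreed alarm formats and dependency models across vendors, even this simple grouping is not possible.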
I don’t think we yet have clear answers for issues like these. They remain gray areas, and standardization will take some time.
Today’s non-ideal systems force some compromises, and those compromises carry the risks identified above. You need to make sure that you and your vendor are covering these risks with a clear plan for the future, once standards are ready.
After all, you would not like to be surprised later and locked in to a vendor with your non-ideal NFV once the standards are ready.
It’s your turn to tell me if I missed anything here. What is your view as a vendor or user? If you are a vendor, please tell me if you are doing anything to mitigate any of the risks mentioned.
If you are an operator, share your agreement or otherwise, or anything else you would like to add here.