Thanks to all who joined us for the ACG Research Report Webinar, Quantifying the Economic Advantages of Open Architecture NFV Designs, where one of the industry’s leading analysts shared results from one of the first studies to quantify the benefits and impact of one of the largest Telco infrastructure modernization projects in the world. After the webinar, we took questions from the audience but unfortunately ran out of time before we could answer everybody’s questions. You can read the full Q&A below.
Do smaller or larger deployments drastically change the economics and results (i.e., were the conditions optimized for a particular scale)?
When using a design framework like the one employed by this SP, the results do not change drastically, because the unit costs (from the capex perspective) and the workflow elements (from the opex point of view) don’t shift proportionally with scale. In this SP’s case, the NFV deployment has been made using a highly modular and ‘disaggregated’ infrastructure design, meaning that servers, storage, networking and security elements are individual components of the design, each optimized for its own functional requirements (which is the essential premise of the NFV reference architecture the industry has developed – cf. the ETSI NFV Reference Architecture as the prime example). What we did NOT do in this study was look at the impact of ‘hyper-converged’ infrastructures on the economics of the deployment. A hyper-converged infrastructure is one that optimizes the way compute, storage and networking functionality are provided within a more streamlined rack architecture. It is likely that hyper-converged architectures, potentially targeted at installations of smaller overall scope than the 12-rack design employed in this SP’s central operating sites, might yield even greater improvements in the economics of the deployments. But that is not an analysis we performed in this case.
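As a purely illustrative sketch of why per-unit economics flatten out with scale (the dollar figures and function below are hypothetical, not taken from the study), consider a simple model in which each site carries a fixed overhead plus a roughly linear per-rack cost:

```python
def per_rack_cost(n_racks: int,
                  fixed_site_cost: float = 200_000.0,
                  cost_per_rack: float = 150_000.0) -> float:
    """Illustrative cost model: total site cost divided by rack count.

    With mostly linear unit costs, the per-rack figure flattens quickly
    as the fixed site overhead is amortized across more racks - which is
    why deployment size does not drastically change the unit economics.
    """
    return (fixed_site_cost + n_racks * cost_per_rack) / n_racks


# A 4-rack site vs. the 12-rack design: per-rack cost differs only by
# the amortized fixed overhead, not proportionally to deployment size.
small = per_rack_cost(4)
large = per_rack_cost(12)
```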
Tell us more about the DNA and cultural transformation this entailed for the SP.
One piece of transformation was in the composition of the operations teams involved in deploying the solution. The NFV configurations are being deployed into active, large scale service delivery networks that have, historically, employed highly segmented team structures in which individual groups of specialists have been responsible for the operation of the components they are focused on. For example, IP routing engineers focused on that part of the deployment, while specialty network function nodes such as mobile packet gateways and firewalls, were supported by engineers skilled in their operation. Responsibilities were assigned in a highly compartmentalized way, with strict and formal handoffs between the teams with respect to the status and health of the deployment.
In the virtualized service delivery environment, the degrees of interdependence between the people skilled in different parts of the solution increase, and the separation that was workable when one ‘box’ simply talked to another becomes untenable: the software infrastructure underpinning *all* of the VNFs (routers, firewalls, packet gateways, and others) is shared (the hypervisor and the underlying virtual networking solution), and the control functions needed to manage the different infrastructure elements become more intertwined. The ‘closeness’ of the handshakes between the functions increases substantially, and teammates have to work with one another that much more closely than they have been accustomed to in order to ensure success in the deployments. The operations teams have become much more multi-disciplinary, versatile, and converged.
Is this an easy process? NO. It takes intent, a clear understanding of the components involved and the endgames, and a *willingness* on the part of the participants to bring a new approach into play in place of the one that has been the basis of their jobs before. In the case of the SP in focus in our study, this willingness began at the top and was also embraced by participants who volunteered to be part of the transition. There is an inherent degree of trail-blazing involved that participants have to ‘opt in on’, along with sound, prudent judgment as to what the determinants of success will be as the evolutions progress. That said, as participants from multi-disciplinary backgrounds coalesced into a newly structured operational framework – able to look at server health as an inherent part of network operations, in parallel with the software running the firewalls and the gateways as VNF workloads within the NFVI – the new perspective began to take hold and a new model for scaling and deployment took form.
Another area in which a new point of view emerged is automated service creation, testing and deployment. The NFV platform the operator has deployed sits at the ‘runtime end’ of a cloud-based federation of development and test environments at the operator. To make this model work for the multiple business-unit ‘tenants’ of the ‘PaaS-based’ development environment, a series of adjustments to the working models and workflows employed in the Dev → Ops activity flow had to be implemented. After sorting out the intermediate testing elements of those adjustments (which required bringing together a combination of system-testing disciplines similar to those combined in the operations teams, as previously described), the biggest technology and procedural changes came with adopting the model-based, template-driven service deployment and activation procedures involved in applying Ansible playbooks and OpenStack Heat templates to the provisioning cycles for the NFV solution elements. These not only require the multi-disciplinary participation described previously, but also familiarization with, and adoption of, a new set of workflows and validation procedures that replace those of the more segmented and siloed activation models in the operator’s prior mode of operation. All this being the case, this was (and is) an additional area of cultural adaptation inherent in getting the more automated, software-driven processes of NFV deployments to take hold.
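As a hedged illustration of the model-based, template-driven style described above (the parameter and resource names here are generic Heat constructs, not the operator’s actual templates), a minimal OpenStack Heat template for bringing up a single VNF instance might look like:

```yaml
heat_template_version: 2016-04-08

description: Minimal sketch of provisioning one VNF instance via Heat

parameters:
  vnf_image:
    type: string
    description: Glance image holding the VNF software
  vnf_flavor:
    type: string
    default: m1.large
  mgmt_net:
    type: string
    description: Management network the VNF attaches to

resources:
  # Neutron port on the management network for the VNF
  vnf_port:
    type: OS::Neutron::Port
    properties:
      network: { get_param: mgmt_net }

  # The VNF virtual machine itself
  vnf_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: vnf_image }
      flavor: { get_param: vnf_flavor }
      networks:
        - port: { get_resource: vnf_port }

outputs:
  vnf_mgmt_ip:
    description: Management address of the activated VNF
    value: { get_attr: [vnf_port, fixed_ips, 0, ip_address] }
```

The same template can then be stamped out per tenant or per site with different parameter values – which is the repeatability that makes the Dev → Ops flow automatable in the first place.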
What’s your view on the current pace of development and deployment of NFV-based platforms by telecom operators?
It’s generally cautious, but determined, and progressing broadly and at an accelerating pace. Deployments tend to be started either in defined use-case categories that can be tested in a relatively contained manner (such as a new application deployment or service offering) or in given portions of a deployment architecture that can be monitored closely and pulled back if performing insufficiently. Given the scale and scope of change involved, this ‘focused insertion’ pattern is not very surprising. Recall the introduction of MPLS into service provider networks in the first decade of the 2000s. It was not done on a ‘wholesale swap-out’ basis; it was done incrementally, much like the process just described for NFV. Given the importance to users and operators of the functions being deployed, and the value of the apps that count on the availability of the system platforms that enable them, it’s not surprising that there is a process of ‘burning in’ the new technology designs and having them prove their viability. This takes a different pace and shape in every service provider’s operation. Some are more ‘ready and game’ than others; some can ‘afford’ to experiment more than others. But in overall market adoption terms, the pace of uptake could be described as starting to transition into the ‘early adopter’ category of deployers, having been in the ‘innovators’ category for the past couple of years (these terms being based on the adoption models described by Geoffrey Moore in his seminal work, Crossing the Chasm).
What impact has the emphasis on open source had in the pace of development?
In an industry that relies on interoperability as broadly as communications service providers do (e.g. it’s imperative that networks be able to ‘hand off’ to each other cleanly, that functions ‘complete’ cleanly across multiple participating hops, etc.), the general effect of the surge in open source development in this community has been to accelerate the willingness and the ability to change, and to bring new capabilities to market sooner, in an interoperable way. The effect of open source on virtual system infrastructures such as hypervisors, virtual machines, and containers, for example, has been to *force* implementers to consider how their ‘innovations’ will ‘fit’ into the operating frameworks their solutions are intended to ‘plug into’. When this is done within a broadly Linux-based framework, across multiple suppliers of Linux-based underlays, the task is far more straightforward than doing it across a substrate of four or five significantly different OS implementations, whose capabilities may not, in fact, converge over time.
Using that example as a reference, the ability to have VNFs integrate into NFVIs more efficiently is accelerated by having an open framework for evaluating their success (or failure) in doing so. The contributions of ETSI in framing the problems and the solutions, and of the OPNFV project in bringing open source communities together to test their readiness for open integration into deployable virtual environments, are having a similar effect on getting NFV ready to deploy.
When looked at in the context of an individual solution’s development – such as a new enhancement or widget to be integrated into a specific system environment – one might think that the ‘messiness’ of the ‘elapsed time’ of getting functions into open source distributions is an impediment to progress. However, when looked at in the context of the broader economic interests that are at stake in having interoperable VNFs deployable into network operations globally, as the underpinnings of the ongoing evolution of network services, it’s possible to see the drastic acceleration that broadly supported open source communities are bringing to the process, versus the overall market pace that significantly narrower communities of contribution (as are in play within individual companies’ developments) can supply. This is not at all to say that the innovations individual companies bring aren’t important or valuable or relevant. It’s to say that, in certain categories of broadly applicable infrastructure such as an NFVI or a VIM, the benefits of open source contributions tend to have the net effect of substantially faster and broader adoption in a deployment environment like the communications infrastructures the operators supply, versus completely segregated development cycles applied to uniquely designed platforms that then, somehow, have to be made to work with each other to build a functioning solution.
What have you seen as some of the bigger challenges to this point that have perhaps slowed both the development of NFV platforms by the vendor community and the adoption by telecom operators?
The challenge of assembling multi-disciplinary design, test and deployment/operations teams is one. That takes time (though it’s certainly doable). A second is developing familiarity with the operating methods of new virtual infrastructures: managing servers, storage, hypervisors and VNF ‘application workloads’ is different on many levels for a ‘network’ operations team than what they’ve been familiar with before. These challenges can be overcome with focus, insightful team building, and diligence in developing blended knowledge bases and evolved processes, but that takes time to achieve. A third is the natural ‘cycle of viability’ in supporting important and demanding functions (such as firewalls, network gateways, and deep packet inspection engines) in a new architectural framework (general-purpose computing and OSs) versus the frameworks in which they’ve run before. This isn’t to say it’s not happening – it is. It is to say that getting the implementations to a place where they deliver predictable and reliable performance for their tasks involves a few iterations before they can be considered ‘ironed out’ and understood; adoption of the new framework is still in its early phases. A final challenge to mention is the relative lack of monitoring and visibility tools at the level of functionality required to master the new functions’ deployments. They are being actively developed and incrementally deployed – but that is another area that has contributed to relatively cautious adoption cycles.
Can virtual BIG-IP instances auto-scale up and down based on traffic?
Yes. Thresholds may be established; when the scale-up threshold is reached, a new instance of the VNF is automatically created, and when the scale-down threshold is reached, the VNF removes itself from the pool of resources.
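A minimal sketch of the threshold logic just described (the class interface and threshold values below are hypothetical illustrations, not F5’s actual implementation):

```python
class VnfAutoScaler:
    """Threshold-driven horizontal scaling for a pool of VNF instances.

    Hypothetical sketch: when observed utilization crosses the scale-up
    threshold a new instance is added; when it falls below the
    scale-down threshold an instance drains and leaves the pool.
    """

    def __init__(self, scale_up_pct: float = 80.0, scale_down_pct: float = 20.0,
                 min_instances: int = 1, max_instances: int = 8):
        self.scale_up_pct = scale_up_pct
        self.scale_down_pct = scale_down_pct
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.instances = min_instances

    def observe(self, utilization_pct: float) -> str:
        """React to one utilization sample; return the action taken."""
        if utilization_pct >= self.scale_up_pct and self.instances < self.max_instances:
            self.instances += 1          # spin up a new VNF instance
            return "scale-up"
        if utilization_pct <= self.scale_down_pct and self.instances > self.min_instances:
            self.instances -= 1          # instance removes itself from the pool
            return "scale-down"
        return "steady"
```

Real deployments would add hysteresis windows and cooldown timers so a noisy traffic signal doesn’t cause instances to flap.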
Is there a use case to deploy Telco cloud on Hyper Converged Infrastructure?
Yes. One in which the required capacity of the deployment will match the capabilities of the HCI.
What about the maintenance and support costs associated with OpenStack infrastructure? Is someone needed to maintain or customize it – which may involve getting all the components together in the right way and maintaining them over time, including dealing with security issues in the open-source software?
There are multiple ways to acquire and deploy an OpenStack solution. At one end of the spectrum, an operator can download the OpenStack software directly from the OpenStack Foundation’s release repositories and strive to employ it on its own. That, of necessity, requires the greatest amount of attention by the operator to ensure readiness and viability for a given environment. Alternatively, an operator can work with a supplier of an OpenStack distribution as part of a broader solution offering, in which the software supplier has taken the time to test and integrate the OpenStack software with additional modules – such as the underlying operating system and adjacent management and operations support tools – that add value and functionality to the base OpenStack distribution. In this latter mode, the support costs become partly the support fees paid to the solution supplier (generally substantially less than the personnel costs required to perform the support functions on one’s own), plus the costs of the engineering and operations personnel associated with defining, installing and running the new software infrastructure. These latter costs are real, but typically far less than the collection of costs required to run an infrastructure designed on the legacy ‘silo’ model of operation, in which every platform requires its own dedicated skill set and support team. The considerations around customizing – or configuring – the software for the operator’s own environment are not materially different from any other form of configuration: one has to design and configure each infrastructure platform one decides to employ, and the broader simplifications are, in general, more impactful than those particular concerns.
As far as security of the OpenStack software is concerned, there are multiple levels of consideration for any operating environment one has chosen. At the ‘normal’ level of ensuring authenticated access of processes and users into the software environment, OpenStack’s Keystone and related monitoring functions provide a solid level of access control throughout its collection of modules. At other levels – such as preventing back-door access to modules and ‘hacking’ of a deployment environment – the protections supplied by the individual solution curators (OpenStack solution suppliers) are the ones to explore. Leading OpenStack solution suppliers tend to employ secure OS and infrastructure capabilities that provide additional protections for underlying processes and ‘round out’ the levels of protection supplied. The specific protections supplied are, however, a matter to explore with each solution supplier.
Where does SR-IOV play a role with Big Switch Networking?
SR-IOV is one approach to providing high network bandwidth to NFV VMs. Typically, whichever vendor provides the networking solution on the hypervisor in an OpenStack environment is responsible for SR-IOV integration. To that end, Big Cloud Fabric provides SR-IOV support for the OpenStack workflow. The key differentiator of the BCF SR-IOV workflow is support for active-active bonding on the SR-IOV interfaces – allowing efficient use of uplink bandwidth and providing resiliency during fabric upgrades, while retaining the simplicity of using the high-speed interfaces from the VM’s perspective.
Can you share some color on how special capabilities like support for SR-IOV for high throughput VNFs, or other specialized functions such as GTP hashing for mobile network services have been brought into the solution efficiently from the underlay SDN platform’s point of view?
For SR-IOV, please refer to the previous answer.
ASICs today provide support for LAG and ECMP member selection using the GTP tunnel ID. It is therefore possible to get high-throughput traffic through a leaf-spine fabric, with multiple 10G links to the server, by taking advantage of hashing based on the GTP tunnel ID. The return traffic from the VNF VM is typically hashed by the application itself. If a networking vendor provides specialized switching functionality on the hypervisor (say, a custom DPDK-based vSwitch), GTP hashing can be done there as well.
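As a hedged sketch of why hashing on the GTP tunnel ID helps (the helper functions below are illustrative; a real fabric does this selection in ASIC hardware, not software), the idea is to pick an ECMP/LAG member from the tunnel endpoint ID in the GTP-U header rather than from the outer IP/UDP 5-tuple, which is often identical for all tunnels between the same pair of gateways:

```python
import struct
import zlib

GTPU_PORT = 2152  # standard UDP port for GTP-U


def parse_gtp_teid(udp_payload: bytes) -> int:
    """Extract the 32-bit TEID from a GTPv1-U header.

    GTPv1-U header layout: flags (1 byte), message type (1 byte),
    length (2 bytes), TEID (4 bytes, network byte order).
    """
    if len(udp_payload) < 8:
        raise ValueError("truncated GTP-U header")
    return struct.unpack_from("!I", udp_payload, 4)[0]


def ecmp_member(teid: int, n_links: int) -> int:
    """Choose an ECMP/LAG member link by hashing the tunnel ID.

    Flows carried in different tunnels spread across links even when
    the outer 5-tuple is the same for every tunnel.
    """
    return zlib.crc32(struct.pack("!I", teid)) % n_links
```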
Can you comment on the extent to which ‘standardized’ automation templates (such as HEAT templates in the OpenStack distributions) have been employed by this operator, and how effective they have been in helping activate new services and elements?
As an example, F5 has released a number of OpenStack components, including a Heat plug-in library that introduces F5 objects into the OpenStack infrastructure; and Heat templates that can easily orchestrate F5 services in OpenStack. Taking advantage of these F5 offerings enables customers to extend the F5® TMOS® architecture into OpenStack clouds and NFV infrastructure. The F5 solution dynamically inserts critical and consistent L4 to L7 services into the OpenStack cloud, helping service providers ensure application availability, performance, and security while improving operational efficiency.