When telecom operators strive for high availability, they usually turn to application-level redundancy practices such as load balancing, checkpointing, and journaling. A new white paper from Wind River argues that these basic approaches may be enough for simple, stateless applications like web servers, but on their own they fall short for today's stateful services.
Detailed service level agreements (SLAs) make it imperative for telecom operators to deliver services that perform as expected the instant a customer requests them. The white paper, “NFV: The Myth of Application-Level Availability,” serves as a primer on best practices for high availability (HA) in telecom networks. It examines the availability challenges operators face, outlines how to prevent service disruptions, and lays out a layered approach to achieving the service levels customers demand.
Network service providers face a wide range of problems, from single-server and network-node failures to malicious network attacks and network congestion or overload. Application-level solutions protect against some of these failure scenarios, Wind River says, but to meet the high-availability expectations of a competitive, SLA-driven telecommunications market, operators need more comprehensive frameworks.
For example, Wind River cites the traditional application-level “active/standby” model, in which two instances of an application (or VNF, in the case of NFV) exist: one that actively provides service, and one that does not but can take over rapidly should the active instance fail. The white paper points out that if both instances run on the same physical server and that server suffers an outage, both instances are lost at once, and a customer-visible outage will almost certainly follow.
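The placement risk described above can be illustrated with a small Python sketch. The names `Instance`, `place_pair`, `survives_host_failure`, and `fail_over` are hypothetical and are not Titanium Server's API; the sketch simply assumes a scheduler that either honors an "anti-affinity" rule (the two instances must land on different physical servers) or ignores it and packs both onto one server.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    name: str
    role: str  # "active" or "standby"
    host: str  # physical server the instance runs on


def place_pair(hosts, anti_affinity):
    """Hypothetical placement of an active/standby pair onto physical hosts."""
    if anti_affinity:
        if len(hosts) < 2:
            raise ValueError("anti-affinity needs at least two hosts")
        # Policy honored: active and standby go to different servers.
        return [Instance("vnf-a", "active", hosts[0]),
                Instance("vnf-b", "standby", hosts[1])]
    # Policy ignored: a naive scheduler may co-locate both instances.
    return [Instance("vnf-a", "active", hosts[0]),
            Instance("vnf-b", "standby", hosts[0])]


def survives_host_failure(pair, failed_host):
    """True if at least one instance of the pair runs on a different server."""
    return any(inst.host != failed_host for inst in pair)


def fail_over(pair, failed_host):
    """Return the instance that serves after the failure, or None if none survive."""
    survivors = [inst for inst in pair if inst.host != failed_host]
    if not survivors:
        return None  # customer-visible outage
    # Promote the surviving instance (a no-op if it was already active).
    return replace(survivors[0], role="active")
```

Usage: with `colocated = place_pair(["server-1", "server-2"], anti_affinity=False)`, a failure of `server-1` leaves nothing running, so `fail_over(colocated, "server-1")` returns `None`. With `anti_affinity=True`, the standby on `server-2` is promoted and service continues. The point of the sketch is that the application-level redundancy only helps if something below the application enforces where the instances run.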
Because no one-size-fits-all approach can address every disruptive event, Wind River says any NFV solution should reinforce application-layer protections with protections at other layers. The company's Titanium Server NFVI software platform does this by implementing a proactive fault detection and recovery system that uses policies and metadata to enforce application deployment models.
Because Titanium Server continuously monitors system operations at the NFVI, VIM, VNF, and VNFM levels, it can react autonomously when it detects a failure. It takes self-healing and service-preservation actions on its own, enabling services deployed on Titanium Server to achieve five nines (99.999%) availability, or roughly five minutes (about 5.26 minutes) of downtime per year, planned plus unplanned.
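As a quick check on the five-nines figure, the annual downtime budget follows directly from the unavailability fraction, assuming a 365.25-day year:

$$
\text{downtime}_{\text{max}} = (1 - 0.99999) \times 365.25 \times 24 \times 60\ \text{min} = 10^{-5} \times 525{,}960\ \text{min} \approx 5.26\ \text{min/year}
$$

With a 365-day year the result is 5.256 minutes, which also rounds to 5.26; both are the source of the commonly quoted "five minutes per year."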
Read the white paper to learn more.