Does SDN Make My Network Management Look Fat?

I’ve been in network management a long time. Long enough to know that it ain’t sexy, and likely never will be perceived as sexy. My only dream is to at least get it moved from a cost center to a profit center, but that’s hard to do when your budget is never enough, and always shrinking.

Then along comes SDN, with its proponents talking about commoditizing this and helping to innovate that. According to the ONF, "SDN adoption accelerates business innovation by allowing IT network operators to literally program—and reprogram—the network in real time to meet specific business needs and user requirements as they arise." More importantly, the execs are excited. I can tell by the creaking sounds that their wallets are opening. Carefully dodging the escaping bats, I rush to see if now, finally, I can at long last get someone, anyone, to pay attention to network management while there is something left to manage.

“Why haven’t you implemented SDN yet?” bellowed one of my bosses. “It’s supposed to make management easier!”

Say what? I’ve seen this and similar themes confound the potential benefits of SDN: “Centralized management and control of networking devices from multiple vendors”, “SDN makes management simpler by providing a vendor-agnostic API to program the network”, “SDN enables virtualization to be used to reconfigure the network.” Yada-yada-yada. The problem is that SDN, like all of its predecessors, started with the strategy of “build it first, manage it later”. This means that SDN will be able to solve some simple management problems, but not because it was built for management.

With respect to “making management easier”, that’s a completely different story. The root of the problem is that networks are managed completely differently than the applications that run on them. One of the original premises of SDN was that innovation in network applications has been stifled by the immense cost of the special-purpose hardware and software in network devices. In order for network applications to be more like PC applications, they need a simple, common hardware substrate (similar to the x86), so that open source innovation in applications and infrastructure (e.g., the OS and mechanisms such as virtualization) can rejuvenate network applications. This sounds good, especially with a fifty-year-old single malt and a nice Montecristo #2. The reality is that networks are, fundamentally, a system of systems. You simply cannot manage something this complex with a single box. You cannot take a single device, reconfigure it, and expect that operation to not affect other parts of the system. People really need to stop and (re)learn control theory; John Boyd’s OODA loop is a great place to start.

“Isn’t SDN supposed to move intelligence from all of those proprietary network device operating systems to a much simpler set of systems that can use commodity hardware and open source software? Where are my savings in CAPEX and OPEX? You software management guys are the Devil Incarnate!”

Sigh. It’s just like the Future Internet. There have been hundreds of conferences, thousands of conference papers and white papers, and still, not even one proposal to replace SNMP. Engineers love to tinker with new toys that provide new functions, whether it is a new virtualization technology, or new abstractions, such as those proposed by SDN. When it comes to management, however, no one cares, because it just ain’t sexy. But wait…are those abstractions and key ideas of SDN really new? Hmmmm…

One of SDN’s key abstractions is the idea of separating the data and control planes. This has been talked about for over 10 years (old references include, for example, automatically switched optical networks (ASON), GMPLS, and even MPLS; efforts focusing on the Future Internet include Autonomic Network Architecture, CMU 4D Architecture, and two very interesting FP7 projects: SAIL and UniverSelf). In order to really achieve true separation, everything should be separated (separate messages, separate ports, etc.); equally clearly, this is not how any network equipment vendor’s implementation really works today. But wait, you interrupt! “SDN has already logically separated the data and control planes!” “Not so fast”, I reply. The devil is in the details. For example, SNMP mixes data, control, and management data together in the same protocol over the same port. So do other protocols. In fact, this permeates the design of pretty much everything in networking as we know it today. SDN does not solve this problem. It is no closer to solving this problem than any previous solution; in fact, some previous work, which proposed separate management protocols to talk to separate control plane protocols to talk to separate data plane protocols, was much further along in its thinking and implementation than any current implementation of SDN, and it reaches back well over 4 years. The reason that this is important is that by separating the data and control planes, it is easier to address two different functions and make each more reliable and manageable. Separation enables control plane functions to be located in physically different hardware than data plane functions; this in turn enables both to match the implementation requirements to the best combination of hardware and software available. This would enable “pools” of resources to offer different types of controllers and forwarding elements: some smarter, some dumber, some simple building blocks that let smart designers build their own application-specific controllers.

However, the real point is that this should not have anything to do with SDN. Until SDN architectures include the management plane, the entire discussion about separating the data and control planes is pointless. In my opinion, SDN should be about abstracting the network to enable the network to understand what the application requires of it. We are nowhere even close to achieving this vision; I only know of one company – perhaps two – that may be trying to do this.

“SDN is supposed to make your job easier. I don’t see why you can’t just use the latest virtualization tools like everyone else and program the network faster.”

This mixes together the worst parts of two concepts that are meant to help – virtualization and programmability – and the result is a disaster.

Most SDN players want to convince you that they can build network virtualization solutions. In order to do this, management is critical. In fact, two types of management are critical: management of virtual resources, and management of the virtualization process (this latter part everyone forgets, but I digress). I fail to see how this can be achieved using the current accepted definition of SDN, since the abstractions that are currently being used are too low level to be useful to manage customer requirements for any type of network. In addition, there is a distinct difference between networks that are already built and running (perhaps not happily, but running nevertheless) and new networks that have no dependencies. The reality, if SDN wants to tap into established markets, is hybrid networks. How does SDN then make management easier? My existing apps for the “legacy” network all use “established” practices, in the form of EMSs, NMSs, OSSs, and BSSs. I don’t see many Ss for SDN! In fact, I don’t see where SDN has even thought about how it would interface into any of these systems. Then, you have the methodologies – eTOM for the Carriers, and ITIL for the Enterprises. Again, no real thought, let alone progress, is visible (at least to me; I would love to be corrected). The fear is that when SDN is added, I now have more complexity to manage my network, and new “untried” tools to use in my NOC/data center.

“What’s that, you say, Google uses OpenFlow?” True, they have developed a custom solution for a specialized application using OpenFlow. Does your application fit those needs? Did that creaking wallet open enough for your small team of Ops personnel to hire programmers to build and operate custom applications to manage your network?

Another of SDN’s key abstractions is the idea of programmability. (Warning, Will Robinson, Danger! Rant up ahead!) What in the world do people mean by “programming the network”? This makes no sense! Do people really think of the network as one single entity, with no moving parts, that can be programmed like a little robot mouse to find a piece of cheese? The network is a set of complex, heterogeneous, ever-changing pieces that present a set of shared resources to a set of users and applications that typically have conflicting requirements. For example, consider a VoIP app and a streaming multimedia app. They have completely different resource requirements on the network. If they are both owned by the same user, then that user is asking for different things from the network. Now multiply that by some large number, and you have the basic problem in data center, carrier, cloud, or other types of networks. The average network managed object, at whatever abstraction level you want, is a shared resource that is being pulled on by many different clients that simultaneously have different requirements. How does a low-level programming model (i.e., the flow table of a switch) make this any simpler? Put more bluntly, this is like using assembly language to solve a multi-objective optimization problem. Even if it could be done, why would you want to do it this way? (Note: we haven’t even delved into the fun parts of how, when one part of the network is changed, other parts are affected; that is for another post.)
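To make the shared-resource point concrete, here is a minimal Python sketch of two apps pulling on one link. Everything in it is an assumption made for illustration: the `Requirement` and `Link` types, the `unmet` helper, and the bandwidth and latency figures are made up, not measured. The thing to notice is that the question being asked ("are these apps' needs met together?") lives at a level that a flow table never expresses.

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    app: str
    min_mbps: float        # bandwidth the app needs (illustrative figure)
    max_latency_ms: float  # latency the app can tolerate (illustrative figure)

@dataclass
class Link:
    capacity_mbps: float
    latency_ms: float

def unmet(link, reqs):
    """Return the apps whose needs the shared link cannot satisfy together."""
    # Latency is a per-app check against the link.
    failures = [r.app for r in reqs if r.max_latency_ms < link.latency_ms]
    if sum(r.min_mbps for r in reqs) > link.capacity_mbps:
        # Bandwidth is shared, so an overcommit affects every app on the link.
        failures += [r.app for r in reqs if r.app not in failures]
    return failures

# Same user, two apps, very different demands on the same resource.
reqs = [
    Requirement("voip", min_mbps=0.1, max_latency_ms=150),
    Requirement("video", min_mbps=8.0, max_latency_ms=2000),
]
print(unmet(Link(capacity_mbps=5, latency_ms=30), reqs))
```

With only two apps and one link this is trivial. Multiply it by thousands of clients and devices and it becomes the optimization problem described above, and none of it is visible in match/action entries.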

“Our customers want SDN”.

Good point! Who, exactly, is SDN’s customer? Yes, me, an “en-guh-neer”, going to the dark side (in my defense, they do have cookies), talking about the customer. Last time I talked to a customer (yesterday), they were talking about subjects like “optimizing revenue” and “tracking workloads” (both in a data center context). There is the ubiquitous “pro-active monitoring of SLAs” (i.e., try to do trending on the SLA to ensure that we catch problems and fix them before the SLA is violated). Another popular use case is “Given these network statistics, where can I place my new device (router/switch/server) so that it can do the most good?” I am mystified at how the current SDN approach is able to solve any of these problems. Simply put, mucking around in the data plane has nothing to do with generating business revenue.
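Pro-active SLA monitoring is a trending problem, not a flow-table problem. A minimal sketch, assuming latency is sampled at a fixed interval and that a straight-line fit is good enough (both assumptions are mine, and the function name and numbers are invented for illustration):

```python
def samples_until_violation(samples, threshold):
    """Fit a least-squares line to evenly spaced samples and estimate how many
    more sampling intervals remain before the trend crosses `threshold`.
    Returns None if there is too little data or the trend is flat or improving."""
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope  # x where the line hits the threshold
    return max(0.0, crossing - (n - 1))

# Hypothetical latency samples (ms) against a hypothetical 40 ms SLA limit.
latency_ms = [20, 22, 24, 26, 28]
print(samples_until_violation(latency_ms, 40))  # 6.0
```

The arithmetic is the easy part. The hard part is getting trustworthy, correlated measurements for a customer service out of the network and then deciding what to change, which is management plane work.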

“Ya know, for a Ph.D. type, you can be purty dense. Commoditization, move the intelligence out from proprietary systems into places where everyone can build cool apps, tie it together with an open Network Operating System, and boom, you’re done.”

OK. I must have lost my Ph.D. somewhere. Dang thing never would sit still in my pocket. Nevertheless, in reviewing the definitions from the ONF, or any of the talks from various SDN luminaries, several things emerge:

1. The intelligence that used to be embedded in the hardware in a device is replaced by commoditized hardware, turning a “smart switch” into a “dumb forwarding element”
2. The intelligence that used to be embedded in the device OS is largely (if not completely) replaced by some combination of controller commands and a “network OS”
3. A controller (or more likely, a set of controllers) is magically communicating with the outside world (see below) and updating the network
4. A set of Northbound APIs is (or will be) birthed that enables new “innovative applications” to be written
5. All of this is tied together with a Network Operating System

Let’s assume that the first point is valid, for at least some functions (though why the notion of a subset of the functions of a switch equates to “all of networking is solved” is beyond me; much more likely, it enables some form of hybrid network, where “legacy” and SDN networks try to co-exist, but more on that later). I’m still bothered by the stark reality of how network devices work, and whether it is indeed possible for merchant silicon to be mass produced and be usable for commodity functions to effect this transformation, but again, let’s say we can do this, at least to some extent. Let’s further suppose that this can be powered by #2.

If both #1 and #2 are true, is it then true that all that is needed is just a controller to manage #1 and #2? First, it is silly to assume that a controller has the intelligence, capacity, and knowledge to deal with storage, compute, and other types of element problems in addition to the network (i.e., the switch) that it was supposed to govern in the first place. This is because the goal is not to control just the network! Why would it ever be that? The goal is to manage everything that is in the network. Second, you’d have to wheel this puppy into the NOC/data center using a 10 ton fork lift if you wanted to cram all of that functionality into one server. Third, the assumption that compute and storage resources behave like network resources is worse than the assumption that the different planes of the network behave similarly. Fourth, we haven’t even begun to discuss the real problem: applications. Networks exist not for hello packet verification, but to run applications. Customer applications. Customer applications that make the money that pays the salaries of the employees of the organization that runs the network. The goal, here, is NOT to celebrate the fact that we can poke around in the data plane. Customers simply don’t care. Furthermore, neither do most admins. The real problem is how to support customer applications. If we have to poke around in the data plane to take a measurement, fine. The problem is that the data, control, and management planes operate at different scopes and time scales: the data plane at the scale of individual packets on individual devices; the control plane at seconds on isolated portions of the network; the management plane at much longer periods (tens of minutes to several hours, depending on the complexity of the task) on most of, if not the entire, network. This in turn requires a complex combination of control loops, which is often made more complex by the introduction of “person-in-the-middle” portions, where the control loop stops until a manual input is received. Now, as in a true control loop (ok, bad joke, but I’m low on coffee), we are back at my first point: SDN has not provided any good abstractions for us to use. Fundamentally, managing customer services using a switch’s flow table is just plain wrong for the vast majority of customer applications. That is not sufficient to manage either the complexity of forwarding applications or the complexity of configuring them.

#3 is where we come to a full stop. How do we connect the controller to the outside world? This could be as simple as an event bus, or perhaps a connector to an EMS or NMS, or perhaps something more exotic. However, this is just the beginning of our problems. How does a business person enter SLAs or similar abstracted directives into the system? Clearly, an SLA and a switch flow table will not have a good time at a poker game, let alone trying to communicate using a common vernacular, so no problem, let's write an app. On what API? You still have to translate "John gets Gold Service" to a set of flows. And how is that translation, as well as the app, going to be portable? (And where is the "x86" abstraction that ensures interoperability? 3/3 in the undefined column!) So even IF SDN did offer something better, how do I get to it? I have plenty of problems dealing with interoperability issues in my S-world of EMSs, NMSs, OSSs, and BSSs, or my C-world of CMDBs and CMSs, and now you want me to enter the unexplored, undefined, app world? As mentioned before, this means that rather than reducing my management burden, it now makes it more complex. And fatter. A lot fatter. And there aren't even any mirrors.
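To make the translation gap concrete, here is a minimal Python sketch of what turning "John gets Gold Service" into flow entries might involve. Everything in it is hypothetical: the service tiers, the DSCP values and queue ids, the subscriber directory, the switch names, and the `FlowEntry` fields are stand-ins invented for illustration, not any real controller's API or OpenFlow's actual match structure.

```python
from dataclasses import dataclass

# Hypothetical service catalogue: tier name -> forwarding treatment.
TIERS = {
    "Gold": {"dscp": 46, "queue": 0},
    "Bronze": {"dscp": 0, "queue": 3},
}

# Hypothetical directory: subscriber -> addresses currently in use.
SUBSCRIBERS = {"John": ["10.0.0.5", "10.0.1.17"]}

@dataclass(frozen=True)
class FlowEntry:
    switch: str
    match_src_ip: str
    set_dscp: int
    queue: int

def compile_policy(user, tier, switches):
    """Expand one business directive into per-switch, per-address flow entries."""
    treatment = TIERS[tier]
    return [
        FlowEntry(sw, ip, treatment["dscp"], treatment["queue"])
        for sw in switches
        for ip in SUBSCRIBERS[user]
    ]

flows = compile_policy("John", "Gold", ["edge-1", "edge-2", "core-1"])
print(len(flows))  # 6: one directive, three switches, two addresses
```

Even this toy version hard-codes what "Gold" means, who John is, and which switches matter. None of that is portable, and all of it goes stale the moment John moves or the topology changes.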

#4 would be great, except that the Northbound API is still largely unspecified. While I can understand the phenomenon of the NBAPIs lagging the SBAPIs (because the latter are shinier and prettier than the former), this has nothing to do with the most serious flaw of all: #5. Not only have marketing people, through an act of magic much more evil than stealing the moon, postponed the controller (or more likely, set of controllers) standardization, they have also:

decided that the management plane will magically disappear (since it is never really talked about in the mainstream SDN definitions, and not really talked about by most vendors), and
deferred any substantial talk (let alone definition) of the Network Operating System until later

This is where the SDN miracle currently happens. When pressed, most SDN proponents whine about keeping things simple. My favorite Einstein quote is “Everything should be as simple as possible. But not simpler”.

More specifically, the control plane can NOT replace the management plane. How are multiple controllers coordinated? There is a big difference between (1) changing the configuration of a device to a known configuration to provision a simple service, (2) analyzing monitored data in order to decide how to best fix a problem, and (3) trying to discover the root cause of an unknown problem. Even if you convince yourself that the first is a control plane function, the second and third are clearly not (or at least, not completely control plane functions). More importantly, embedded within all three is the notion of a control loop (or, more realistically, multiple control loops) that must be situated in order to accommodate multiple data formats in multiple languages at multiple levels of abstraction. Again, I am mystified at how SDN, as it is currently defined, is able to solve the second or third problem, and can only convince myself of a small number of relatively simple problems that it can solve in the first category. As for the Network Operating System, I really have no idea what the SDN proponents had in mind, since they didn’t describe it in enough detail for me to guess. However, if it were me, then it would be the glue that would enable different types of applications, resources, services, and devices to all be able to use the SDN paradigm.

“Dang it! So does this mean you aren’t gonna build my SDN network?”

Not at all. I just want to understand what you mean by the terms, and then go build it better. Stay tuned.

Disclaimer: These are my personal opinions, and do not necessarily reflect those of my employer.
