I've been involved with a lot of network and application monitoring products during my career, primarily at networking vendors, dating back to 1990. I see software-defined networking (SDN) as a real inflection point for the traditional monitoring products, where better use of the data gathered can actually influence the network topology designed by SDN products. However, without some form of organization or standardization at the meta layer, it will all end in a lot of frustration and one-offs.
Let me lay out the traditional foundation for network monitoring as an example, but this could be applied to application monitoring also.
Consider that the network device (switch/router) will gather stats on all its interfaces. Raw data is sent northbound to the associated monitoring product, which in turn aggregates the data and builds nice reports for your boss, or kicks off alerts for threshold violations, or pushes the raw data to a billing app.
In these and other use cases, it's a one-way northbound street. If anything comes back to the network in terms of changes, it's manual, and it's slow, and it's certainly not SDN.
In the world of SDN, we have the ability to program the network, and we can do it through automated APIs. We expect the SDN controller to make the overall network decisions, but my take is that network-monitoring products should be a large core of the decision-making for SDN.
Why SDN Needs Feedback
To do that, the monitoring product needs to make use of a two-way street and feed decisions (not just data or alerts) back to the network or the SDN controller.
Monitoring products are the only products used across the whole multivendor, network-wide physical and virtual infrastructure, so let's make use of them. The SDN controller has only a partial view of the network at best, and if it relies exclusively on this partial view, you will get misleading information and maybe a worse application experience for the user.
There are a couple of challenges in making this happen. How do we standardize the information that the monitoring tools send down? And what use cases can they solve?
Let's start by talking about an obvious use case of what we can do with the data with our SDN controller, and why it's not so easy without some input from the administrator of the environment.
Consider a typical network where one of the links is congested. Network devices push the interface stats to the monitoring product, the threshold bit flips, and we know we have congestion.
Let's get some metadata about the congested link, using NetFlow data to give us, say, the top 10 flows. Now report these flows, and the least congested link across the same path, to the SDN controller, which can then move those flows to alleviate the congestion. Right?
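As a rough sketch, that naive feedback loop might look something like the Python below. Everything here is an assumption for illustration: the `Flow` record, the 80 percent threshold, and the shape of the report are hypothetical, and no real NetFlow collector or SDN controller API is being modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record, loosely what a flow collector might export."""
    src: str
    dst: str
    bytes_sent: int


def is_congested(utilization, threshold=0.8):
    """The 'threshold bit': True once utilization reaches the threshold."""
    return utilization >= threshold


def top_flows(flows, n=10):
    """Return the n largest flows by byte count, largest first."""
    return sorted(flows, key=lambda f: f.bytes_sent, reverse=True)[:n]


def least_congested(link_utilization, exclude):
    """Pick the alternative link with the lowest utilization, or None."""
    candidates = {k: v for k, v in link_utilization.items() if k != exclude}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def build_report(link, link_utilization, flows, n=10, threshold=0.8):
    """Build what the monitoring tool would hand to the controller.

    Returns None when the link is below the congestion threshold.
    """
    if not is_congested(link_utilization[link], threshold):
        return None
    return {
        "congested_link": link,
        "target_link": least_congested(link_utilization, exclude=link),
        "flows": top_flows(flows, n),
    }
```

Note that nothing in this report says whether a flow should be moved, only that it is big.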
If only it were that easy. Do you really want to blindly move the top 10? What if they are the most important flows? What if they consume 99 percent of traffic? I may not want to disrupt them at all, so maybe we move the non-important flows — but how do we know what is and what is not important? The network cannot actually determine what is and is not important. Only the owner of the application that is generating the flows can do that.
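One way to picture the missing input is a small policy that the application owner supplies and the monitoring tool consults before it recommends anything. The sketch below is only an illustration under my own assumptions: the `Flow` record, the application names, and the "critical" and "bulk" classes are all hypothetical, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record tagged with the application that owns it."""
    app: str
    bytes_sent: int


# Hypothetical owner-supplied policy: the network cannot infer these classes.
POLICY = {
    "billing": "critical",
    "backup": "bulk",
    "log-shipping": "bulk",
}


def movable_flows(flows, policy, default="unknown"):
    """Return only flows the owner has declared 'bulk', largest first.

    Flows with no policy entry are left alone, which is a deliberately
    conservative assumption: if nobody said it is safe to move, don't.
    """
    bulk = [f for f in flows if policy.get(f.app, default) == "bulk"]
    return sorted(bulk, key=lambda f: f.bytes_sent, reverse=True)
```

Under this sketch, a huge billing flow stays put no matter how much of the link it consumes, while backup traffic becomes the first candidate to move.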
Looking Past the Rear-View Mirror
This actually leads into the other challenge, in that we need to standardize this into a model of some sort, so that we can define what is important and what to do in the event of application-affecting conditions.
With this model, we can also alleviate the problem of reactive monitoring with bursty network traffic. Consider our congestion example: By the time you have decided that the link has reached your threshold and the SDN controller has moved around the big flows, the flows causing the congestion might have already stopped, and other flows may now be causing congestion. Do you move those flows too? And by the time they are moved, some other flow could be causing a problem.
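To make the lag problem concrete, here is a toy illustration in Python. The sequence of dominant flows is invented sample data, and `stale_decisions` is a hypothetical helper of my own; it simply counts how often the flow you decided to move some intervals ago is no longer the one causing the trouble.

```python
def stale_decisions(dominant_by_interval, lag):
    """Count decisions that target a flow which is no longer dominant.

    dominant_by_interval: the flow dominating the link in each interval.
    lag: intervals between observing congestion and the move taking effect.
    """
    stale = 0
    for t in range(lag, len(dominant_by_interval)):
        decided_on = dominant_by_interval[t - lag]  # what we saw back then
        actual_now = dominant_by_interval[t]        # what is true now
        if decided_on != actual_now:
            stale += 1
    return stale


# Invented sample: bursty traffic where the dominant flow keeps changing.
SAMPLE = ["a", "a", "b", "c", "c", "c", "d"]
```

With this sample, a lag of two intervals leaves four of the five decisions aimed at a flow that has already stopped dominating the link; with no lag at all, none are stale.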
You can never build a future network simply using rear-view-mirror monitoring.
Combine what you know (about the network) with what you want (how the application should perform) in a standardized model, and we may get somewhere with SDN.