As a vendor of an OpenFlow controller and of management systems for SDN and other infrastructure components, we talk to industry experts, component companies, systems vendors, and data center operators. When they speak about SDN Virtual Networking, they often forget about the physical network.
One of the recurring themes of those promoting SDN-based Network Virtualization (or SDN Virtual Networking) is that the physical infrastructure in a data center becomes a large IP fabric/backplane that provides free-flowing multipath capacity for the virtual networks to use. In other words, application owners can basically forget about the physical infrastructure. In fact, with most virtual SDN products you can’t see anything about the physical network (or whatever the underlying layers are) even if you wanted to.
I used to work for a high-tech company where the HR vice president would roll her eyes whenever somebody used an automotive analogy. But when it comes to networks, they are just so darn handy.
The utopian vision of virtualized SDN reminds me of how the United States Interstate Highway System was depicted in the beginning: everybody would have the freedom to drive their own automobile across the country without having to slow down for traffic or stop lights. In the minds of most it has had major benefits, but anybody who has ever travelled on an “expressway” in any major city knows that at times they aren’t, and increasingly not just during commuting peaks. It also gets worse with scale. Congestion and traffic engineering matter even in smaller towns, but the more vehicles there are, the worse the traffic jams.
In large data centers, these multi-path fabrics will be architected with techniques like ECMP (Equal Cost Multi-Path routing). In simple terms, ECMP is a technique in which individual IP flows, identified by some combination of the source location, the destination location, data type, and so on, are assigned to a particular physical path through the network. Although the details differ, it is conceptually similar to how LAG (Link Aggregation) works at Layer 2.
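In sketch form, that flow-to-path assignment is just a hash of the flow’s identifying fields, modulo the number of equal-cost paths. This is a simplified illustration of the idea, not any particular switch vendor’s implementation (real hardware hashes the 5-tuple in silicon), but it shows the key property: the hash is deterministic and knows nothing about how loaded each path is right now.

```python
import hashlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Pick a path index for a flow by hashing its 5-tuple.

    Every packet of the flow hashes identically, so the flow stays
    on one path and arrives in order -- but the choice is blind to
    the current load on each path.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# All packets of this flow land on the same uplink, congested or not:
path = ecmp_path("10.0.0.1", "10.0.1.9", 49152, 443, "tcp", 4)
```

Note that the result only changes if the flow’s identifying fields change; a congested path stays chosen for the life of the flow.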
Here is where it gets complicated. Imagine you and your friends go to the grocery store to load up for a big party. When you go to check out, a store employee with no visibility into which line is shorter or moving faster at that instant tells you which line to get in. The five grocery carts that make up your “flow” get in that line, with you at the end, since you are the one with the credit card. Your five carts are stuck in that particular line, in that order, no matter what. To use an automotive analogy, it’s as if your vehicle were assigned to a particular lane on a particular road, constrained to that lane even if another lane is less congested.
Continuing our motor vehicle analogy: back in 1956, Greyhound Lines ran an advertising campaign with the slogan “It’s such a comfort to travel by bus – and leave the driving to us.” The bus system is like the Virtual Network overlaid on top of the physical network. It doesn’t matter whether the encapsulation is VXLAN or some other mechanism; like the bus rider, you are subject to what is happening in real time on the physical network. You can certainly prioritize traffic, much as HOV (high occupancy vehicle) lanes do, but at times even those are congested.
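For the curious, the VXLAN encapsulation itself is simple. Per RFC 7348, the overlay wraps each frame in an 8-byte header carrying a 24-bit Virtual Network Identifier (VNI), inside a UDP datagram to destination port 4789. A minimal sketch of building that header (illustrative only, not a full encapsulation path):

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given 24-bit VNI.

    Layout per RFC 7348: 8 flag bits (the 'I' bit set, meaning the
    VNI field is valid), 24 reserved bits, the 24-bit VNI, and 8
    more reserved bits.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # 'I' flag
    return struct.pack("!I", flags << 24) + struct.pack("!I", vni << 8)

hdr = vxlan_header(5001)   # an 8-byte header for overlay network 5001
```

The point for this discussion is what the header does *not* contain: nothing about the underlay path, its load, or its health.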
But even buses have a flexibility that Virtual Networking schemes don’t. The bus driver, even if their route is constrained, can drive around a stalled car in their lane. With Virtual Networking, the SDN Virtual Network has no visibility into what is going on in the physical network, let alone any ability to adapt to it. Network protocols time out, some higher-layer protocol requests a retransmit, and congestion gets worse.
What can we conclude from this?
- Don’t believe the myth that now that the network is virtualized, we don’t need to worry about the physical layer. The behavior of the physical network is a crucially important part of the whole picture.
- Really big server farms are potentially subject to worse congestion problems than smaller ones, and even if the data from different customers and applications is isolated, congestion caused by one can still impact the others.
- So far, performance (e.g. Gigabits per second or packets per second) hasn’t been a turf claimed by any of the Virtual Networking vendors. This is very different from what happened with every generation of physical network technology. Having the highest throughput gave a vendor bragging rights. It is similar to what happened with Server Virtualization, where first it was about manageability, and performance claims came later.
- Getting high performance from Virtual Networking is going to require closer coordination with the physical networks than exists today. The ability to correlate between the Virtual Networks and what’s underneath them, including the multi-path topology, will be key to achieving the headache-free vision that SDN promises.
Postscript: Although we hear that a lot of data centers are planning ECMP-based fabrics, ECMP is by no means the only option. One promising approach that hasn’t gotten much attention is MCF (Multi-Commodity Flow) routing, which has some potential advantages.
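To illustrate the potential advantage with deliberately toy numbers (ours, purely for illustration): per-flow hashing pins each whole flow to one link, while a multi-commodity-flow formulation is allowed to split traffic fractionally across links so that everything fits within capacity. This sketch is not an MCF solver, just arithmetic showing the difference:

```python
# Two equal-cost links, three flows. Assumed numbers are illustrative only.
link_capacity = 10.0
flows = [6.0, 6.0, 6.0]

# ECMP-style: each whole flow is pinned to one link by a hash.
# Worst case for three flows on two links: two land on the same link.
ecmp_loads = [flows[0] + flows[1], flows[2]]   # [12.0, 6.0] -> link 0 overloaded

# MCF-style: traffic may be split fractionally, balancing the load.
total = sum(flows)
mcf_loads = [total / 2, total / 2]             # [9.0, 9.0] -> both links fit

ecmp_congested = max(ecmp_loads) > link_capacity   # True
mcf_congested = max(mcf_loads) > link_capacity     # False
```

A real MCF deployment solves an optimization problem over the whole topology and demand matrix; the toy above only shows why fractional placement can succeed where per-flow hashing fails.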