Practical X86-virtualization, as pioneered by VMware, has profoundly changed IT in a way that no other technology advance has done before. Once perfected, the insertion of a thin virtualization layer (the “hypervisor”) is as close to invisible as one can imagine, especially given how remarkably it changes how we can use computers. The success of server virtualization has led to speculation about network virtualization. What if we could construct multiple “virtual networks” on one physical network? Could a “network hypervisor” be developed with comparable simplicity and results? We certainly have a slew of network problems where “virtualization” looks like an attractive approach. We seriously doubt that any such silver bullet can be found for networks, and here’s why.
Modern X86-virtualization began as part of a computer systems research effort at Stanford. The Stanford lab has a long history of wanting practical and usable innovation, and for that reason wanted a way of running common software workloads on experimental hardware systems. Since most of those workloads ran on the X86, and at the time experimentation was easiest on the MIPS architecture, they wanted a way to emulate the X86 on a different machine architecture. Emulating the X86 is a challenging problem because the X86 has evolved greatly over two decades, and as a result architecturally resembles San Jose’s Winchester Mystery House. The Stanford team made good progress, and once X86 emulation of any form seemed practical, the team focused their attention on the emulation of X86 workloads on the X86 itself (what we think of as virtualization), and VMware was born.
Ed Bugnion, VMware founder and founding CTO, recently returned to finish his PhD at Stanford by detailing the technical innovation involved. It’s a fascinating story. Early on, the VMware team made two critically important decisions: (1) they would emulate the X86 so perfectly that all the major existing operating systems would run under virtualization in their binary form, completely unchanged, and (2) they would minimize the performance penalty so that software brought to market with performance optimized for a given process generation CPU (e.g. 65 nm) would work with comparable or better performance when virtualized on a CPU from the next process generation (in the example, 45 nm). Most experts at the time thought achieving these goals was impossible. To be sure, it was very difficult. For example, virtualizing the existing operating systems depended on experimentally determining what specific X86 instructions actually did, which was not what the official Intel architecture manual defined (the existing operating systems made use of the actual behavior).
Despite the fact that the task looked impossible, VMware made it work, with the amazing result that one can grab an X86 server workload, convert it to virtual machine form (P-to-V), plop it down on a shared virtualized server, and it just runs. Amazing! This enabled server consolidation, it enabled running very different stacks on the same hardware, and it enabled shared and automated data centers, elastic computing, and so on. Never have so many benefited so greatly from the work of so few, to paraphrase the old saying.
Can we do something similar with networks? Finding an elegantly simple network virtualization solution is all the more important because server virtualization is so simple and effective that it begs to have the other system components adapt as transparently. For example, if you virtualize an application that runs on a set of physical servers, the virtual machines probably want to find each other and cooperate with each other over the network exactly as they used to (unchanged), using a variety of network mechanisms, including low-level mechanisms like ARP. But if we build a big virtualized data center, and we can position our VMs flexibly, how do we keep all that “local” network traffic between the VMs that are part of one application system from overwhelming the shared network? How can we create separated “virtual” networks and manage them as simply as we manage virtual machines? The sad answer is that we probably can’t in any comparably simple and elegant way.
It’s not because the virtual network people aren’t as bright as the VMware team; the current SDN and OpenFlow startups have some of the smartest people in the valley. It’s because virtualizing a network is very different from virtualizing a server. I’m not foolish enough to try to prove that building a network “hypervisor” can’t be done (“never say never!”) but here’s why I wouldn’t bet on it being done. Networks can unquestionably be virtualized; the question is at what effort and with what impact on the continuing use of existing equipment. If it’s a “fork lift” upgrade, then what is the value proposition that justifies the additional capital outlay? But back to the problems: first of all, at the physical level, network operation depends on specialized, shared programmable hardware (packet forwarding hardware) and associated real-time software (e.g., responding to local changes to assure bridging and avoid packet forwarding loops). Server virtualization involves no comparable globally shared hardware or distributed resources; it is, in contrast, a very local issue (the interface between a software workload and a virtual machine). A virtual network is distributed software and hardware. There are more moving parts. (File systems are distributed, but software uses storage in abstracted ways already, not as raw hardware.)
Secondly, there doesn’t seem to be any simple way of managing the entire control plane in a network while reusing a lot of the existing device software (avoiding the forklift upgrade). Today, network control plane management happens through the exchange of messages between adjacent network nodes. For example, if a specific link fails, the failure is detected by the nodes that directly connect to the link. They respond to the failure by changing their local control plane so that the failed link is not used, and then communicate the impact of that change to the other nodes they directly connect to. Those nodes in turn deal with the change, adjust their local control plane, and propagate the impact to their adjacent nodes. Again, in contrast to the virtual server case, there is far more distributed (and complex) machinery to deal with.
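To make that hop-by-hop behavior concrete, here is a minimal Python sketch of a link failure being announced neighbor to neighbor. It is an illustration only: the five-node topology and the function name `propagate_link_failure` are made up for this example, and real control plane protocols do far more at each node (recomputing forwarding state, running timers, suppressing loops) than simply passing the news along.

```python
from collections import deque


def propagate_link_failure(adjacency, failed_link):
    """Flood news of a failed link between adjacent nodes.

    Returns (hops, messages): hops maps each node to the number of hops
    after which it learned of the failure; messages counts notifications sent.
    """
    a, b = failed_link
    # Copy the topology and drop the failed link so it carries no messages.
    live = {node: set(neighbors) for node, neighbors in adjacency.items()}
    live[a].discard(b)
    live[b].discard(a)

    # Only the two nodes attached to the link detect the failure directly.
    hops = {a: 0, b: 0}
    queue = deque([a, b])
    messages = 0
    while queue:
        node = queue.popleft()
        # Each node updates locally, then tells every node it connects to.
        for neighbor in sorted(live[node]):
            messages += 1
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops, messages


# Hypothetical topology, chosen only for illustration.
topology = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
hops, sent = propagate_link_failure(topology, ("A", "B"))
print(hops, sent)
```

Even in this toy network, one failed link touches every node and costs eight messages before the network settles. No single point sees or controls the whole exchange, which is the contrast with the very local server case.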
I don’t mean to suggest that we can’t build new forms of networks that will solve the problems we are currently vexed by, or to suggest that interesting solutions can’t be found leveraging today’s networks. What I do mean to suggest is that it’s unlikely that a simple and elegant solution like the hypervisor will be found as a magic bullet.