What’s fun about being a journalist/analyst (or “Janalyst” as I call it) is you get to meet lots of people — and learn new things. Sometimes these things are a bit shocking, such as some of the stuff I learned when I met Peter Phaal, President and co-founder of network monitoring technology company InMon.
Never heard of InMon? Well, read up. It’s quite a story about the startup, as well as the technology. I was introduced to Phaal by a mutual friend in networking circles. I met with Peter twice. We met first in a law office, of all places, at the firm that did the patent work for Phaal’s company. This is a fitting venue, because nearly everything that Phaal and InMon have done revolves around intellectual property.
InMon has four employees, two of whom are founders. There is no rank-and-file. The revenue number is private, but I understand the company has very respectable revenues for four people and is profitable. There is no big manufacturing facility or corporate campus. In fact, InMon just rents a small office space in the financial district of San Francisco. Yet InMon’s network monitoring technology, sFlow, is installed in millions of networking chips. Indeed, if you are running a networking box using merchant silicon from the likes of Broadcom, Intel, and Marvell, the chances are it has InMon technology inside it. Licensees include Alcatel-Lucent, Brocade, Cisco, Dell, Extreme Networks, Hewlett-Packard (HP), Huawei, IBM, Juniper Networks, and NEC, among many others. InMon’s technology is monitoring vast portions of the Internet, as we speak. It works by installing an “agent” at the silicon (chip) level that measures and gathers data. That data can then be fed into smart analytics platforms. The goal? To listen and watch and figure out what’s going on.
The concept behind InMon’s technology and Phaal’s vision is quite simple: you have to be able to see inside the network to understand it. Phaal, a native South African, worked at Hewlett-Packard Laboratories prior to InMon. There, he invented Hewlett-Packard’s Extended RMON technology. With that, I’ll let Phaal explain in an epic Q&A that should make networking nerds proud.
Attention: This interview is for hardcore networking geeks only. Investment bankers probably won’t understand. But if you want to understand the future of networking and where virtualization will take us, this Q&A with Peter Phaal is probably a good primer.
Rayno Report: So what makes you unique is that your technology is embedded in so much silicon. How did you get these deals with all the merchant silicon chip makers?
Phaal: It takes a long time.
Rayno Report: Okay, but how did you get the deals? I mean, if you came out with a startup and your business plan said, “we’re going to get software agents embedded with all the major silicon players,” the venture capitalists (VCs) would think you were crazy.
Phaal: If I went to a VC with that business plan they would laugh me out of the room.
Rayno Report: Exactly. So how did it happen?
Phaal: It was a lot of good luck. The first lucky thing is that hardware we could leverage was already in the HP products, which they’d done earlier. This was better and it was a good upgrade path. Also, HP wasn’t able to deliver a high-end switch. At the time they were OEM’ing Foundry. They sent me in as a consultant and I worked with Foundry to define sFlow, actually. They were at the cutting edge of gigabit switching. So they co-authored sFlow with me and integrated it into their ASICs.
The growth rate of vendor support is fairly slow. It took another 18 months for another vendor to respond to competitive pressure.
Rayno Report: Who was that?
Phaal: I think it might have been Extreme Networks. What happened is those vendors also wanted to have low-end products for campuses, so they were looking to merchant silicon vendors like Broadcom and Marvell to implement the measurement technology. And then, that caused a second wave of vendors, and they implemented it.
So we’re now in the third or fourth wave, each time a merchant-silicon vendor comes on board. First it was Marvell and then Broadcom, and Broadcom is driving all the adoption in all the 10-gig switches. Now every top-of-rack switch from every vendor is a Broadcom Trident chip.
[Publisher’s note: Broadcom, you can thank us for the advertisement.]
The growth rate of vendor support started fairly slow. The thing that drove sFlow adoption was Cisco not doing it. It made it a selling point. They were all selling against Cisco. So if you have a differentiator, you talk about it.
The compelling value finally cracked Cisco. So the Nexus 3100 and the 9000 all use Broadcom silicon. Cisco started to adopt sFlow in those platforms.
Rayno Report: So you are still a small company. How many people?
Phaal: Four people.
Rayno Report (stifling skeptical laughter): Four people! How do you manage all those relationships?
Phaal: sFlow became a checkoff standard. The market expects it, the vendors just do it, and there is very little interaction from us, other than to keep them honest. We see our role this way: if a customer complains to us that the vendor’s switch isn’t compliant, we’ll work with the vendor to fix that.
Rayno Report: So basically, you own the intellectual property — so it’s a licensing model?
Phaal: Yes. You get a license to the patent and trademark on condition that you faithfully implement the standard and use the brand.
Rayno Report: They provide all the support and interaction with the customer.
Phaal: Yes. The business is two parts. One is growing the agent market. That has to be free and that has to be open. So we formed sFlow.org, which engages the stakeholders. That’s a way to develop and extend the standard.
The second part of the business is our commercial business, which is selling the analytics software that consumes the data. It’s analogous to the OpenFlow SDN market. The OpenFlow agents are free, everybody’s interest is in proliferating them and the value is in the controller. It’s the same thing with sFlow. All the real value is in analytics. What you do with the data is where the core value is.
Rayno Report: Everywhere you go now there is a new open-source model. It’s been around with Linux for a while, but it’s never penetrated enterprise networking or telecom on that scale.
Phaal: But it has. Linux on the server space is totally dominant.
Rayno Report: Okay, then I’m talking about the networking gear.
Phaal: Yes. That’s why I think companies like Cumulus Networks are real interesting as part of this puzzle. If you deconstruct a switch the same way a mini-computer was deconstructed by a PC, you end up with white-box things from Taiwanese vendors competing with a common platform.
That’s why the OpenCompute initiative from Facebook — to have a reference switch architecture — it becomes just like buying a motherboard. It’s a commodity and drives down the cost and makes software operating systems more interesting. So there’s Cumulus, there’s a bunch of different ones, Big Switch is doing one, Broadcom has a kit that anybody can use. It separates the operating system from the hardware and creates two separate markets.
With all things in computing it tends to favor a dominant player. As a developer it’s a pain in the ass to port to a lot of operating systems. So I think some kind of Cumulus Linux, or a Linux variant, running on Broadcom ASICs is going to be the operating system, and then people will deploy SDN on top of that.
Rayno Report: How far will this wave take us? This is a bit of a phase shift for Cisco, is it going to accelerate?
Phaal: It’s going to take two steps. The first wave is people like Facebook and Google, they move the tastes, they already have SDN and they are already building their own switches. They are enormous consumers. Google and Facebook are two of the largest PC manufacturers in the world. I could see that happening with networking too.
As you move into the service providers, they are cost sensitive, look at Rackspace and Amazon. A lot of enterprise workloads are moving into those clouds. Even though the enterprises themselves aren’t adopting this technology, they are adopting it by proxy.
I see the enterprise market for IT hardware declining or stable as any new thing gets deployed in the cloud. The in-house stuff is for legacy. But there are still enterprises that use COBOL and VAX clusters.
Rayno Report: So what do you think of Cisco’s recent activity?
Phaal: Insieme is their response [to SDN]. Their messaging is quite encouraging. They are talking about applications. Then they have a controller. They talk about it taking responsibility for storage and compute. So they are in the position, being a server manufacturer as well, to actually integrate them. Whether they actually do it is another question. Making that happen is hard. HP is in the same position. IBM is in the same position. Dell is in the same position.
VMware I think has the best chance of doing it, because they are a software play and they are network hardware agnostic.
Rayno Report: What’s the most common struggle these customers have? Everybody describes these transitions as clean, but they are usually a lot messier. When you move to cloud there are new things such as security.
Phaal: In the financial sector, they have all sorts of regulations. In medical, too. That is one of the inhibitors of them moving into the cloud. I don’t focus that much on the enterprise or legacy stuff. I see plenty of growth in the service provider and cloud space. It’s a very big opportunity.
Rayno Report: Where is the growth? Building out data centers?
Phaal: The highest growth sector is 10-gig data-center switching. Those switches are all being consumed by Web 2.0 and cloud service providers.
Rayno Report: But it’s a free-for-all, isn’t it? You have lots of vendors going after it, and there are many startups with new architectures.
Phaal: There’s a lot of opportunity for everybody. It’s sort of like when the PC market started, you got an explosion of companies, and then they get winnowed out. It’s very healthy to see a lot of companies. Companies like Pluribus [Networks], which has a different take on it, and Plexxi and Cumulus. They are all exploring the space.
Rayno Report: You like Pluribus?
Phaal: It’s an interesting company. Their view is you can have central control (of switching) or you can view it as a cluster of switches. Pluribus is a bunch of [former] Sun engineers that really understand data centers. What they’ve done is interesting, they are using Broadcom ASICs, but they are not using the Broadcom drivers. The software that Broadcom provides is 7 million lines of code, geared toward a CLI [command-line interface]. It’s slow to do configuration changes.
Pluribus is saying, let’s pare it down to bare metal and we’ll understand the registers and we’ll build the device drivers.
Rayno Report: And what do you think of Plexxi?
Phaal: I really like it. They have a good controller. They are also using merchant silicon. Where they are disconnected is they have no way to discover affinities. In my experience, most users have a poor idea of how their applications are architected. So they rely on partnerships with people like Boundary to give them data.
If they enabled sFlow in their hardware, they would be able to discover those relationships and, more important, the strengths of the relationships, and they could self-optimize. [Editor’s note: Okay, so only one required sales pitch inserted here.]
I’m a big fan of optical-circuit switching. If you take away your top-of-rack switches and build in an optical circuit switch you can build topologies on the fly to match demand. But to do that, you need to understand it, you need analytics. I think that’s the missing piece in their story.
Rayno Report: It’s interesting, I’m a survivor of the optical bubble. Optical switching has come back, but in the data center!
Phaal: It makes a whole lot of sense. If you look at traffic patterns in multi-tenancy data centers, the patterns are very structured. Your security model isolates traffic between tenants. So there’s no sense in which you have a full traffic matrix of everybody talking to each other.
If you look at how data centers are designed, the so-called “fat tree,” the assumption is you have a flat traffic matrix. But if two hosts talk a lot, it makes sense to tie up links directly between those switches. This architecture allows you to use measurement to provide the bandwidth where it’s needed.
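[Editor’s note: To make the idea of using measurement to place bandwidth concrete, here is a minimal Python sketch. It assumes you already have flow records that name a source switch, a destination switch, and a byte count. The switch names, the byte counts, and the helper functions are all hypothetical illustrations, not anything from InMon or sFlow.]

```python
from collections import Counter

# Hypothetical flow records: (source switch, destination switch, bytes observed).
flows = [
    ("tor1", "tor2", 900),
    ("tor2", "tor1", 700),
    ("tor1", "tor3", 50),
    ("tor3", "tor4", 40),
    ("tor2", "tor4", 10),
]


def traffic_matrix(records):
    """Sum the bytes exchanged by each unordered pair of switches."""
    matrix = Counter()
    for src, dst, nbytes in records:
        pair = tuple(sorted((src, dst)))
        matrix[pair] += nbytes
    return matrix


def direct_link_candidates(matrix, max_links):
    """Return the heaviest pairs, the ones a direct link would help most."""
    return [pair for pair, _ in matrix.most_common(max_links)]


matrix = traffic_matrix(flows)
candidates = direct_link_candidates(matrix, max_links=1)
print(candidates)
```

[In this made-up data, one pair of switches carries almost all of the bytes, so it is the obvious candidate for a dedicated link. A flat traffic matrix would give no such signal.]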
Rayno Report: Okay. I think I got that. So talk about Nicira?
Phaal: They are taking a highly structured traffic matrix — a tenant who talks a lot to themselves — and they are shuffling it, they are randomly scattering those VMs (virtual machines) around the physical networks.
If I’m Nicira [the virtualization technology company that VMware bought], I have the opportunity to have visibility into the traffic patterns and I have the ability to place that VM wherever I want. So I can do even better traffic engineering. I’m going to place the workloads. If I have two tenants that have heavy workloads I can put them on the same server and completely eliminate their traffic from the network. Or I can put them on the same top-of-rack switch. That’s why Hadoop is efficient, it’s rack-aware and moves compute to storage.
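[Editor’s note: The placement idea can be sketched as a simple greedy routine: take the chattiest pairs of VMs first and try to put each pair on the same server. This is only an illustration under assumed inputs. The VM names, traffic figures, server names, and the `place_vms` and `network_bytes` helpers are hypothetical and do not describe how Nicira, VMware, or any real scheduler works.]

```python
def place_vms(vm_traffic, servers, slots_per_server):
    """Greedy sketch: handle the chattiest VM pairs first and co-locate them."""
    placement = {}
    free = {server: slots_per_server for server in servers}

    def put(vm, server):
        placement[vm] = server
        free[server] -= 1

    for (a, b), _nbytes in sorted(vm_traffic.items(), key=lambda kv: -kv[1]):
        for vm, peer in ((a, b), (b, a)):
            if vm in placement:
                continue
            peer_server = placement.get(peer)
            if peer_server is not None and free[peer_server] > 0:
                # Join the peer so this pair's traffic never leaves the server.
                put(vm, peer_server)
            else:
                # Otherwise start on the server with the most free slots.
                target = max(servers, key=lambda s: free[s])
                if free[target] == 0:
                    raise ValueError("not enough server capacity")
                put(vm, target)
    return placement


def network_bytes(placement, vm_traffic):
    """Bytes that still cross the network because the pair sits on two servers."""
    return sum(
        nbytes
        for (a, b), nbytes in vm_traffic.items()
        if placement[a] != placement[b]
    )


# Hypothetical measured traffic between pairs of VMs, in bytes.
vm_traffic = {("web", "db"): 1000, ("cache", "batch"): 800, ("web", "cache"): 5}
placement = place_vms(vm_traffic, servers=["s1", "s2"], slots_per_server=2)
remaining = network_bytes(placement, vm_traffic)
print(placement, remaining)
```

[With these made-up numbers, the two heavy pairs each share a server, and only the light web-to-cache traffic still crosses the network.]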
Rayno Report: You almost lost me there.
Phaal: Well, the big thing about Hadoop is that there is lots of data. So it makes sense to preserve locality, because it’s expensive to move a petabyte of data. If you want to do a search on that petabyte, it makes more sense to move the search function to the data. I’ll move the computation task to where the data is and then collect the results. There is so much data it can’t be kept in the same place.
Rayno Report: So what you are saying is that these data centers need to be designed with a lot more information about how the applications are interacting with one another.
Phaal: Yes. It’s kind of like designing your closet — if you just throw stuff in the closet, it doesn’t all fit. If you hire California Closets you can fit eight times as much.
This is what Google does. Their secret sauce is a scheduler that places tasks in their data center and optimally packs them.
Rayno Report: What do you call that?
Phaal: Their previous generation was called the Borg. I don’t know the name of what they have now. It’s analogous to any of these allocation tools. OpenStack is there to organize resources, or vSphere. It’s a question of whether they do that job well or poorly.
I would argue that a lot of them do it poorly because they limit it to a subset of the resources. For example vSphere looks at the compute [computing resources], but not the network. [Nicira’s] NSX looks at the network but not compute.
Rayno Report: So they are all just point solutions?
Phaal: NSX [Nicira’s platform], it’s an important part and it solves an important problem. It’s about logical structure and policy. But they aren’t really concerned about how that maps onto physical infrastructure. But that seems to be changing.
The unfortunate thing is people see measurement as an afterthought. We’ve put in place the measurement that solves that problem. We can now unlock that latent capability.
A lot of businesses really under-utilize their networks. As they get rolling into production, the network really does matter. The interesting thing in network traffic, there are lots of really short interactions, lots of chatter. There are tons and tons of those interactions but they don’t consume that much data.
Most bandwidth is consumed by what we call elephant flows. Managing the elephant flows, like moving a SAN and a huge amount of data at once, is the key to scalability.
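[Editor’s note: A minimal Python sketch of the elephant-versus-chatter split, assuming you have a byte count per flow. The flow names, the byte counts, and the 1 GB threshold are invented for illustration. Real analytics would pick the threshold from measurement.]

```python
def split_flows(flow_bytes, threshold):
    """Flows at or above the threshold are elephants; the rest are chatter."""
    elephants = {f: b for f, b in flow_bytes.items() if b >= threshold}
    mice = {f: b for f, b in flow_bytes.items() if b < threshold}
    return elephants, mice


# Hypothetical data: two bulk transfers and a thousand short interactions.
flow_bytes = {"backup": 5_000_000_000, "san-migration": 4_000_000_000}
flow_bytes.update({f"rpc-{i}": 10_000 for i in range(1000)})

elephants, mice = split_flows(flow_bytes, threshold=1_000_000_000)
elephant_share = sum(elephants.values()) / sum(flow_bytes.values())
print(len(elephants), len(mice), round(elephant_share, 4))
```

[In this made-up data set, two flows out of 1,002 account for more than 99 percent of the bytes, which is the pattern Phaal is describing.]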
Rayno Report: That’s the universal problem, many of these data centers don’t actually have enough information about what’s going on?
Phaal: That’s exactly my point. You need measurement. If you are an engineer and you have uncertainty, you have to measure. Performance inefficiencies are an insidious problem.
So if you want to build your network safely, you double the capacity that you need. And then you’ll double it again. Nobody’s going to be fired for having too much capacity, especially if you don’t know what that is because you have no measurements. But you will get fired if the site goes down because it runs out of capacity. So you end up with these vastly over-provisioned systems. And nobody knows how much money they are wasting because they don’t have the measurement systems that would have prevented that in the first place.
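[Editor’s note: The arithmetic behind “double it, then double it again” is easy to sketch. The 10 Gbit/s figure and the function names below are hypothetical. The point is only that two rounds of doubling leave the network at a quarter of its capacity even at peak, if the original estimate was right.]

```python
def provisioned_capacity(estimated_need, doublings=2):
    """Capacity after doubling the estimate the given number of times."""
    return estimated_need * 2 ** doublings


def peak_utilization(actual_peak, capacity):
    """Fraction of the provisioned capacity that is used at peak."""
    return actual_peak / capacity


estimated_need_gbps = 10  # hypothetical peak demand, in Gbit/s
capacity_gbps = provisioned_capacity(estimated_need_gbps)
utilization = peak_utilization(estimated_need_gbps, capacity_gbps)
print(capacity_gbps, utilization)
```

[Without measurement, nobody can tell whether that unused three-quarters is a sensible safety margin or wasted money.]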
Rayno Report: It seems like your vision is to overhaul how we look at networks.
Phaal: There are multiple dimensions here. What I find exciting is the idea of a software-defined data center, breaking down the silos.
Rayno Report: Explain the silos.
Phaal: Software engineers like to layer things. They are going to build a storage management system, or a compute management system, and they decompose things by functions. I’m a control engineer by training. Control engineers look at the dynamic system and we decompose things by how interdependent they are and by time scale.
If you look at a data center, the compute, storage, and networking components are tightly coupled. You need visibility into all aspects, the network, compute, storage, and applications, and then you want to control all aspects. Being able to have the choices to always make the best move is where you get the 10X improvement.
There is a combinatorial benefit to putting all these guys together, and the vendor that first cracks that is going to really dominate the market.