LinkedIn disclosed its Pigeon switch, a 3.2 Tb/s design suitable as either a leaf or a spine switch, last week in a blog post by Zaid Ali Kahn. The switch is the result of Project Falco, which LinkedIn started a year ago after discovering a latency problem in its network.
Homemade switches such as Facebook’s Wedge and 6-Pack are glamorous, but there’s reason to believe the webscale players don’t want to make their own gear forever. Facebook is open-sourcing its ideas through the Open Compute Project. And when Google explains its networking projects, I hear a bit of subtext: If vendors build something like this, Google might be interested.
“We are not venturing into developing our own switch because we aspire to become experts in the switching and routing space, but because we want control of our destiny,” wrote Kahn on LinkedIn’s blog last week. “We continue to be supportive of our commercial vendors and work with them in a decoupling model.”
LinkedIn’s latency problem turned out to be subtle. After some digging, the company found that flash floods of packets would occasionally overwhelm the switching chips, hitting them with traffic at full line rate, wrote Kahn. Switches don’t normally operate at full line rate, so the chips’ packet buffers would overflow during these brief bursts, which LinkedIn calls “microbursts.”
In other words, the problem was inside the chips. Vendors provide tools for monitoring their switches, but those tools offer no way to peer into the switch chips themselves.
Getting a vendor to offer that telemetry was going to take too long, so LinkedIn began weighing the advantages of building its own switch, citing reasons familiar to anyone who has followed Facebook’s and Google’s networking experiments.
For example, commercial switches come with more features than most companies need: a logical design choice for the vendor, but unnecessary overhead from any individual customer’s point of view. LinkedIn also considers commercial switches slow to support Linux automation tools such as Chef and Puppet.
Project Falco will continue through 2016, wrote Kahn, adding that the project could involve running LinkedIn’s code on other hardware platforms, provided LinkedIn can get enough access to the chips.