Thanks to all who joined us for the Dell EMC webinar Unlocking the Power of Open Networking, where we heard first-hand how open networking is helping athenahealth address key technical and business challenges in their data centers. After the webinar, we took questions from the audience. Read the full Power of Open Networking Q&A below.
What was the main way you realized savings by going with the open networking solutions?
We have realized savings in several ways:
- White box switches are simply cheaper per port than switches from traditional, well-known networking vendors. Depending on your discount levels, you may see significant capex savings simply by switching to white box, even after you factor in the cost of NOS (network OS) licensing.
- We also made use of QSFP-40G-LM4 optics. In our case, an LM4 costs a bit more than an SR4 but a lot less than an LR4. It works on both SMF and MMF, so we stock only this one optic, which reduces the need to maintain 2-3 separate stockpiles of $600+ optics. This also simplifies data center operations, though we didn’t try to quantify that in actual dollars.
- A white box NOS such as Cumulus Linux doesn’t enforce the use of name-brand optics, so you are free to shop around for the best option that works for you. The drawback is that you have to test them in the lab before you commit to using them in production, but the savings are noticeable when switching from name-brand to OEM versions.
- We retired a commercial configuration management system we used to deploy common configurations to all devices in the data center. We now rely on Puppet Community Edition to do the same.
- We were able to streamline our equipment deployment and replacement process and cut the time to deploy each switch by about two-thirds. That means fewer man-hours to pay for when building out or replacing equipment. Additionally, when we take over a new space, we start paying rent on it the day we walk in. Every day we spend building out the space, including deploying hardware, costs money. By deploying faster, we reduce what we pay for unused space.
Are you investigating and investing in any new ways of building your data center networks?
We have a great working partnership with Cumulus Networks. They have been instrumental in our success on this journey. Cumulus Networks is actively promoting a solution they call “routing on host”. The main idea is that you run a routing protocol on your servers and peer with your switches. This completely eliminates the need for VLANs and STP. It also greatly simplifies automation and standardizes your network. We are investigating ways of moving towards this model to reduce network complexity and introduce more clearly defined abstraction boundaries between the application and the network. By removing as much complexity and state from the network as possible, we aim to explore even more “barebones” network solutions.
Is testing and provisioning time and effort impacted with open networking?
We realized significantly reduced provisioning times by combining automation with open networking solutions. Previously, each expansion project had to involve a senior engineer to select equipment and carefully fit it into the existing network. Today we simply delegate this task to a junior engineer who can reference a standard BOM, request a quote, place a PO and deploy it all without so much as a minute of time from the “expensive” senior engineers.
Also, because configuration is now standardized through automation and documentation, configuring the next batch of equipment is as simple as copying a set of standard YAML documents and modifying the loopback IP and BGP ASN.
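The copy-and-modify workflow described above can be illustrated with a minimal Python sketch. The document keys (`role`, `loopback_ip`, `bgp_asn`, `uplinks`), the interface names, the addresses and the ASN range are all hypothetical and are not athenahealth's actual schema or numbering. The small `to_yaml` helper only handles this flat shape and is not a general YAML writer.

```python
import copy
import ipaddress

# Hypothetical "standard" document; keys and values are illustrative only.
STANDARD_LEAF = {
    "role": "leaf",
    "loopback_ip": None,
    "bgp_asn": None,
    "uplinks": ["swp49", "swp50"],
}


def render_switches(template, first_loopback, first_asn, count):
    """Copy the template once per switch, changing only loopback IP and ASN."""
    base = ipaddress.ip_address(first_loopback)
    switches = []
    for i in range(count):
        doc = copy.deepcopy(template)  # never mutate the shared standard
        doc["loopback_ip"] = f"{base + i}/32"
        doc["bgp_asn"] = first_asn + i
        switches.append(doc)
    return switches


def to_yaml(doc):
    """Emit YAML text for this flat document shape (scalars and simple lists)."""
    lines = ["---"]
    for key, value in doc.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Assumed starting values for a batch of three new switches.
batch = render_switches(STANDARD_LEAF, "10.0.0.11", 65011, 3)
for doc in batch:
    print(to_yaml(doc))
```

Everything except the two per-device values stays identical across the batch, which is what makes the resulting documents easy to review and to feed into a configuration management tool.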
We can also model our network in a virtual environment such as Vagrant with VirtualBox to test how most changes will behave before committing them to the production code base and rolling them out. This not only allows us to test changes ahead of time, and therefore go into a change window with high confidence, but also allows us to deploy many changes during production hours instead of scheduling a change window.
How are you managing the spanning tree issues?
At this moment, we use EVPN with VXLAN in our data fabric. VXLAN encapsulates layer 2 frames in layer 3 packets. As a result, the underlying fabric is all layer 3, which eliminates the need for STP: we simply don’t run STP, and thus we don’t have STP issues. We are also exploring eliminating even VXLAN and EVPN to further reduce complexity and limit the network requirements to BGP and ECMP support.
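To make the encapsulation step concrete, here is a minimal Python sketch of the 8-byte VXLAN header defined in RFC 7348. It covers only the VXLAN header itself. In a real fabric the switch (the VTEP) also adds the outer UDP, IP and Ethernet headers, and EVPN distributes the MAC and VNI information. The function names and the VNI value are made up for the example.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination UDP port for VXLAN (RFC 7348)


def vxlan_header(vni):
    """Build the 8-byte header: 8 flag bits, 24 reserved, 24-bit VNI, 8 reserved."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag set: the VNI field is valid
    return struct.pack("!II", flags << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prefix a layer 2 frame with a VXLAN header (outer UDP/IP not shown)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN payload."""
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & 0x08:
        raise ValueError("VNI flag not set")
    return word1 >> 8, payload[8:]


# Placeholder bytes standing in for a real Ethernet frame; VNI 10100 is arbitrary.
frame = b"\xff" * 14 + b"payload"
packet = encapsulate(frame, 10100)
vni, inner = decapsulate(packet)
print(vni, inner == frame)
```

Because the original frame rides unchanged inside a routed packet, the fabric between the two tunnel endpoints only needs to forward IP, which is why no spanning tree is required there.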
On our management network, where we also made use of Open Networking (white box) switches and Cumulus Linux, STP still behaves in the traditional way, with one uplink being a root port and forwarding and the other link being blocked. So if you do have STP, nothing changes when you switch to Open Networking. You still have to consider STP designs, make sure all your ports are set to the right STP mode, configure your bridge priorities in a predictable fashion, and learn to troubleshoot STP.
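As an illustration of why predictable bridge priorities matter, the sketch below models root bridge election in Python: the bridge with the lowest bridge ID wins, comparing priority first and MAC address as the tie-breaker. It is a simplification that ignores the per-VLAN system ID extension and everything else in the protocol, and the switch names, priorities and MAC addresses are invented for the example.

```python
def bridge_id(priority, mac):
    """Return a comparable bridge ID: priority first, then MAC as tie-breaker."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 from 0 to 61440")
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """bridges maps name -> (priority, mac); the lowest bridge ID becomes root."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Hypothetical management network with explicitly configured priorities.
planned = {
    "mgmt-core-1": (4096, "02:00:00:00:00:2a"),
    "mgmt-core-2": (8192, "02:00:00:00:00:1b"),
    "mgmt-access-1": (32768, "02:00:00:00:00:03"),
}

# Same switches left at the default priority: the lowest MAC wins instead.
unplanned = {name: (32768, mac) for name, (_, mac) in planned.items()}

print(elect_root(planned))
print(elect_root(unplanned))
```

With explicit priorities the intended core switch becomes root. With every switch left at the default of 32768, the election falls through to the MAC address, so an access switch can end up as root simply because of its hardware address.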