Thanks to all who joined us for the Mellanox & Microsoft webinar showcasing how the SONiC platform can be used to scale cloud data centers. After the webinar, we took questions from the audience. Unfortunately, we ran out of time before we could answer all of them, but you can read the full Q&A below.
What type of work do we need to do to support SONiC on our switch platform?
First, the ASIC needs to support SAI. This is provided by chip vendors. Second, you need to provide platform driver support, e.g. for controlling fans, sensors, and transceivers. Please follow the porting guide on the wiki: https://github.com/Azure/SONiC/wiki/Porting-Guide.
What was the thinking process when Microsoft designed SONiC around containerization?
Through years of cloud infrastructure operation, we have gained a lot of insight into building and managing a large-scale, high-availability network, e.g. upgrading without customer downtime, managing a heterogeneous network, etc. Through multiple design iterations, we found that Docker containers provide the right amount of isolation and a rich ecosystem, and enable fast evolution, which fits the requirements nicely. Therefore, we took this approach in SONiC.
How big is the operational cost/change for Microsoft to use SONiC in its data centers?
There are some changes in how to configure and troubleshoot the switch that engineers need to adapt to. The benefits in development velocity and the amount of control gained easily outweigh the cost.
What is the feature roadmap of SONiC?
There is a feature roadmap on the wiki: https://github.com/Azure/SONiC/wiki/Sonic-Roadmap-Planning. We will continue to add new switch platforms, switch ASICs, and new scenarios over time.
How can SONiC support line rate packet rates?
Supporting line rate packet forwarding is a function of the data plane and therefore depends mainly on the underlying switch ASIC. If the switch ASIC supports line rate switching, a well-designed software stack (which SONiC is) can take advantage of this. To learn more you can read the latest Tolly Report on line rate switching.
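As a rough illustration of what "line rate" means in packets per second, the sketch below computes the maximum frame rate an Ethernet port can carry. The function name and the link speeds chosen are illustrative and not part of SONiC; the only fixed input is the 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap).

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte minimum inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    # Illustrative link speeds and frame sizes (smallest and largest
    # standard untagged Ethernet frames).
    for speed_gbps in (10, 100):
        for frame in (64, 1518):
            pps = line_rate_pps(speed_gbps * 1e9, frame)
            print(f"{speed_gbps} Gb/s, {frame}-byte frames: {pps / 1e6:.2f} Mpps")
```

Small frames are the demanding case: at 64 bytes a 10 Gb/s port carries roughly 14.88 million packets per second, and the ASIC has to forward every one of them to be considered line rate.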
How does SAI enable support for different switch ASICs? Isn’t every switch ASIC different?
SAI is the common API on top of the ASIC SDK. By supporting SAI, different ASICs use the same language to talk to the applications above them. It is a big step towards hardware and software disaggregation.
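The idea of a common API over different SDKs can be sketched as follows. This is only a conceptual illustration: the real SAI is a C API, and every class and method name below (SwitchAbstraction, VendorASdk, add_lpm_entry, program_fib, and so on) is hypothetical and does not correspond to SAI or to any vendor SDK.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SwitchAbstraction(ABC):
    """Hypothetical common API that the network stack programs against."""

    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> None:
        ...


class VendorASdk:
    """Made-up vendor SDK that stores routes in a prefix -> next-hop table."""

    def __init__(self) -> None:
        self.lpm_table: Dict[str, str] = {}

    def add_lpm_entry(self, prefix: str, next_hop: str) -> None:
        self.lpm_table[prefix] = next_hop


class VendorBSdk:
    """Made-up vendor SDK with a different call shape (a list of records)."""

    def __init__(self) -> None:
        self.fib: List[Dict[str, str]] = []

    def program_fib(self, entry: Dict[str, str]) -> None:
        self.fib.append(entry)


class VendorAAdapter(SwitchAbstraction):
    """Translates the common call into vendor A's SDK call."""

    def __init__(self, sdk: VendorASdk) -> None:
        self._sdk = sdk

    def create_route(self, prefix: str, next_hop: str) -> None:
        self._sdk.add_lpm_entry(prefix, next_hop)


class VendorBAdapter(SwitchAbstraction):
    """Translates the common call into vendor B's SDK call."""

    def __init__(self, sdk: VendorBSdk) -> None:
        self._sdk = sdk

    def create_route(self, prefix: str, next_hop: str) -> None:
        self._sdk.program_fib({"prefix": prefix, "next_hop": next_hop})


def install_default_route(switch: SwitchAbstraction, next_hop: str) -> None:
    """Application code: identical no matter which ASIC sits underneath."""
    switch.create_route("0.0.0.0/0", next_hop)
```

The application function only ever sees the common interface, so swapping one ASIC for another means swapping the adapter, not rewriting the software above it.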
You mentioned congestion and microbursts. Is all congestion the same? What is a microburst and how should I think about it?
Congestion occurs when a switch ASIC cannot keep up with the traffic: buffers fill up, latency increases, and packets can eventually be dropped. Not all congestion is the same. The first type, which causes avoidable packet loss, occurs when the switch ASIC itself is unable to keep up with the required forwarding rate. This type of congestion does not occur with ZeroPacketLoss switches. The second type, incast or “microburst” congestion, occurs when two or more ingress ports target the same output port. This type of congestion is unavoidable, so it is important to look for the switch ASICs that can absorb the longest microburst before packets start being dropped.
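To get a feel for how long a microburst a switch can absorb, here is a back-of-the-envelope sketch. It assumes a simplified model in which the whole buffer is available to the congested output port and no flow control kicks in; the function name and the 16 MB buffer size are illustrative assumptions, not figures for any particular ASIC.

```python
import math
from typing import Sequence


def burst_absorption_seconds(
    buffer_bytes: float, ingress_bps: Sequence[float], egress_bps: float
) -> float:
    """Time until the buffer overflows when several inputs target one output.

    Simplified model: the buffer fills at (total offered rate - egress rate).
    Returns math.inf when the output port can keep up with the offered load.
    """
    excess_bps = sum(ingress_bps) - egress_bps
    if excess_bps <= 0:
        return math.inf
    return buffer_bytes * 8 / excess_bps


if __name__ == "__main__":
    # Two 100 Gb/s inputs bursting into one 100 Gb/s output, with an
    # assumed 16 MB (16 * 10^6 bytes) of buffer available to that port.
    t = burst_absorption_seconds(16_000_000, [100e9, 100e9], 100e9)
    print(f"Burst absorbed for about {t * 1e3:.2f} ms before drops begin")
```

In this model the absorbable burst duration grows with buffer size and shrinks as more ports converge on the same output, which is why the amount of buffer available to a congested port matters when comparing ASICs.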
Is SAI the same as SAL in the OpenDaylight SDN open source project? What are the differences?
No, they are different. SAI (Switch Abstraction Interface) is a set of APIs on top of the switch ASIC SDK. SAI provides a uniform northbound interface from the ASIC to the switch control software. SAL (Service Abstraction Layer) consists of a set of services sitting much higher in the stack. It provides common messaging and data storage functionality based on user-defined data and interface models.
Is there any management app to manage all switches in the network at a higher level than individual switches?
There are many network management apps in the commercial market. SONiC itself is a single box software stack, which does not include the broader management stack.
Can you please tell us if SONiC can reconcile its database if there are route changes while BGP is down and after it has come back up?
Yes, BGP changes in the network will be reflected in the database through the BGP module, e.g. Quagga.
What percentage of your server footprint is currently leveraging SONiC? Where do you think it will be in a few years?
The actual footprint is confidential. Azure will use it on all T1 and T0 switches.
What about security? Is the OS secured with SELinux?
SONiC is currently based on Debian Jessie. It is possible to set up SELinux or AppArmor if a user would like to.
Any chance of using SONiC with a hypervisor for lab usage, as you would use OVS?
Yes. You can use the SONiC-P4 software switch, which combines a P4-based software switch with the real control plane stack. We use it ourselves for lab and dev purposes. For more information, please refer to https://github.com/Azure/SONiC/wiki/SONiC-P4-Software-Switch
What is the maturity level of open source monitoring/management solutions? Is it feasible at enterprise scale, or are we stuck with the big names, e.g. EMC-SAS, APMs, etc.?
This varies and depends heavily on your comfort level with open source tools.