However, I want to turn the focus onto what is going on underneath: the data plane. Whether and how the control and data planes are separated, or whether a hybrid approach is used, is an architectural decision. Either way, it is essential to understand that the notion of a “flow”, a key concept of SDN, is what makes the data plane in an SDN infrastructure different. In general, a “flow” is the communication between endpoints on a network. The definition of a flow exists, and needs to exist, in both the data plane and the control plane.
Most networking vendors no longer have the expertise to develop large-scale flow-based systems, neither in switch and system architecture nor in the required ASIC development. The commodity ASIC market has so far ignored this as well, because more scale requires, among other things, more memory, which is expensive, and higher cost is hardly a design goal. So even though one might expect the commodity ASIC market to pick this up, as of today none of the large ASIC players offers anything that scales in the context of flows. And almost all SDN startups today focus on software (controller) solutions, where the cost of entry is much lower than in the hardware business.
Why is this important at all? In a flow-based system, the first packet of a flow can be used to make very sophisticated decisions in software (in the controller or even in other applications), and all subsequent packets of that flow are then switched in hardware. This is also the basis for all of the new, advanced and agile services that are associated with SDN.
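To make the first-packet idea concrete, here is a minimal sketch in Python. It is a toy model, not any vendor's implementation: the `FlowKey` fields, the `FlowSwitch` class and the `example_policy` function are all hypothetical names chosen for illustration, and a plain dictionary stands in for the hardware flow table.

```python
from collections import namedtuple

# Hypothetical flow key: the classic 5-tuple (illustrative only).
FlowKey = namedtuple("FlowKey", "src_ip dst_ip proto src_port dst_port")


class FlowSwitch:
    """Toy model: a flow table (fast path) backed by a software decision (slow path)."""

    def __init__(self, decide):
        self.decide = decide        # software decision, called once per new flow
        self.flow_table = {}        # stands in for the hardware flow table
        self.slow_path_hits = 0     # packets that needed a software decision
        self.fast_path_hits = 0     # packets matched directly in the table

    def handle_packet(self, key):
        if key in self.flow_table:
            # Subsequent packets of a known flow never leave the fast path.
            self.fast_path_hits += 1
            return self.flow_table[key]
        # First packet of a flow: decide in software, then program the table.
        self.slow_path_hits += 1
        action = self.decide(key)
        self.flow_table[key] = action
        return action


def example_policy(key):
    # Hypothetical policy: drop telnet, forward everything else.
    return "drop" if key.dst_port == 23 else "forward"


switch = FlowSwitch(example_policy)
web = FlowKey("10.0.0.5", "10.0.1.9", "tcp", 40000, 443)
actions = [switch.handle_packet(web) for _ in range(5)]
print(actions, switch.slow_path_hits, switch.fast_path_hits)
```

Five packets of the same flow cause only one software decision here; the other four are served from the table. The scaling question in the rest of this post is how many such table entries, and how many first-packet decisions per second, a real system can sustain.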
So far, the community that preaches a centralized SDN controller design with commodity ASICs has been unable to address the scaling challenges, both on the control plane and the data plane. They are working around this drawback in two ways:
- The definition of a flow becomes coarse (aggregation), so fewer new decisions need to be made in the control plane and fewer flows need to be programmed in the data plane. Instead of looking at each field from Layer 2 to Layer 4 (or even higher), they look only at MAC addresses or ranges of IP addresses, so visibility and control become less granular. This is good enough for connectivity but not very usable for visibility and control at the application layer, as one loses important features such as NetFlow and ACLs, among others.
- The flows are pre-provisioned in the data plane so the controller doesn’t get overwhelmed with new flow requests. This means that you need to know in advance who wants to talk to whom. As with the previous strategy, this can be used for connectivity services but not for much more.
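The effect of the first workaround can be sketched in a few lines of Python. The packet dictionaries, the `fine_key` and `coarse_key` helpers and the /24 aggregation prefix are all assumptions made up for this illustration; real devices aggregate in many different ways.

```python
import ipaddress

# Hypothetical sample traffic: three distinct Layer 4 conversations.
packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9", "proto": "tcp", "src_port": 40000, "dst_port": 443},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9", "proto": "tcp", "src_port": 40001, "dst_port": 22},
    {"src_ip": "10.0.0.6", "dst_ip": "10.0.1.20", "proto": "udp", "src_port": 5353, "dst_port": 53},
]


def fine_key(pkt):
    # Granular flow definition: full 5-tuple, one table entry per conversation.
    return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"], pkt["src_port"], pkt["dst_port"])


def coarse_key(pkt, prefix_len=24):
    # Coarse flow definition: destination IP range only (assumed /24 here).
    net = ipaddress.ip_network(f"{pkt['dst_ip']}/{prefix_len}", strict=False)
    return str(net)


fine_entries = {fine_key(p) for p in packets}
coarse_entries = {coarse_key(p) for p in packets}
print(len(fine_entries), len(coarse_entries), coarse_entries)
```

With the coarse key, HTTPS, SSH and DNS traffic all collapse into a single entry for `10.0.1.0/24`. The table stays small, but a per-application ACL or per-conversation accounting can no longer be expressed against that entry.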
So how many flows are we talking about? Based on my experience, at the edge of the network one can expect one to two new Layer 4 flows per second per client device such as a desktop or tablet (which determines what the controller needs to process) and anywhere from 10 to 20 concurrent flows per client device. A server in an enterprise data center typically generates about 10x more than that, both in flows per second and in concurrent flows. Servers hosting internet-facing services will be orders of magnitude higher.
This means that in a standard enterprise campus network with 10,000 employees and three devices per user, one can expect 30k to 60k new flows per second and 300k to 600k concurrent flows. All of these are Layer 4 flows, which is only the starting point for a true, application-aware data plane that looks beyond the TCP and UDP ports at Layer 4. Note that all of this is in standard operation: no denial-of-service attack, no network failover, etc. Those flows then need to be programmed into every device in the path, or some aggregation technique needs to be used.
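The arithmetic behind these numbers is simple enough to write down. The sketch below only restates the figures from the text; the function name `campus_flow_estimate` and its parameter names are invented for illustration.

```python
def campus_flow_estimate(users, devices_per_user, new_flows_per_device, concurrent_per_device):
    """Scale per-device flow rates (given as (low, high) ranges) to a whole campus."""
    devices = users * devices_per_user
    return {
        "devices": devices,
        "new_flows_per_sec": tuple(devices * rate for rate in new_flows_per_device),
        "concurrent_flows": tuple(devices * count for count in concurrent_per_device),
    }


# Figures from the text: 10,000 users, 3 devices each,
# 1 to 2 new flows/s and 10 to 20 concurrent flows per device.
estimate = campus_flow_estimate(10_000, 3, (1, 2), (10, 20))
print(estimate)
```

Plugging in different per-device rates, for example the roughly 10x higher server figures mentioned above, gives a quick feel for how fast the totals grow.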
The commodity ASIC market currently provides high-speed ASICs with up to 1 Tbit/s throughput, but they max out at 4k concurrent flows based on traditional TCAM technology. In addition, today’s controllers are not designed for high-performance, real-time flow decisions, which is why pre-provisioning becomes so important. As a result, there is a huge gap. On the other hand, some network vendors can provide systems that scale to millions of flows using custom flow-based ASIC designs. These systems also need local and distributed control plane capabilities so that control plane processing can scale for real-time decisions.
When comparing solutions, customers should focus on what a vendor can provide at both the control plane and the data plane level, and ask what the future plans are, so that good decisions can be made.