If network behavior can be accurately captured as source code, does that mean that changes in the network will follow a workflow more common in software development?
One of the major issues with networking today is the fragility of most networks. Even simple changes can have unintended consequences, which has spawned an entire practice around change control. It is not uncommon to have narrow change windows in which only committee-approved modifications are permitted. Compare this environment to modern software development. Using agile methodologies, developers translate user stories (requirements) into working code in hours or days, guided by the overarching principle of testing early and often and rolling changes forward.
The power of DevOps lies not just in bringing increased automation to networking but also in employing more robust coding and testing practices around those changes. The goal is both to increase the network’s tolerance for change and to ensure that those changes do what they are intended to do.
If network engineers can express desired network behavior in code, then that code can be tested as any other software product would be. Doing this requires several things:
- Changes are captured atomically (typically as commits in a source code management system)
- System images (compiled code) are pushed to a test (or production clone) environment
- Changes are tested (ideally through a combination of manual and automated regression tests)
- Upon successful test completion, changes are deployed to production through a release management system
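The steps above can be sketched as a minimal pipeline driver. The stage names and checks here are purely illustrative, assuming a hypothetical change record; a real system would hook into your SCM, test harness, and release tooling:

```python
# Minimal sketch of a change pipeline: commit -> deploy to test -> regression.
# Stage names and the shape of the "change" dict are illustrative assumptions,
# not any real tool's API.

def run_pipeline(change, stages):
    """Run each stage in order; a failed stage blocks everything after it."""
    results = {}
    for name, stage in stages:
        ok = stage(change)
        results[name] = ok
        if not ok:
            break  # the change never reaches release
    return results

# Illustrative stages for a network change.
stages = [
    ("commit", lambda c: bool(c.get("diff"))),            # change captured in SCM
    ("deploy_test", lambda c: c.get("target") == "lab"),  # pushed to a test env
    ("regression", lambda c: all(c.get("tests", []))),    # automated tests pass
]
```

The point of the sketch is the gating: every change travels the same path, and a failure anywhere stops the rollout rather than relying on an operator to notice.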
But what do these tests look like?
In a legacy networking environment, the tests are largely a set of manually executed health checks. Before you configure BGP, you might ping the BGP peer to ensure it is reachable, or check the routing table if there is an IGP dependency. Similarly, you might examine interface counters pre- and post-change to validate that traffic is flowing as expected.
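Expressed as code, the pre- and post-change counter check amounts to something like the following. The interface names and snapshot format are made up for illustration; in practice the snapshots would come from your device's CLI or API:

```python
# Sketch of a legacy-style health check: compare interface counters
# before and after a change. Snapshot dicts (interface -> packet count)
# are an assumed format for illustration.

def counters_delta(pre, post):
    """Per-interface change in packet counts between two snapshots."""
    return {iface: post[iface] - pre[iface] for iface in pre}

def traffic_flowing(pre, post, min_packets=1):
    """Crude post-change check: every interface forwarded at least min_packets."""
    return all(delta >= min_packets
               for delta in counters_delta(pre, post).values())
```

Note how device-specific and change-specific this is: the check says nothing about whether the *right* traffic is flowing, only that counters moved.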
This approach is largely a bottom-up testing methodology. The challenge is that these steps vary from change to change, and there is no system-level description of expected behavior that can be tested uniformly after every network modification. Essentially, you are depending on brute force and thoroughness of execution to protect yourself from human error, an approach that is itself both error-prone and extremely expensive. The fallback for networks operating under this model is a library of methods of procedure (MOPs) and standard operating procedures (SOPs), with an army of highly paid specialists backstopping them.
In a highly automated world, testing is different. For example, rather than checking individual device behavior feature-by-feature, you might express the network requirements more holistically. Which hosts do you expect to be reachable? Which routes ought to be in the routing tables? What flows do you expect to see on the network? Essentially, DevOps allows you to layer the testing in much the same way that networking technologies are layered.
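A holistic test of this kind can be as simple as diffing intent against observed state. Here is a minimal sketch comparing an intended route list with what a device actually reports (prefixes and formats are invented for the example):

```python
# Intent-vs-state check: which routes should exist but don't, and vice versa.
# "expected" is the declared intent; "table" is what the device reports.
# Both are assumed to be iterables of prefix strings for this sketch.

def missing_routes(expected, table):
    """Prefixes the intent calls for that are absent from the routing table."""
    return sorted(set(expected) - set(table))

def unexpected_routes(expected, table):
    """Prefixes present on the device that the intent does not call for."""
    return sorted(set(table) - set(expected))
```

Because the check is driven by a declared intent rather than by the specifics of any one change, the same test can run after every modification to the network.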
You can also start to identify dependencies between network elements and technologies. When a QoS policy changes, you can check every other node on which QoS is (or ought to be) configured to ensure consistency. Dependency mapping serves both as a means of documenting the network and as a way to quickly identify potential problem spots when changes are made.
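The QoS consistency check described above might look like this sketch, where an intended policy is compared against the configuration deployed on each dependent node (node names and policy fields are hypothetical):

```python
# Consistency check across dependent nodes: flag any node whose deployed
# QoS settings deviate from the intended policy. The policy dict shape
# (traffic class -> DSCP marking) is an illustrative assumption.

def inconsistent_nodes(intended_policy, node_configs):
    """Return the names of nodes whose config differs from the intent."""
    return sorted(node for node, cfg in node_configs.items()
                  if cfg != intended_policy)
```

Run after a QoS change, an empty result means every node that depends on the policy agrees with it; a non-empty result is a ready-made list of places to investigate.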
These tests can be automated and executed as part of the code deployment process. You can then hook up traffic generators and automate checks of flow counters, interface statistics, and performance SLAs to make sure traffic is flowing the way you expect. Your tests can include negative testing, allowing you to exercise things like ACLs and filters. You can inject malformed packets, simulate failures such as packet loss or link outages, or even test for vulnerabilities.
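A negative test of an ACL can be expressed against a simple model of the filter itself. The sketch below assumes an ordered rule list with first-match-wins semantics and an implicit deny at the end (a common convention, though details vary by platform); the addresses and rules are invented:

```python
# Toy ACL evaluator for negative testing: first matching rule wins,
# unmatched packets are denied. Rule and packet dicts are illustrative.

def acl_permits(acl, packet):
    """Evaluate a packet against an ordered ACL."""
    for rule in acl:
        if rule["dst"] == packet["dst"] and rule["proto"] == packet["proto"]:
            return rule["action"] == "permit"
    return False  # implicit deny, as on most platforms
```

With a model like this, the test suite can assert not only that permitted traffic passes but that traffic which *should* be blocked actually is, before the change ever touches production.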
The key to evolving to this type of environment is understanding explicitly what behavior you expect out of your network. Nothing can be automated unless it can be both specified and measured. Does your network have the appropriate instrumentation? Do you have management tools that can harvest information from your varied set of data sources in the network? Do you understand explicitly what failure and success look like?
A migration to DevOps is a shift in methodology more than a shift in tools. The power of such a change is a more agile development environment that is simultaneously more resilient and less costly to maintain.