Gartner estimates that 25.1 billion edge devices will be installed worldwide by the year 2021. These devices continue the trend toward the edge, with smart devices generating ever-higher volumes of real-time streaming data that, in the name of efficiency, must be processed locally.
Distributed applications have grown to encompass edge, cloud, and mobile environments, and the increased scale and complexity of these applications require that developers make the most of available compute and storage resources. With edge devices creating a tsunami of data, the combination of cloud and on-premises databases would seem the logical solution for taming the volumes of streaming data created at the edge.
There’s one major problem: databases don’t work at the edge.
That may seem like a bold assertion, but it has to do with the fundamental way databases operate. At their core, databases store information. They don’t have agency, or do any actual “thinking” for themselves. Instead, they provide a central datastore, which can be queried to provide state information. This pattern is indicative of stateless application design. Stateless application design essentially means that application services (the bits of logic that “do” things in applications) don’t have to remember anything they’ve done previously. Instead, a database keeps track of an application’s state, and anytime an application needs to do anything, it asks the database first.
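To make the distinction concrete, here is a minimal sketch in Python. The class and method names (InMemoryDB, fetch_state, save_state) are hypothetical placeholders, not any real library’s API; in a real deployment, each database call would cross a network:

```python
# Hypothetical stand-in for a remote database (imagine every call
# below crossing a network to reach it).
class InMemoryDB:
    def __init__(self):
        self._rows = {}

    def fetch_state(self, key):
        return self._rows.get(key, {})

    def save_state(self, key, state):
        self._rows[key] = state


class StatelessService:
    """Remembers nothing: every request round-trips to the database."""
    def __init__(self, db):
        self.db = db

    def handle(self, device_id, reading):
        state = self.db.fetch_state(device_id)   # ask the database first...
        state["last_reading"] = reading
        self.db.save_state(device_id, state)     # ...then write the result back


class StatefulService:
    """Keeps state in local memory; answers without any round trip."""
    def __init__(self):
        self.state = {}

    def handle(self, device_id, reading):
        self.state.setdefault(device_id, {})["last_reading"] = reading
```

The stateless version pays two database calls on every request; the stateful version pays none, because the context it needs is already in local memory.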
The stateless model works well for web applications, which serve static media and query database tables to inform the application. In stateless applications, the primary concern of application developers is logistical: how to get data into the database. In an ideal world, databases want information to arrive consistently formatted, for efficient storage and access, and at predictable rates, to prevent bottlenecks. But the real world isn’t stateless. Observations from edge devices are perishable and don’t fit neatly into a database table. Instead, data in the real world is generated continuously and is distributed across complex deployments. Each edge node has a unique local context, which can illuminate the data it generates. To understand the data from these real-world systems, it’s critical that this context is maintained and made locally available to applications.
Why the Edge Needs a Stateful Solution
Here’s a quick computer science lesson. Imagine we’re having a conversation at work and I ask you how the weather is at home. In a stateless world, you wouldn’t remember how the weather was before you left for work. Instead, you would drive 45 minutes home, check the weather, drive 45 minutes back, and then tell me how the weather was. That’s how stateless database applications treat communication between edge devices and databases. Humans, however, and the real world in general, are stateful. We remember how the weather was this morning and can recall it instantly. Instead of a simple conversation taking several hours, we can exchange pleasantries in seconds.
When it comes to the real world, where context is readily available, stateless design seems like a ridiculous way of operating. It looks even more extreme when you compare the time it takes to send a data packet over a local network with the time it takes to compute locally at the edge: a difference of seven orders of magnitude. Scale a CPU cycle (about 1 nanosecond) up to 1 second, and a stateful conversation plays out in real time: I ask how the weather was at home, you respond, and roughly 1 second has elapsed. Holding the same conversation in a stateless manner would require two trips across the network (about 10 milliseconds each way). At that scale, each trip over the network incurs 115 days (or 10 million seconds) of latency. A conversation that otherwise could have concluded in 1 second takes a stateless application the equivalent of 230 days.
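Those numbers fall out of simple arithmetic. The sketch below reproduces them, assuming roughly 1 nanosecond per CPU cycle and 10 milliseconds per one-way trip on a local network (both round figures, not measurements):

```python
# Back-of-the-envelope latency comparison, assuming ~1 ns per CPU
# cycle and ~10 ms per one-way network trip (round figures only).

CPU_CYCLE_S = 1e-9       # ~1 nanosecond per CPU cycle
NETWORK_HOP_S = 10e-3    # ~10 milliseconds, one way, on a local network

# The raw gap between computing locally and crossing the network:
print(NETWORK_HOP_S / CPU_CYCLE_S)             # 10000000.0, i.e. 10**7

# Scale 1 ns up to 1 s, as in the conversation analogy.
scale = 1.0 / CPU_CYCLE_S                      # 10**9

hop_days = NETWORK_HOP_S * scale / 86_400      # one-way trip at that scale
print(f"{hop_days:.1f} days each way")         # ~115.7 days
print(f"{2 * hop_days:.0f} days round trip")   # ~231 days
```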
Network innovations such as 5G may make it possible to send more data across the network, faster than ever before. But while network capacity is increasing, network latency will never be able to compete with the speed of computing locally, as network latency is ultimately bounded by the speed of light. Using the earlier example of the two co-workers, advancements in network technologies like 5G may make it possible to trim latency from 115 days to, say, 40 days. While that may seem like a great improvement, it’s still a far cry from the 1 second it would take a stateful application to have the same conversation.
One emerging pattern for building stateless edge applications involves data filtering at the edge, while still doing the bulk of data analysis in the cloud. This may reduce the amount of data that must be transmitted over networks, but unless you can alter the speed of light, waiting for the cloud is always going to be 10 million times slower. Another pattern is the use of IoT gateways or edge data centers, which move the database closer to edge deployments, thus reducing the distance network packets must travel to reach a database.
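As an illustration of the filtering pattern, here is a minimal, hypothetical sketch; the threshold and the send_to_cloud callback are stand-ins, not any particular gateway product’s API. Only anomalous readings pay the network cost, while everything else is handled (and discarded) locally:

```python
# A sketch of filter-at-the-edge: forward only the anomalies,
# drop the unremarkable majority without ever touching the network.

THRESHOLD = 3.0  # assumed cutoff, in standard deviations

def filter_at_edge(readings, mean, stddev, send_to_cloud):
    for value in readings:
        if abs(value - mean) > THRESHOLD * stddev:
            send_to_cloud(value)  # rare case: pays the network latency
        # common case: discarded locally, no network hop at all

# Example: a temperature stream where only the spike gets forwarded.
filter_at_edge([21.0, 21.3, 20.9, 35.0], mean=21.0, stddev=0.5,
               send_to_cloud=lambda v: print(f"sent {v} to cloud"))
```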
But this ignores the fact that any reliance on the network to establish state increases latency by orders of magnitude. For real-time applications like autonomous vehicles, augmented and virtual reality (AR/VR), gaming, manufacturing, and thousands of other use cases, accepting that kind of delay is untenable, especially considering that we already know how to build stateful applications.
Big Data Is Dead, Long Live Big Data
This is not to say that cloud, database, and so-called Big Data technologies are suddenly obsolete. That simply isn’t the case. In fact, Big Data technologies enable many of the patterns and methods that make it possible to build stateful edge applications. But in a cloud-dominated world, there seems to be a common wisdom that edge devices are generally dumb and shouldn’t speak unless spoken to, leaving the cloud as a single point of aggregation. The reality is that edge devices are already speaking, and they have a lot to say. According to Morgan Stanley, since 2010, manufacturers “have collected 2,000 petabytes of potentially valuable data, but discarded 99 percent of it.” The 1 percent that is processed already makes use of Big Data. The challenge, therefore, is how to efficiently extract value from the other 99 percent.
Edge computing provides a way to extract value from the full dataset without throwing away useful data. With application context available at the edge, you can identify relevant insights and discard only the noise. Combined with innovations in machine learning, algorithms suddenly have access to much larger datasets and can be applied to streaming data in real time. Deployed to the edge, machine learning algorithms are no longer bound to learning on historical datasets; they can train and iterate on data as it streams. The difference between applying machine learning in post hoc batch analysis and applying it in the stream is the difference between analyzing why it rained yesterday and knowing that you’ll need an umbrella today.
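To make “training in the stream” concrete, here is a deliberately simplified sketch: an exponentially weighted running estimate that updates on every event as it arrives. It stands in for a real online learning algorithm rather than reproducing one:

```python
# A toy stand-in for online learning: an exponentially weighted
# estimate that trains and predicts per event, with no batch step.

class StreamingEstimate:
    def __init__(self, alpha=0.1):
        self.alpha = alpha     # weight given to each new observation
        self.value = None      # current estimate

    def update(self, x):
        """Fold one new observation into the estimate and return it."""
        if self.value is None:
            self.value = x
        else:
            self.value += self.alpha * (x - self.value)
        return self.value

model = StreamingEstimate()
for reading in [21.0, 21.4, 22.1, 21.8]:   # live sensor stream (example values)
    estimate = model.update(reading)        # learn and apply in one pass
    print(f"reading={reading:.1f} estimate={estimate:.2f}")
```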
Breaking the Database Addiction
Everything we’ve built over the last 30 years assumes a batch analytics model, using a traditional database architecture. The database model still works, but it’s incomplete. As long as we’re bound to batch analytics, we’re doomed to throw away useful data. We need real-time edge computing to augment databases and provide a more comprehensive model, one that treats real-time edge data differently than historical cloud data.
Today, the 99 percent of data that’s being discarded contains immense value. But to unlock that value, streaming data must be acted on locally. Networks and centralized databases are still useful, but we can’t make them the bottleneck for acting on data and still expect real-time results.
At the end of the day, it doesn’t matter how we’re building applications today. What matters is what the applications of tomorrow will look like and the use cases they will enable.
Autonomous vehicles, AR/VR applications, gaming, artificial intelligence, smart cities, and many other next-generation technologies have yet to hit an inflection point, because they are stalled on a paradigm unsuited to massively distributed systems that generate real-time data. Stateful, real-time edge computing will enable these applications, and more, by providing the means for processing and distributing streaming data throughout complex systems without slowing it down.
The future of distributed applications won’t distinguish between the edge and the cloud. Instead, the future of distributed applications lies in acknowledging that there are different types of data: real-time and historical. Stateless, database-centric architectures will always be optimal for performing batch analytics on historical data. But stateful, local computing is optimal for processing real-time data, and distributed architectures that operate statefully at the edge will come to define the next generation of disruptive applications.