BlueData launched an open source project to tackle the challenges of deploying and managing distributed stateful applications on Kubernetes. The move targets large-scale analytics, data science, machine learning, and deep learning applications for artificial intelligence (AI) and big data use cases.
The overarching effort is called the BlueK8s initiative. (K8s is shorthand for Kubernetes.) The first open source project within BlueK8s is the Kubernetes Director (KubeDirector).
Tom Phelan, co-founder and chief architect at BlueData, noted in a blog post that the work fills in current support gaps. He explained that the Kubernetes ecosystem has developed some projects like the Operator Framework, Helm, and Kubeflow for managing distributed stateful applications. However, they lack the enterprise-level capabilities to support more robust application sets.
That is where KubeDirector comes in. KubeDirector is built on the Kubernetes custom resource definition framework. It leverages native Kubernetes API extensions, design plans, and authentication to minimize the learning curve for developers. It also provides native support for preserving application configuration and state, and uses an application-agnostic deployment pattern to minimize the time to onboard stateful applications to Kubernetes.
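To make the custom resource idea concrete, here is a minimal sketch of what an application-agnostic resource could look like, written as a Python dictionary with a small validator. The API group, the kind, and every field name (app, roles, replicas, persistDirs) are illustrative assumptions and not KubeDirector's actual schema.

```python
# Hypothetical custom resource describing a stateful application cluster.
# The group, kind, and all field names are assumptions made for illustration.
cluster_resource = {
    "apiVersion": "example.com/v1alpha1",
    "kind": "StatefulAppCluster",
    "metadata": {"name": "spark-demo"},
    "spec": {
        "app": "spark",
        "roles": [
            {"name": "controller", "replicas": 1, "persistDirs": ["/etc/spark"]},
            {"name": "worker", "replicas": 3, "persistDirs": ["/var/lib/spark"]},
        ],
    },
}


def validate(resource):
    """Return a list of problems found in the resource; empty means valid."""
    problems = []
    spec = resource.get("spec", {})
    if not spec.get("app"):
        problems.append("spec.app is required")
    roles = spec.get("roles", [])
    if not roles:
        problems.append("spec.roles must list at least one role")
    for role in roles:
        if role.get("replicas", 0) < 1:
            problems.append(f"role {role.get('name')!r} needs replicas >= 1")
    return problems


def total_members(resource):
    """Count the containers the resource asks for across all roles."""
    return sum(role["replicas"] for role in resource["spec"]["roles"])
```

The point of the sketch is that the same resource shape could describe Spark, Kafka, or Cassandra by changing only the data, which is one way to read "application-agnostic deployment pattern."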
“The goal is to run unmodified big data software in containers so data scientists can spend their time analyzing data rather than fighting hardware and drivers,” Phelan said.
KubeDirector manages the cluster for stateful applications instead of forcing a user to build and implement an application-specific Kubernetes Operator. KubeDirector applies the application-specific workflows to transform the current state of the cluster into the expected state of the cluster.
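The step from current state to expected state can be sketched as a simple comparison that yields a list of actions. This is a simulation under stated assumptions, not KubeDirector's code: the function names plan_actions and apply_actions are hypothetical, state is reduced to a count of members per role, and nothing here talks to a real cluster.

```python
def plan_actions(current, expected):
    """Compare per-role member counts and return the actions that close the gap.

    Both arguments map a role name to a member count (an assumed, simplified
    model of cluster state).
    """
    actions = []
    for role in sorted(set(current) | set(expected)):
        have = current.get(role, 0)
        want = expected.get(role, 0)
        if want > have:
            actions.append(("add", role, want - have))
        elif want < have:
            actions.append(("remove", role, have - want))
    return actions


def apply_actions(current, actions):
    """Return the state after applying the actions (simulated, no real cluster)."""
    state = dict(current)
    for verb, role, count in actions:
        delta = count if verb == "add" else -count
        state[role] = state.get(role, 0) + delta
        if state[role] == 0:
            del state[role]
    return state


current = {"controller": 1, "worker": 2}
expected = {"controller": 1, "worker": 4}
actions = plan_actions(current, expected)
print(actions)  # [('add', 'worker', 2)]
```

Applying the planned actions to the current state yields the expected state, and planning against an already matching state yields no actions, which is the property a reconciling controller relies on.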
Kubernetes and State
Kubernetes was originally designed with stateless applications in mind, so it did not set out to manage application data storage. That is not a problem for cloud native web services, such as a web server or a front-end web user interface, that do not depend on local container storage for the workload.
Stateful applications, however, are services that save data to storage and use that data to run the application. These include databases and complex applications like big data and AI use cases that involve large-scale data processing, data science, and machine learning (ML). In practice, these are workloads that run on platforms like Spark, Kafka, Hadoop, Cassandra, and TensorFlow.
This has led to a robust business of storage vendors developing stateful add-ons that plug into a Kubernetes-managed container deployment to handle storage needs. Phelan made a point of noting that BlueData is not a storage provider, but rather an infrastructure platform for automating and managing the lifecycle of data storage needs.
Not an Operator
Phelan also differentiated KubeDirector from other ecosystem efforts, specifically the Operator Framework.
Red Hat launched the Operator Framework earlier this year. It is a toolkit for building and managing Kubernetes-native applications, which are known as Operators. An Operator is essentially an application-specific controller: it uses the Kubernetes API to create and manage instances of a particular application.
The Operator concept is targeted at distributed applications. It allows instances to be scaled as needed. It also supports setting policies in a declarative manner: the user states what is needed, and the Operator executes the specific actions required to reach that state.
Phelan noted that “while the implementation of a Kubernetes Operator for managing a cloud native stateless application is fairly straightforward, such is not the case for all applications.”
“Most applications for big data analytics, data science, and AI/ML/DL are not implemented in a cloud native architecture, and many of these applications are stateful,” Phelan explained. “In fact, a distributed data pipeline typically consists of multiple different applications each with their own unique attributes; and these applications vary widely depending upon the use case.”
Phelan said this limits the ability to containerize those applications into microservices without a lot of reconfiguring.
BlueData is working with the Cloud Native Computing Foundation to extend the reach of the BlueK8s initiative.