BlueData has developed a prototype of its big data platform that launches clusters using the Kubernetes container orchestrator. The move is another step in bridging the gap between the stateless world of Kubernetes and the stateful needs of big data.
Tom Phelan, co-founder and chief architect at BlueData, said the prototype runs the company's EPIC (Elastic Private Instant Clusters) big data platform on Kubernetes. The controller is deployed as a stateful pod with its own public IP address. Customers can then manage the cloud-based cluster in the same way they manage bare metal servers, and launch big data clusters through Kubernetes.
“The goal is to run unmodified big data software in containers so data scientists can spend their time analyzing data rather than fighting hardware and drivers,” Phelan said.
BlueData is targeting the move at Fortune 1000 companies that have struggled to manage big data analysis. Phelan explained that these firms are attempting to optimize hardware usage and connect to numerous data lakes while controlling security risks. The target customers include firms in the financial, legal, medical, insurance, and government sectors.
BlueData provides a big data software platform that uses embedded Docker containers to deliver big-data-as-a-service for its customers.
Many of BlueData’s customers today are using bare metal servers or virtual machines (VMs) to support their big data needs. However, Phelan explained that customers are looking to streamline finances and operations around the open source container orchestrator.
“These customers are still for the most part just dabbling with Kubernetes, but they are very interested in going in that direction and want to know if there are ways to manage their big data needs as well,” Phelan said. “That’s what we are trying to show.”
Kubernetes has elbowed its way to the top as the enterprise choice for container management. While challenges still exist, most analysts and vendors have noted that Kubernetes has become an important component for enterprises looking to maximize their cloud deployments.
However, Kubernetes was designed primarily for stateless applications, not for handling data storage. That gap has led to a robust business of storage vendors developing stateful add-ons that can plug into a Kubernetes-managed container deployment to handle storage needs.
BlueData is part of that development, though focused on larger data needs. Phelan made a point of noting that BlueData is not a storage provider, but rather an infrastructure platform for handling the automation and lifecycles of data storage needs.
“Big data is very stateful. It’s not microservices or cloud native,” Phelan explained. “Big data is monolithic and uses a lot of local storage resources. We definitely have our work cut out for us.”
The Kubernetes community has begun to address data storage needs more formally. Some of the more recent platform updates have begun to recognize the concept of stateful workloads, which Phelan said is helping the process.
But those efforts fail to take into account pressing security issues. Addressing them requires, among other things, a consistent IP address. BlueData works to automate the configuration of the software running in the containers to handle the security and data storage needs.
As for the running prototype, Phelan said that once the firm is comfortable with its stability, it expects to release a commercial version. That is expected to happen within the next 12 months.
“What we have seen so far looks pretty good, but we are still running tests to make sure it’s ready for our customers,” Phelan said.