Until recently, enterprise high performance computing (HPC) users have mostly dabbled with public cloud for departmental or single application use cases. While the idea of pay-per-use cost models certainly sounded good, they were hard to realize in practice. Concerns included cost, performance, bandwidth, data security, and a variety of technical challenges related to deploying complex application environments on ephemeral compute nodes.
2017 Marked a Tipping Point
In 2017, there was an obvious shift in sentiment among enterprises. Customers who had been skeptical about cloud began to embrace it. In a March 2018 survey conducted by Univa, 61 percent of its customers indicated that they were either open to cloud computing or already using it. Why the sudden change? The availability of HPC-class infrastructure, faster interconnects, and better data management are certainly factors. However, in our view, maturing management tools and the rapid adoption of containers have been decisive.
Hybrid Cloud is Where the Action Is
In HPC, hybrid cloud is where the action is. Unlike traditional IT environments that are often underutilized, HPC users have spent decades wringing every ounce of performance from their computing environments. It’s not uncommon to see HPC clusters with sustained utilization over 90 percent. At this level of efficiency, technologies like virtualization simply get in the way. Under five cents per core-hour may sound cheap, but it is expensive compared to the cost of running high-utilization HPC clusters in-house.
This doesn’t mean that cloud computing has no place in HPC. The cloud can be much less expensive at the margin where resources are needed only for short periods of time. Also, for specialized applications requiring exotic hardware, renting rather than buying can make sense. The trick is in striking the right balance between on-premises and cloud-based deployments.
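The economics above can be sketched with a simple break-even calculation. The figures below (annual cost per core, cloud rate) are illustrative assumptions, not benchmarks; the point is that the effective cost of an owned core-hour depends heavily on utilization.

```python
# Break-even sketch: at what utilization does an owned cluster beat a
# $0.05 per core-hour cloud rate? All figures are hypothetical.

def on_prem_cost_per_core_hour(annual_cost_per_core, utilization):
    """Effective cost of one *used* core-hour on an owned cluster."""
    hours_per_year = 24 * 365
    return annual_cost_per_core / (hours_per_year * utilization)

CLOUD_RATE = 0.05             # $ per core-hour (assumed list price)
ANNUAL_COST_PER_CORE = 150.0  # amortized hardware + power + admin (assumed)

for utilization in (0.30, 0.60, 0.90):
    local = on_prem_cost_per_core_hour(ANNUAL_COST_PER_CORE, utilization)
    cheaper = "on-prem" if local < CLOUD_RATE else "cloud"
    print(f"{utilization:.0%} utilization: ${local:.3f}/core-hour -> {cheaper} wins")
```

Under these assumed numbers, a cluster running near 90 percent utilization costs roughly two cents per used core-hour, while a lightly used one costs more than the cloud rate, which is why the cloud wins only at the margin.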
For most CIOs, the question is not whether to deploy a hybrid cloud strategy, but how. With these considerations in mind, we offer six recommendations for building a future-proof hybrid cloud strategy.
- Get your on-premises house in order – As management consultant Peter Drucker famously said, “You can’t manage what you can’t measure.” This is true in business and also in cluster management. Without understanding the cost of running applications locally, it’s hard to decide whether cloud computing makes sense. Before considering cloud, users need sound workload management and the ability to measure resource use and demand by user, group, project, and application. Only when these fundamentals are in place will users be in a position to assess whether tapping cloud resources is beneficial.
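Measuring resource use by user, group, and project amounts to chargeback-style accounting. A minimal sketch, assuming hypothetical record fields (real data would come from a workload manager's accounting output, such as Grid Engine's qacct or Slurm's sacct):

```python
# Aggregate core-hours by project from workload-manager accounting
# records. The records below are hypothetical placeholders.

from collections import defaultdict

records = [  # (user, project, cores, wallclock_hours)
    ("alice", "crash-sim", 512, 6.0),
    ("bob",   "crash-sim", 256, 12.0),
    ("carol", "genomics",  128, 4.0),
]

usage = defaultdict(float)
for user, project, cores, hours in records:
    usage[project] += cores * hours

for project, core_hours in sorted(usage.items()):
    print(f"{project}: {core_hours:,.0f} core-hours")
```

With per-project totals like these in hand, comparing local cost per core-hour against a cloud provider's rate becomes a straightforward exercise rather than guesswork.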
- Build a frictionless cloud-bursting foundation – If you’re going to supplement local capacity with cloud-based resources, the process needs to be seamless, reliable, and automated. HPC application users are experts in their fields, but it’s not reasonable to expect them to master a cloud provider’s provisioning interfaces and tools. Users don’t care where their simulation runs. They just want high performance, fast turnaround times, and the ability to monitor and self-manage their workloads. If you make cloud-bursting frictionless, you will get optimal results from your hybrid cloud strategy.
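The automation behind a burst decision can be surprisingly simple. A sketch of one possible policy, with hypothetical parameters (the actual provisioning call would go through the cloud provider's API or a tool that wraps it):

```python
# Burst policy sketch: when pending demand exceeds free local capacity,
# start just enough cloud nodes to cover the shortfall, up to a cap.

def nodes_to_burst(pending_cores, free_local_cores, cores_per_node, max_cloud_nodes):
    """Return how many cloud nodes to start (0 if local capacity suffices)."""
    shortfall = pending_cores - free_local_cores
    if shortfall <= 0:
        return 0
    needed = -(-shortfall // cores_per_node)  # ceiling division
    return min(needed, max_cloud_nodes)

# 800 cores short, 64-core nodes -> 13 nodes requested
print(nodes_to_burst(pending_cores=900, free_local_cores=100,
                     cores_per_node=64, max_cloud_nodes=32))
```

Because the decision is computed from queue state rather than made by users, jobs flow to the cloud without anyone having to know, or care, where they run.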
- Stay application and cloud-provider agnostic – Increasingly, application providers offer their own cloud solutions. A computer-aided engineering (CAE) tool vendor might encourage users to supplement local capacity with their cloud-resident Software-as-a-Service (SaaS) offering, or a cloud provider may offer a free workload manager with on-ramps only to their own cloud. Having multiple application- or cloud-specific bursting solutions will make management a nightmare. Users need a single framework that is application and cloud agnostic to simplify management, improve flexibility, and help ensure cost transparency.
- Leverage containers for portability – Container technologies like Docker and Singularity are convenient ways to package applications for cross-cloud mobility and consistent replication of results. While not a prerequisite for hybrid clouds, containers make applications faster to deploy, easier to maintain, and more reliable and portable. Organizations should look for solutions that support a variety of encapsulation approaches in addition to traditional workloads.
- Ensure you can enforce business-level policies – Ironically, there is danger in making cloud bursting too easy to use. HPC users have an insatiable appetite for compute capacity. Rather than run a 10-hour simulation on 1,000 local cores, wouldn’t most users prefer to deploy 5,000 cores in the cloud and complete their simulation in two hours? Guardrails are needed to avoid unauthorized use of premium-priced cloud resources. Workload management needs to be tightly integrated with cloud provisioning and policy enforcement so that administrators can decide who may consume what amount of cloud resources under what circumstances, and automatically shut down cloud instances when the work is completed.
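A guardrail like the one described above can be as simple as a budget check in front of the provisioning step. A minimal sketch, with hypothetical project names, budgets, and rates (real policy would live in the workload manager's policy layer):

```python
# Guardrail sketch: approve a cloud burst only if its estimated cost
# fits within the project's remaining monthly budget. All numbers are
# hypothetical placeholders.

budgets = {"crash-sim": 10_000.0, "genomics": 2_000.0}  # $ per month
spent   = {"crash-sim":  9_600.0, "genomics":   250.0}

def authorize_burst(project, cores, hours, rate_per_core_hour=0.05):
    """Approve a burst only if it fits within the remaining budget."""
    cost = cores * hours * rate_per_core_hour
    remaining = budgets[project] - spent[project]
    return cost <= remaining

print(authorize_burst("crash-sim", cores=5000, hours=2))  # denied: over budget
print(authorize_burst("genomics",  cores=1000, hours=2))  # approved
```

The same check point is also the natural place to tear down instances once jobs finish, so that idle cloud capacity never accrues charges.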
- Think past today’s workloads – Application architectures are changing. Traditional HPC applications aren’t going away, but they are being supplemented by new “born-in-the-cloud” frameworks (NoSQL, Spark, TensorFlow, etc.). Kubernetes enjoys particularly strong momentum, and most cloud providers are now offering Kubernetes-based services to support an increasing variety of containerized, Kubernetes-friendly applications. Organizations should look for management tools that support not only today’s applications but future workloads as well.
While it’s still early, hybrid cloud computing for HPC is quickly becoming commonplace. The Wharton School of Business is one of several organizations that have successfully implemented a hybrid HPC cloud, sharing resources among 20 research centers with policy-driven cloud bursting. To meet this growing interest, Project Tortuga makes it easy to automate the deployment of cloud and hybrid cloud environments across multiple cloud providers. Tortuga was open sourced by Univa this March.
While many applications will continue to run locally, for HPC users, hybrid environments appear to be the way of the future.