The PaaS allows data engineers to run Hadoop or Spark jobs in the cloud. Hadoop and Spark are open-source distributed data-processing frameworks designed for analyzing large amounts of data. The company, which went public in April, helped pioneer the market for Hadoop.
Data engineers can run Apache Spark, Apache Hive, Hive on Spark, and MapReduce2 through Altus. Cloudera calls these open-source data engines “data pipelines.” Altus executes and manages these pipelines, deploying them on clusters and automating processes such as cluster provisioning, configuration, and termination.
“Altus allows data engineers to transform data that lands in S3 really fast, and far more easily than they could ever do before,” said David Tishgart, Cloudera director of product marketing.
Amazon Simple Storage Service, or S3, is Amazon Web Services’ object storage service, designed to make web-scale computing easier for developers. Cloud storage services of this type are becoming increasingly popular for their resiliency, scalability, and relatively low cost.
Tishgart says other similar PaaS offerings, such as Amazon Elastic MapReduce (EMR), require users to move data from S3 to another service before they can do anything useful with it.
“Altus does the data processing for you, and then you can take that processed data and do a variety of other things with it, without ever having to move that data to another service,” he explained. “On EMR, you have to move it to an analytics base if you want to do analytics. On our platform, you can leave the data where it is, so it’s a much more simplified way to do data analytics.”
The new service also allows users to troubleshoot failed jobs whether or not the clusters or compute infrastructure that ran them are still present. Additionally, its workload management flags significant performance deviations and proposes a root cause analysis.
Altus works with multiple versions of CDH (Cloudera’s Distribution Including Apache Hadoop), Cloudera’s open source platform. This compatibility makes it easy for Cloudera customers to migrate their on-premises workloads to the cloud.
“Customers will use the exact same applications and processes that they are using on-prem today in the cloud environment,” Tishgart said. “So there’s no need to use new tools or rethink how you are using security. It’s the same tools running in a different environment.”