Oracle's Big Data Cloud Service offers enterprise users a platform on which to quickly and easily implement a big data architecture based on Apache Hadoop and other open source technologies.
The service utilizes Oracle's cloud infrastructure and other technologies, from Oracle and elsewhere, to provide a complete environment to set up, manage and elastically scale Hadoop clusters through a centralized portal. Big Data Cloud eliminates many of the cluster implementation complexities for Oracle Hadoop users by providing the tools necessary to deploy a system, secure its environment and integrate it with other services.
The heart of the service doesn't come from Oracle itself; it comes from Cloudera Inc.'s CDH distribution of Hadoop and related big data tools, which together comprise a scalable, integrated architecture for managing massive volumes of heterogeneous data.
For those not in the know, Hadoop is an open source framework for building distributed processing systems across clusters built on commodity hardware. Because of its distributed architecture, Hadoop can effectively manage petabyte-scale data sets and support sophisticated analytics while controlling security, governance and data access.
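As a conceptual illustration of why that distributed architecture scales, here is a small Python sketch of HDFS-style block placement: a file is split into fixed-size blocks, and each block is replicated on several nodes. The sizes and the round-robin policy are simplified stand-ins, not HDFS's actual placement logic (real HDFS defaults to 128 MB blocks and three replicas):

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into blocks and assign each block to `replication` nodes,
    round-robin style. A toy stand-in for HDFS's placement policy."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
layout = place_blocks(file_size=1000, block_size=300, nodes=nodes)
print(len(layout))  # number of blocks: ceil(1000 / 300) = 4
print(layout[0])    # the nodes holding replicas of the first block
```

Because every block lives on multiple nodes, the cluster can survive individual machine failures and read different blocks of the same file in parallel.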
Hadoop and more
CDH includes four core Hadoop modules that help facilitate storage and processing operations: Hadoop Common, a set of utilities that supports the other modules; the Hadoop Distributed File System (HDFS), which can store a mix of structured, semistructured and unstructured data; the Hadoop YARN job scheduler and cluster resource manager; and the MapReduce processing engine and programming framework.
In addition to the core Hadoop components, CDH offers a number of other Apache technologies that work in conjunction with Hadoop to expand on or add to its capabilities. Many of them are integrated into the Big Data Cloud Service along with the Hadoop distribution.
One of the most important is the Apache Spark processing engine, which supports a wide range of operations, including data transformations, machine learning, batch and real-time stream processing, and advanced modeling and analytics. IT teams often use Spark rather than MapReduce as a batch processing engine because of Spark's flexibility and in-memory processing capabilities, which offer significant performance improvements over MapReduce.
The Apache technologies supported by Big Data Cloud also include:
- HBase, a nonrelational, key-value data store for handling large data sets distributed across HDFS clusters;
- Hive, a data warehouse infrastructure built on top of Hadoop deployments that supports analytics, data summarization and ad hoc queries against large data sets;
- Oozie, a workflow scheduler for managing Hadoop jobs;
- Pig, a data flow language and execution framework for performing complex analytics, aggregations and transformations against large data sets;
- Sqoop, a tool for transferring bulk data between structured data stores and Hadoop clusters; and
- ZooKeeper, a coordination service for maintaining and synchronizing configuration and naming information for distributed applications.
Like other cloud-based services, Big Data Cloud evolves quickly, so the list of supported Apache tools will likely change over time. Refer to Oracle's documentation to view the most current list of what's available as part of the service to support Oracle Hadoop deployments.
Other tools from multiple places
Along with CDH, Big Data Cloud gives Oracle Hadoop users several Cloudera-developed tools that come with the vendor's Cloudera Enterprise Data Hub Edition bundle and that are particularly useful for working with Hadoop clusters.
That includes the Cloudera Manager administrative console, the Cloudera Search software for accessing Hadoop data, the Cloudera Navigator data management and security tool, and Apache Impala, a SQL-on-Hadoop engine that was created at Cloudera and elevated to a top-level project by the Apache Software Foundation in November 2017.
It isn't all just a repackaging of Cloudera's technology, though. The Big Data Cloud Service also includes a number of other utilities and applications for managing data and system resources, as well as the following Oracle Big Data Connectors to facilitate access to Hadoop data: Oracle Loader for Hadoop; Oracle XQuery for Hadoop; Oracle Data Integrator Enterprise Edition; Oracle R Advanced Analytics for Hadoop; and Oracle SQL Connector for HDFS.
Additionally, Big Data Cloud includes Oracle Big Data Spatial and Graph, which offers spatial and graph analytics services that support Hadoop workloads and NoSQL database technologies. It also offers optional integration with Oracle's Big Data SQL Cloud and Cloud Infrastructure Object Storage Classic services.
Big data convenience at a cost
Big Data Cloud is efficient and greatly simplifies the process of standing up Oracle Hadoop clusters that can process and store vast amounts of data.
However, Big Data Cloud does come at a price that varies depending on computing and storage resource requirements, as well as the selected payment plan. Pricing starts at $29.0322 per compute hour on the metered plan or $14,400 per month for a three-node starter pack cluster.
For many organizations, the price is worth it when taking into account the complexities and costs that come with standing up their own big data platforms.
Implementing a Hadoop cluster is a significant undertaking and should not be taken lightly. That said, organizations that have the in-house IT infrastructure and resources to pull it off themselves might prefer the more granular control that an on-premises deployment provides.
Before considering Big Data Cloud or any other big data platforms for your organization, be sure to do a thorough analysis that looks at the long-term total cost of ownership and in-house resource requirements. As always, the devil is in the details.