One component of Oracle's Big Data Appliance that has received a lot of attention is Oracle NoSQL Database, a non-relational distributed system that can manage multiple terabytes of key-value data spread across many storage nodes. Oracle NoSQL Database represents a significant departure from Oracle's traditional relational database management system, Oracle Database.
The NoSQL movement
With its roots in companies such as Amazon and Google, the NoSQL movement emerged to meet the growing need to process and analyze extremely large data volumes quickly and efficiently, in ways that traditional relational databases couldn't handle. NoSQL, short for "not only SQL," provides a flexible structure for working with massive data sets that can be scaled out and distributed across geographically dispersed commodity hardware.
NoSQL databases are used, for instance, to collect click-stream data from high-volume websites, monitor online retail behavior, aggregate real-time sensor data and support social networking communications.
Most relational database management systems adhere to the ACID properties -- atomicity, consistency, isolation and durability -- to ensure that data is always in a predictable and protected state. NoSQL databases take a different approach to data storage, typically following the BASE model instead:
- Basically Available: Data is replicated and partitioned across multiple nodes, often on self-contained commodity hardware, so that it remains available even when several nodes fail.
- Soft state: Unlike ACID-based systems, which strive for consistency at all times, a NoSQL database permits data to be temporarily inconsistent. This can happen after data is modified but before the change has been fully replicated.
- Eventual consistency: Once the change has been replicated to every node, the data converges to a consistent state.
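To make soft state and eventual consistency concrete, here is a minimal Python sketch of a toy store in which a write lands on one node and reaches the other copies only later. It does not describe how any particular NoSQL product replicates data: the class and method names (`EventuallyConsistentStore`, `propagate` and so on) are hypothetical, and `propagate` simply stands in for whatever asynchronous replication a real system performs.

```python
from collections import deque


class Node:
    """One copy of the data (hypothetical, for illustration only)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}  # key -> value


class EventuallyConsistentStore:
    """Toy store: writes hit one node first and reach the other copies later."""

    def __init__(self, replica_count: int):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = deque()  # (key, value) updates not yet copied to replicas

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        self.pending.append((key, value))  # replication is deferred

    def read(self, node: Node, key: str):
        return node.data.get(key)

    def is_consistent(self) -> bool:
        return all(r.data == self.primary.data for r in self.replicas)

    def propagate(self) -> None:
        """Apply every pending update to every replica (the 'eventual' step)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


store = EventuallyConsistentStore(replica_count=2)
store.write("user42/email", "user42@example.com")
print(store.read(store.replicas[0], "user42/email"))  # None: soft state
print(store.is_consistent())                          # False
store.propagate()
print(store.read(store.replicas[0], "user42/email"))  # user42@example.com
print(store.is_consistent())                          # True
```

Between `write` and `propagate`, a reader that happens to hit a replica sees no value at all; after propagation every copy agrees.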
Although a NoSQL database can support large data sets and low-latency operations, it doesn't provide the granular control that a relational database offers for ensuring data integrity and reducing data redundancy and inconsistencies. For the kinds of workloads it is designed for, however, it can be an effective tool for storing and accessing data.
It's not only the BASE properties that differentiate a NoSQL database from a relational one. The ways in which they store and structure their data are also very different. Furthermore, the NoSQL products themselves can vary significantly in the ways they handle data.
Oracle NoSQL Database
Oracle NoSQL Database can be integrated with Oracle's enterprise application technologies, including Oracle Database and R-based analytics, as well as with the Hadoop MapReduce framework.
Oracle NoSQL Database is built on Oracle Berkeley DB Java Edition, which provides the underlying mechanism for storing and managing large sets of key-value data across a distributed environment. Each record in the database consists of an identifying key and an associated value.
A record's identifying key is made up of a major key and a minor key that together identify the record. For example, suppose a database stores a set of user profiles. You can use the profile name as the major key and a profile attribute -- first name, last name, email address or phone number -- as each minor key. The value portion of each record then holds the user's actual name, email address or phone number. The result is one record for each attribute of each user profile, with all of a profile's records grouped together under its major key.
The strength of this design is that the application has complete control over defining the major and minor keys. Each major key can have its own set of minor keys, and different major keys can have different numbers of them. All the database requires is the key-value structure; for the most part, it's indifferent to the data stored in the values.
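The major/minor key structure can be sketched as a small in-memory model. This is not Oracle NoSQL Database's own client API, which is not reproduced here; it is a hypothetical Python model, assuming keys made of a major path and a minor path and values kept as opaque bytes, that shows how grouping records by major key lets one lookup retrieve every attribute of a profile.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:
    major: tuple[str, ...]  # groups related records, e.g. a profile name
    minor: tuple[str, ...]  # picks one record in the group, e.g. an attribute


class KeyValueStore:
    """Hypothetical in-memory model of major/minor keys; values are opaque bytes."""

    def __init__(self) -> None:
        # major key -> {minor key -> value}
        self._groups = defaultdict(dict)

    def put(self, key: Key, value: bytes) -> None:
        self._groups[key.major][key.minor] = value

    def get(self, key: Key) -> Optional[bytes]:
        return self._groups.get(key.major, {}).get(key.minor)

    def multi_get(self, major: tuple[str, ...]) -> dict:
        """Return every (minor key -> value) record stored under one major key."""
        return dict(self._groups.get(major, {}))


store = KeyValueStore()
jsmith = ("jsmith",)
store.put(Key(jsmith, ("firstName",)), b"Jane")
store.put(Key(jsmith, ("email",)), b"jane@example.com")
# A different profile can use a different set (and number) of minor keys.
store.put(Key(("bwong",), ("phone",)), b"555-0100")

print(store.get(Key(jsmith, ("email",))))  # b'jane@example.com'
print(sorted(store.multi_get(jsmith)))     # [('email',), ('firstName',)]
```

Because the store never interprets the bytes it holds, the application alone decides what the keys mean and how values are encoded.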
Oracle NoSQL Database architecture
An Oracle NoSQL Database is usually part of a three-tier application model, with the Web server in the top tier, the application server in the middle tier and the database at the back end. The application communicates with the database through application programming interfaces (APIs) embedded in the application. Beneath those APIs, a client driver processes each request and directs it to the appropriate part of the database.
At the heart of an Oracle NoSQL Database is a set of storage nodes that house the data and make it available to the client driver. Typically, a storage node is a physical machine with its own memory, CPU cores, and local storage, either disk or solid state. Each storage node hosts one or more replication nodes.
Replication nodes are grouped together into replication groups. A replication group is made up of one master node and one or more read-only replication nodes, or replicas. Every replication group in a database contains the same number of replicas, and all nodes within a group house the same data.
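The relationship between storage nodes, replication nodes and replication groups can be modeled as a small data structure. The following Python sketch assumes a hypothetical `capacity` setting (how many replication nodes each storage node hosts) and a `replication_factor` (members per group). Placing every member of a group on a different storage node, and rotating the master role across groups, are assumptions of this sketch, chosen so that one machine failure cannot take out a whole group.

```python
from dataclasses import dataclass


@dataclass
class ReplicationGroup:
    group_id: int
    master: str          # storage node hosting the group's master
    replicas: list[str]  # storage nodes hosting the read-only replicas


def build_topology(storage_nodes: list[str], capacity: int,
                   replication_factor: int) -> list[ReplicationGroup]:
    """Hypothetical layout: each storage node hosts `capacity` replication
    nodes; each group has `replication_factor` members (1 master + replicas)
    placed on distinct storage nodes. Leftover slots are left unused."""
    if replication_factor > len(storage_nodes):
        raise ValueError("replication factor exceeds the number of storage nodes")
    # Interleave slots (sn1, sn2, sn3, sn1, sn2, sn3, ...) so that any run of
    # replication_factor consecutive slots names distinct storage nodes.
    slots = [sn for _ in range(capacity) for sn in storage_nodes]
    groups = []
    for g in range(len(slots) // replication_factor):
        members = slots[g * replication_factor:(g + 1) * replication_factor]
        m = g % replication_factor  # rotate the master role across groups
        groups.append(ReplicationGroup(g, members[m], members[:m] + members[m + 1:]))
    return groups


for group in build_topology(["sn1", "sn2", "sn3"], capacity=2, replication_factor=3):
    print(group)
# ReplicationGroup(group_id=0, master='sn1', replicas=['sn2', 'sn3'])
# ReplicationGroup(group_id=1, master='sn2', replicas=['sn1', 'sn3'])
```

Every group ends up with the same number of members, matching the rule that all replication groups in a database have the same number of replicas.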
Data modifications occur only on the master node, so the master always has the most up-to-date information. Changes are then copied from the master to the replicas, which serve read operations. If an application reads from a replica before an update made on the master has reached it, the application sees stale data, because the system is temporarily in an inconsistent state. In a NoSQL system such as this one, it is largely up to the application to decide how much inconsistency it can accept. To help, Oracle NoSQL Database attaches a version to each record, and many applications can tolerate a small degree of inconsistency anyway.
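The role of record versions can be illustrated with a simplified read path. In this hypothetical Python sketch, the master stamps each write with an increasing version number, and a reader that must see a particular write passes that version along; if the replica's copy is older, the read goes to the master instead. The class and function names are invented for illustration, and falling back to the master is just one simple option, not necessarily what Oracle NoSQL Database does.

```python
import itertools
from typing import Optional


class Master:
    """Hypothetical master: the only place writes happen."""

    def __init__(self) -> None:
        self.data = {}  # key -> (value, version)
        self._next_version = itertools.count(1)

    def put(self, key: str, value: str) -> int:
        version = next(self._next_version)
        self.data[key] = (value, version)
        return version  # the application can keep this to check later reads


class Replica:
    def __init__(self) -> None:
        self.data = {}  # key -> (value, version), possibly stale

    def catch_up(self, master: Master) -> None:
        self.data = dict(master.data)  # stand-in for asynchronous replication


def read(key: str, replica: Replica, master: Master,
         min_version: Optional[int] = None) -> Optional[str]:
    """Serve from the replica unless its copy is older than min_version;
    in that case this sketch simply falls back to the master."""
    entry = replica.data.get(key)
    if min_version is not None and (entry is None or entry[1] < min_version):
        entry = master.data.get(key)
    return None if entry is None else entry[0]


master, replica = Master(), Replica()
v1 = master.put("cart/7", "1 item")
replica.catch_up(master)
v2 = master.put("cart/7", "2 items")  # not yet replicated

print(read("cart/7", replica, master))                  # 1 item (stale)
print(read("cart/7", replica, master, min_version=v2))  # 2 items
```

Reads without a version requirement stay on the replica and may be stale; only the reads that need a specific write pay the cost of a trip to the master.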
Each replication group holds one or more partitions that contain the actual data. Oracle NoSQL Database maps record keys to partitions and spreads the data evenly across the available partitions. The mapping is based on the major key, so all records that share a major key, such as the attributes of one user profile, end up in the same partition. Once a key is assigned to a partition, it stays in that partition. Oracle NoSQL Database maintains this partition structure to support low-latency read operations.
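The key-to-partition mapping and the client driver's routing can be sketched together. This hypothetical Python example assumes that partitions are chosen by hashing the major key into a fixed number of partitions, assigns partitions to replication groups round-robin, and sends writes to a group's master and reads to a replica. MD5 is used purely as a stable stand-in hash; the real hash function, partition count and load-balancing logic are not shown here.

```python
import hashlib

NUM_PARTITIONS = 12  # hypothetical; in this sketch it never changes


def partition_for(major_key: tuple[str, ...]) -> int:
    """Stable hash of the major key, so every record sharing a major key
    lands in the same partition (MD5 is only a convenient stand-in)."""
    digest = hashlib.md5("\x00".join(major_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def assign_partitions(num_groups: int) -> dict[int, int]:
    """Spread partitions evenly across replication groups, round-robin."""
    return {p: p % num_groups for p in range(NUM_PARTITIONS)}


def route(major_key: tuple[str, ...], is_write: bool,
          partition_map: dict[int, int], groups: list[dict]) -> str:
    """Pick the node for a request: writes go to the owning group's master;
    reads go to its first replica (load balancing is omitted here)."""
    group = groups[partition_map[partition_for(major_key)]]
    return group["master"] if is_write else group["replicas"][0]


groups = [{"master": f"rg{i}-master", "replicas": [f"rg{i}-replica1"]} for i in range(3)]
pmap = assign_partitions(len(groups))
print(route(("jsmith",), is_write=True, partition_map=pmap, groups=groups))
print(route(("jsmith",), is_write=False, partition_map=pmap, groups=groups))
```

Because the hash depends only on the major key, a key's partition never changes, and every record for the same profile is routed to the same replication group.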
Implementing an Oracle NoSQL Database
This is only a high-level overview of what goes into an Oracle NoSQL Database, and you have a great deal of control over how you set up your system. For example, you can specify a consistency policy that determines how much inconsistency will be tolerated when data is read from the replicas. You can also specify durability policies that determine how recoverable data will be if a system crashes.
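Consistency and durability policies are essentially rules about how many copies must agree before an operation counts. The Python sketch below models two such rules under hypothetical names: an acknowledgment policy that sets how many replicas must confirm a write before it is considered durable, and a time-bounded check that lets a replica answer a read only if it lags the master by no more than an application-chosen limit. These are illustrative models, not Oracle NoSQL Database's configuration API.

```python
from enum import Enum
from typing import Optional


class AckPolicy(Enum):
    """Hypothetical write-durability settings."""
    NONE = "none"                 # master commits without waiting for replicas
    SIMPLE_MAJORITY = "majority"  # wait until a majority of the group has the write
    ALL = "all"                   # wait for every replica


def replica_acks_required(policy: AckPolicy, replication_factor: int) -> int:
    """Replica acknowledgments a write waits for (the master is not counted)."""
    replicas = replication_factor - 1
    if policy is AckPolicy.NONE:
        return 0
    if policy is AckPolicy.ALL:
        return replicas
    majority = replication_factor // 2 + 1  # majority of the whole group
    return majority - 1                     # the master supplies one of the votes


def replica_may_serve(max_lag_seconds: Optional[float], replica_lag_seconds: float) -> bool:
    """Time-bounded read consistency: a replica may answer only if it is no
    further behind the master than the application allows. None accepts any
    lag; 0 accepts only a fully caught-up replica."""
    if max_lag_seconds is None:
        return True
    return replica_lag_seconds <= max_lag_seconds


for rf in (3, 5):
    print(rf, replica_acks_required(AckPolicy.SIMPLE_MAJORITY, rf))
# 3 1
# 5 2
print(replica_may_serve(2.0, replica_lag_seconds=3.5))  # False
```

Stricter settings (more acknowledgments, a smaller lag limit) buy stronger guarantees at the cost of latency, since each operation waits on more of the group.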
Oracle NoSQL Database's flexible architecture also lets you control throughput and storage capacity. For example, adding storage nodes usually increases both aggregate throughput and storage capacity, and adding replication groups can improve write performance. Adding nodes to a replication group, on the other hand, can increase read throughput but slows write operations, because each change must be copied to more replicas. You can tune these trade-offs to fit your own requirements.
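These trade-offs can be reasoned about with a back-of-the-envelope model. The following Python function is a deliberate simplification, not an Oracle sizing tool: it assumes every node handles the same hypothetical number of operations per second and amount of storage, that writes run only on masters, that reads can run on any node, and that every node in a group stores a full copy of that group's data.

```python
def estimate_capacity(num_groups: int, replication_factor: int,
                      node_ops_per_sec: float, node_storage_gb: float) -> dict:
    """Toy capacity model (a simplification, not an Oracle sizing method):
    writes run only on masters, reads run on any node, and every node in a
    group stores a full copy of that group's data."""
    total_nodes = num_groups * replication_factor
    return {
        "max_write_ops": num_groups * node_ops_per_sec,       # one master per group
        "max_read_ops": total_nodes * node_ops_per_sec,       # every node can serve reads
        "usable_storage_gb": num_groups * node_storage_gb,    # copies don't add space
        "replica_copies_per_write": replication_factor - 1,   # extra work behind each write
    }


# Hypothetical per-node figures: 1,000 ops/s and 500 GB.
for groups, rf in [(3, 3), (3, 5), (5, 3)]:
    est = estimate_capacity(groups, rf, node_ops_per_sec=1000, node_storage_gb=500)
    print(groups, rf, est["max_write_ops"], est["max_read_ops"],
          est["usable_storage_gb"], est["replica_copies_per_write"])
# 3 3 3000 9000 1500 2
# 3 5 3000 15000 1500 4
# 5 3 5000 15000 2500 2
```

Comparing the rows: going from three to five nodes per group raises read capacity but leaves write capacity flat while doubling the replication work behind each write, whereas adding groups raises write capacity and usable storage.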
Oracle NoSQL Database also includes an administration service for managing database installations from either the command line or a Web interface. You can use the service to configure a database instance, start or stop a database, and monitor system performance; it collects performance statistics and provides online monitoring capabilities.