How to use Oracle SQL CUBE for cross-tabulation

Using the SQL CUBE function in Oracle SQL at the right time can cut down on how much querying you'll need to do.

Oracle SQL contains several advanced aggregation features that provide reporting capabilities generally associated with Online Analytical Processing (OLAP) reporting tools. The SQL CUBE, ROLLUP and GROUPING SETS extensions to the GROUP BY clause are often unused by SQL developers because they either do not understand it or do not recognize where to apply them. The CUBE operation in particular has a fairly narrow range of applications....

However, within that range, it can generate results that would otherwise require several individual queries.

The SQL CUBE and ROLLUP keywords have similar output when used to modify a GROUP BY clause. Where the ROLLUP keyword will produce subtotals when added to the GROUP BY clause, the SQL CUBE keyword will produce cross-tabulation values. ROLLUP will produce only subtotals for the grouped expressions read from left to right plus a grand total (n+1 groupings). When the CUBE keyword is used, all of the columns used in the GROUP BY clause are cross-referenced to produce a superset of groups. Any aggregate functions in the SELECT list are used with the superaggregate rows to produce summary data. A CUBE will generate subtotals for every possible combination of the supplied expressions plus a grand total. This will produce 2^n groupings (an n-dimensional cube).

A CUBE that is grouped by columns A and B will produce four groupings (2^2 = 4):

  • Row totals grouped by column A alone
  • Row totals grouped by column B alone
  • Row totals grouped by columns A & B
  • A grand total

The syntax for a statement including a SQL CUBE is:

SELECT [col1, col2, …], aggregate_function(col3)…
FROM   table
[WHERE  condition]
GROUP BY CUBE group_by_expr
[HAVING  having_expr]
[ORDER BY  order_by_expr]

Below is an example querying the EMP_DETAILS_VIEW in the HR sample schema. The CUBE operation is grouping on two columns (REGION_NAME and DEPARTMENT_NAME) and generating the sum of salaries for employees. I've added a notation to the right of each row indicating the grouping set to which it belongs. A standard GROUP BY clause would simply have produced the results using both columns (the A&B set). The CUBE example below generates two additional columns by A alone, 11 results by B alone, plus a total for all employees.

SELECT region_name, department_name, SUM(salary)
FROM   hr.emp_details_view
GROUP BY CUBE (region_name, department_name)
ORDER BY region_name, department_name;

REGION_NAME   DEPARTMENT_NAME    SUM(SALARY)
------------- ------------------ -----------
Americas      Accounting               20308   A&B
Americas      Administration            4400   A&B
Americas      Executive                58000   A&B
Americas      Finance                  52600   A&B
Americas      IT                       28800   A&B
Americas      Marketing                19000   A&B
Americas      Purchasing               24900   A&B
Americas      Shipping                156400   A&B
Americas                              364408   A&B
Europe        Human Resources           6500   A
Europe        Public Relations         10000   A&B
Europe        Sales                   304500   A&B
Europe                                321000   A
              Accounting               20308   B
              Administration            4400   B
              Executive                58000   B
              Finance                  52600   B
              Human Resources           6500   B
              IT                       28800   B
              Marketing                19000   B
              Public Relations         10000   B
              Purchasing               24900   B
              Sales                   304500   B
              Shipping                156400   B
                                      685408   Total

Generating the same results with standard GROUP BY statements would have required four individual queries combined with SET operators:

SELECT region_name, department_name, SUM(salary)
FROM   hr.emp_details_view
GROUP BY region_name, department_name
UNION ALL
SELECT region_name, NULL AS department_name, SUM(salary)
FROM   hr.emp_details_view
GROUP BY region_name
UNION ALL
SELECT NULL AS region_name, department_name, SUM(salary)
FROM   hr.emp_details_view
GROUP BY department_name
UNION ALL
SELECT NULL AS region_name, NULL AS department_name, SUM(salary)
FROM   hr.emp_details_view
ORDER BY region_name, department_name;

A key indicator that you might want to use a SQL CUBE operation is when obtaining the desired results requires grouping the same data in several different sets. Executing and combining individual GROUP BY statements will use considerably more resources than the equivalent CUBE operation. Since this type of analysis is most commonly done in data warehouses where the count of rows can be enormous, being able to consolidate operations can lead to significant reductions in query time. If the result from a SQL CUBE operation is not specifically what is required, the other advanced aggregation operations ROLLUP and GROUPING SETS may be a better fit. The ROLLUP operation will always produce n+1 groupings when given n expressions. A ROLLUP that is grouped by columns A and B will produce thee groupings (2+1 = 3):

  • Row totals grouped by column A alone
  • Row totals grouped by columns A and B
  • A grand total

The GROUPING SETS operation is more flexible than either ROLLUP or CUBE in that you can choose specifically which sets a query returns. It can be used to generate the same results that either a CUBE or ROLLUP will, or to produce variants that neither can. For example, GROUP BY CUBE( a, b) is equivalent to GROUP BY GROUPING SETS ( (a, b), (a), (b), NULL). The following SQL statement will produce the same results as the example CUBE from earlier:

SELECT region_name, department_name, SUM(salary)
FROM   hr.emp_details_view
GROUP BY GROUPING SETS ((region_name, department_name),

                         (region_name), (department_name),
                         NULL)
ORDER BY region_name, department_name;

When the number of columns included in the GROUP BY is larger, the flexibility of GROUPING SETS becomes more important. A four-column grouping would generate sixteen different sets. If only five of those were required, having the ability to pull just those five is a better option than using the SQL CUBE operation and filtering out the unwanted sets.

Each of the advanced aggregation operations can be useful in the right situation. Understanding the information they return is the first step in adding them to your SQL toolbox. If part of your job involves creating reports to analyze data, sooner or later you will find the perfect place to make use of them.

About the author:
is a database engineer for Computer Sciences Corporation in Orlando, Fla. For over 17 years he has worked with the Oracle Database as a support engineer, database administrator, developer and architect.

This was first published in October 2013
This Content Component encountered an error

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close