Tip

Tip: Exposing logical attributes with virtual columns

Data modelers are often faced with the dilemma of how to model logical attributes in their data warehouse schema design. A new feature in Oracle 11g called virtual columns solves this dilemma by exposing logical attributes while reducing object management and storage consumption. Additionally, virtual columns can be used to improve availability, manageability and performance of the database. Virtual columns assist modelers in delivering simple, intuitive and fast performing designs.

Logical attributes

Logical attributes are commonly referred to as expression or function based results. These attributes fall into two categories: calculated and derived values. Calculated values represent the results of mathematical formulas. The following example adds two columns, resulting in a calculated value:

 

 
base salary + commission = total salary

Derived values, on the other hand, are logic based. The same two columns added together can be evaluated, resulting in a derived value:

 

 
if base salary + commission > 100000 then 'Highly Compensated' end

The dilemma

Prior to 11g, data modelers either had to expose logical attributes as physical table columns or virtualize the values in views. Both solutions were

    Requires Free Membership to View

less than optimal.

Storing logical attributes as physical table columns allowed values to be exposed to users and were indexable. As physical columns, accurate statistics could be collected resulting in better decisions by the optimizer. However, storing the attributes as table columns meant consuming disk space. Conscientious of exponential data growth within the data warehouse, many designers opted to model these attributes in views.

Views achieved the design goal of exposing the logical attributes without consuming physical disk space. A view can be thought of as a stored query with results materialized at SQL execution. For this reason, optimizer statistics cannot be pre-collected on views. The optimizer is left with no other option but to guess at the cost and cardinality of the expression results. Although view columns cannot be indexed or constrained, the introduction of Function Based Indexes (FBIs) in Oracle 8i provided the optimizer with actual statistics for query plan optimization. Views and FBIs seemed a viable solution but meant managing additional database objects which led to data warehouse complexity.

The 11g answer

The introduction of virtual columns in Oracle 11g has provided data modelers with an optimal design for logical attributes. Exposed as table columns, virtual column values are not actually stored on disk but are computed on the fly during SQL execution. They actually share much of the same code path as FBIs but offer visibility. In addition to not being physically stored, virtual columns have other significant advantages over views. Optimizer statistics can be collected on virtual columns and these columns can be indexed, constrained and even used to partition tables. This enables the optimizer to make fact-based intelligent decisions around query access. Virtual columns achieve the design goal of exposing logical attributes without additional storage or object costs.

Show me

Let's demonstrate the use of virtual columns in a fictitious data warehouse design.

1. Create a de-normalized PRODUCT table that you might expect to see in a dimensional Star schema design.

 

 
 create table product (product_key number(6) not null ,description varchar2(1000) not null ,cost number (7,0) not null ,markup_percent number (3,0) not null  ,total_cost number (7,0) as (cost * (1 + (markup_percent / 100))) virtual not null check (total_cost < 125001)  ,states number not null ,az_flag varchar2(1) as (decode(bitand (states,1),1,'Y','N')) virtual not null ,ct_flag varchar2(1) as (decode(bitand (states,2),2,'Y','N')) virtual not null ,ri_flag varchar2(1) as (decode(bitand (states,4),4,'Y','N')) virtual not null ) partition by range (total_cost) ( partition low_cost_partition values less than (1000) ,partition moderate_cost_partition values less than (10000) ,partition high_cost_partition values less than (MAXVALUE) );

Notice the new syntax of the virtual columns in bold print. The calculated attribute "total_cost" has inline NULL and check constraints, and is used as the partitioning column. Each derived state flag attribute is based on the "states" column which reflects the aggregation of all states where the product is sold (see Figure 1). For example, a value of 7 in the "states" column reflects a product sold in ARIZONA, CONNECTICUT and RHODE ISLAND (1 + 2 + 4).

 

Figure 1 State Assignment
 
 STATE STATE NUMBER ------------ ------------ ARIZONA 1 CONNECTICUT 2 RHODE ISLAND 4

2. Show virtual column expressions are actually stored as metadata in the data dictionary.

 

 
 select column_name, virtual_column, data_default from user_tab_cols where table_name = 'PRODUCT' order by column_id; COLUMN_NAME VIR DATA_DEFAULT ------------------------------ --- ------------------------------------ PRODUCT_KEY NO DESCRIPTION NO COST NO MARKUP_PERCENT NO TOTAL_COST YES "COST"*(1+"MARKUP_PERCENT"/100) STATES NO AZ_FLAG YES DECODE(BITAND("STATES",1),1,'Y','N') CT_FLAG YES DECODE(BITAND("STATES",2),2,'Y','N') RI_FLAG YES DECODE(BITAND("STATES",4),4,'Y','N')

3. Load the PRODUCT table.

 

 
 insert /*+ APPEND */ into product (PRODUCT_KEY,DESCRIPTION,COST,MARKUP_PERCENT,STATES) select rownum ,rpad(to_char(rownum),1000) ,case floor(dbms_random.value(1,6)) when 1 then 10 when 2 then 100 when 3 then 1000 when 4 then 10000 when 5 then 100000 end ,case floor(dbms_random.value(1,4)) when 1 then 5 when 2 then 10 when 3 then 25 end ,case floor(dbms_random.value(1,3)) when 1 then 1 when 2 then 3 end from dba_objects where rownum < 50001; commit;

Storing the odd numbers 1 & 3 in the "states" column will ensure all products are sold in ARIZONA and no products are sold in RHODE ISLAND. We will come back to this in later steps.

4. Create local partitioned indexes on the virtual "state flag" columns.

 

 

 
 create bitmap index product_az_ix on product (az_flag) local; create bitmap index product_ct_ix on product (ct_flag) local; create bitmap index product_ri_ix on product (ri_flag) local; select a.index_name, a.index_Type, b.column_Expression from user_indexes a ,user_ind_expressions b where a.index_name = b.index_name and a.index_name like 'PRODUCT_%_IX'; INDEX_NAME INDEX_TYPE COLUMN_EXPRESSION ------------- --------------------- ------------------------------------ PRODUCT_AZ_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",1),1,'Y','N') PRODUCT_CT_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",2),2,'Y','N') PRODUCT_RI_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",4),4,'Y','N')

5. Gather Statistics.

 

 
 execute dbms_stats.gather_table_stats(user,'PRODUCT' ,cascade=>true, estimate_percent=> 100,
granularity=> 'PARTITION'); select blocks, num_rows, avg_row_len from user_tables where table_name = 'PRODUCT'; BLOCKS NUM_ROWS AVG_ROW_LEN ---------- ---------- ----------- 7284 50000 1024

6. Verify the skewed distribution for ARIZONA and RHODE ISLAND.

 

 

 
 select states,count(*) from product group by states order by states; STATES COUNT(*) ---------- ---------- 1 24910 3 25090 select ' ARIZONA' ,sum(decode(az_flag,'Y',1,0)) "AZ Yes" ,sum(decode(az_flag,'N',1,0)) "AZ No" ,' RHODE ISLAND',sum(decode(ri_flag,'Y',1,0)) "RI Yes" ,sum(decode(ri_flag,'N',1,0)) "RI No" from product; 'ARIZONA AZ Yes AZ No 'RHODEISLAND' RI Yes RI No -------- ---------- ---------- ------------- ---------- ---------- ARIZONA 50000 0 RHODE ISLAND 0 50000

7. Compare ARIZONA and RHODE ISLAND Optimizer Query Plans.

Here, with statistics collection, the skewed results of the column functions are known by the optimizer. This causes the optimizer to treat the cost of each query differently, resulting in different query plans. Notice the number of estimated rows match the actual number of rows processed.

 

 
 select /*+ gather_plan_statistics */ product_key from product where az_flag = 'Y'; PRODUCT_KEY ----------- 1 . . . 49999 50000 rows selected. select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last')); --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | --------------------------------------------------------------------------------------------------------------- | 1 | PARTITION RANGE ALL| | 1 | 50000 | 1979 (1)| 50000 |00:00:00.61 | 10011 | 7147 | |* 2 | TABLE ACCESS FULL | PRODUCT | 3 | 50000 | 1979 (1)| 50000 |00:00:00.46 | 10011 | 7147 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------  2 - filter(DECODE(BITAND("STATES",1),1,'Y','N')='Y') select /*+ gather_plan_statistics */ product_key from product where ri_flag = 'Y'; no rows selected select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last')); --------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------- | 1 | PARTITION RANGE ALL | | 1 |  1 | 4 (0)| 0 |00:00:00.01 | 5 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID| PRODUCT | 3 |  1 | 4 (0)| 0 |00:00:00.01 | 5 | | 3 | BITMAP CONVERSION TO ROWIDS | | 3 | | |  0 |00:00:00.01 | 5 | |* 4 | BITMAP INDEX SINGLE VALUE | PRODUCT_RI_IX | 3 | | |  0 |00:00:00.01 | 5 | --------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------  4 - access("RI_FLAG"='Y')

Here, we see the optimizer has performed partition pruning on the "total_cost" virtual column.

 

 
 select count(*) from product where total_cost > 10000; COUNT(*) ---------- 19885 select * from table(dbms_xplan.display_cursor); --------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 788 (100)| | | | | 1 | SORT AGGREGATE | | 1 | 13 | | | | | | 2 |  PARTITION RANGE SINGLE| | 19885 | 252K| 788 (1)| 00:00:10 |  3 |  3 | |* 3 | TABLE ACCESS FULL | PRODUCT | 19885 | 252K| 788 (1)| 00:00:10 |  3 |  3 | --------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ---------------------------------------------------  3 - filter("COST"*(1+"MARKUP_PERCENT"/100)>10000)

Virtual columns are a useful new feature in Oracle 11g. This functionality offers data modelers an optimal solution for exposing logical attributes. Understanding how to use features of the database is essential to delivering a data warehouse design that is both fast and simple.

About the author

 

Jeff McCormick is an Architecture Director at a major health care company and President of the Connecticut Oracle User Group. Jeff has worked in IT for 20 years as a system, storage and database architect/administrator and has over 15 years of experience with DB2, Sybase, SQL Server and Oracle relational database technology. He holds several certifications including Oracle Database 10g Administrator Certified Professional, Microsoft Certified Product (SQL Server) and Certified Sybase Professional Database Administrator. Jeff is a frequent speaker and has authored several papers on data and database architecture.

This was first published in May 2008

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.