
DATA WAREHOUSING & BUSINESS INTELLIGENCE
Tip: Exposing logical attributes with virtual columns
Jeffrey R. McCormick 05.08.2008
Rating: -4.75- (out of 5)




|
Data modelers are often faced with the dilemma of how to model logical attributes in their data warehouse schema design. A new feature in Oracle 11g called virtual columns solves this dilemma by exposing logical attributes while reducing object management and storage consumption. Additionally, virtual columns can be used to improve availability, manageability and performance of the database. Virtual columns assist modelers in delivering simple, intuitive and fast performing designs.
Logical attributes
Logical attributes are commonly referred to as expression or function based results. These attributes fall into two categories: calculated and derived values. Calculated values represent the results of mathematical formulas. The following example adds two columns, resulting in a calculated value:
| base salary + commission = total salary |
Derived values, on the other hand, are logic based. The same two columns added together can be evaluated, resulting in a derived value:
| if base salary + commission > 100000 then 'Highly Compensated' end |
The dilemma
Prior to 11g, data modelers either had to expose logical attributes as physical table columns or virtualize the values in views. Both solutions were less than optimal.
Storing logical attributes as physical table columns allowed values to be exposed to users and were indexable. As physical columns, accurate statistics could be collected resulting in better decisions by the optimizer. However, storing the attributes as table columns meant consuming disk space. Conscientious of exponential data growth within the data warehouse, many designers opted to model these attributes in views.
Views achieved the design goal of exposing the logical attributes without consuming physical disk space. A view can be thought of as a stored query with results materialized at SQL execution. For this reason, optimizer statistics cannot be pre-collected on views. The optimizer is left with no other option but to guess at the cost and cardinality of the expression results. Although view columns cannot be indexed or constrained, the introduction of Function Based Indexes (FBIs) in Oracle 8i provided the optimizer with actual statistics for query plan optimization. Views and FBIs seemed a viable solution but meant managing additional database objects which led to data warehouse complexity.
The 11g answer
The introduction of virtual columns in Oracle 11g has provided data modelers with an optimal design for logical attributes. Exposed as table columns, virtual column values are not actually stored on disk but are computed on the fly during SQL execution. They actually share much of the same code path as FBIs but offer visibility. In addition to not being physically stored, virtual columns have other significant advantages over views. Optimizer statistics can be collected on virtual columns and these columns can be indexed, constrained and even used to partition tables. This enables the optimizer to make fact-based intelligent decisions around query access. Virtual columns achieve the design goal of exposing logical attributes without additional storage or object costs.
Show me
Let's demonstrate the use of virtual columns in a fictitious data warehouse design.
1. Create a de-normalized PRODUCT table that you might expect to see in a dimensional Star schema design.
create table product
(product_key number(6) not null
,description varchar2(1000) not null
,cost number (7,0) not null
,markup_percent number (3,0) not null
,total_cost number (7,0) as (cost * (1 + (markup_percent / 100))) virtual not null
check (total_cost < 125001)
,states number not null
,az_flag varchar2(1) as (decode(bitand (states,1),1,'Y','N')) virtual not null
,ct_flag varchar2(1) as (decode(bitand (states,2),2,'Y','N')) virtual not null
,ri_flag varchar2(1) as (decode(bitand (states,4),4,'Y','N')) virtual not null
)
partition by range (total_cost)
(
partition low_cost_partition values less than (1000)
,partition moderate_cost_partition values less than (10000)
,partition high_cost_partition values less than (MAXVALUE)
);
|
Notice the new syntax of the virtual columns in bold print. The calculated attribute "total_cost" has inline NULL and check constraints, and is used as the partitioning column. Each derived state flag attribute is based on the "states" column which reflects the aggregation of all states where the product is sold (see Figure 1). For example, a value of 7 in the "states" column reflects a product sold in ARIZONA, CONNECTICUT and RHODE ISLAND (1 + 2 + 4).
| Figure 1 State Assignment |
STATE STATE NUMBER
------------ ------------
ARIZONA 1
CONNECTICUT 2
RHODE ISLAND 4
|
2. Show virtual column expressions are actually stored as metadata in the data dictionary.
select column_name, virtual_column, data_default
from user_tab_cols
where table_name = 'PRODUCT'
order by column_id;
COLUMN_NAME VIR DATA_DEFAULT
------------------------------ --- ------------------------------------
PRODUCT_KEY NO
DESCRIPTION NO
COST NO
MARKUP_PERCENT NO
TOTAL_COST YES "COST"*(1+"MARKUP_PERCENT"/100)
STATES NO
AZ_FLAG YES DECODE(BITAND("STATES",1),1,'Y','N')
CT_FLAG YES DECODE(BITAND("STATES",2),2,'Y','N')
RI_FLAG YES DECODE(BITAND("STATES",4),4,'Y','N')
|
3. Load the PRODUCT table.
insert /*+ APPEND */ into product
(PRODUCT_KEY,DESCRIPTION,COST,MARKUP_PERCENT,STATES)
select rownum
,rpad(to_char(rownum),1000)
,case floor(dbms_random.value(1,6))
when 1 then 10
when 2 then 100
when 3 then 1000
when 4 then 10000
when 5 then 100000
end
,case floor(dbms_random.value(1,4))
when 1 then 5
when 2 then 10
when 3 then 25
end
,case floor(dbms_random.value(1,3))
when 1 then 1
when 2 then 3
end
from dba_objects
where rownum < 50001;
commit;
|
Storing the odd numbers 1 & 3 in the "states" column will ensure all products are sold in ARIZONA and no products are sold in RHODE ISLAND. We will come back to this in later steps.
4. Create local partitioned indexes on the virtual "state flag" columns.
create bitmap index product_az_ix on product (az_flag) local;
create bitmap index product_ct_ix on product (ct_flag) local;
create bitmap index product_ri_ix on product (ri_flag) local;
select a.index_name, a.index_Type, b.column_Expression
from user_indexes a
,user_ind_expressions b
where a.index_name = b.index_name
and a.index_name like 'PRODUCT_%_IX';
INDEX_NAME INDEX_TYPE COLUMN_EXPRESSION
------------- --------------------- ------------------------------------
PRODUCT_AZ_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",1),1,'Y','N')
PRODUCT_CT_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",2),2,'Y','N')
PRODUCT_RI_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",4),4,'Y','N')
|
5. Gather Statistics.
execute dbms_stats.gather_table_stats(user,'PRODUCT' ,cascade=>true, estimate_percent=> 100, granularity=> 'PARTITION');
select blocks, num_rows, avg_row_len from user_tables where table_name = 'PRODUCT';
BLOCKS NUM_ROWS AVG_ROW_LEN
---------- ---------- -----------
7284 50000 1024
|
6. Verify the skewed distribution for ARIZONA and RHODE ISLAND.
select states,count(*) from product group by states order by states;
STATES COUNT(*)
---------- ----------
1 24910
3 25090
select ' ARIZONA' ,sum(decode(az_flag,'Y',1,0)) "AZ Yes"
,sum(decode(az_flag,'N',1,0)) "AZ No"
,' RHODE ISLAND',sum(decode(ri_flag,'Y',1,0)) "RI Yes"
,sum(decode(ri_flag,'N',1,0)) "RI No"
from product;
'ARIZONA AZ Yes AZ No 'RHODEISLAND' RI Yes RI No
-------- ---------- ---------- ------------- ---------- ----------
ARIZONA 50000 0 RHODE ISLAND 0 50000
|
7. Compare ARIZONA and RHODE ISLAND Optimizer Query Plans.
Here, with statistics collection, the skewed results of the column functions are known by the optimizer. This causes the optimizer to treat the cost of each query differently, resulting in different query plans. Notice the number of estimated rows match the actual number of rows processed.
select /*+ gather_plan_statistics */ product_key from product where az_flag = 'Y';
PRODUCT_KEY
-----------
1
.
.
.
49999
50000 rows selected.
select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last'));
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | PARTITION RANGE ALL| | 1 | 50000 | 1979 (1)| 50000 |00:00:00.61 | 10011 | 7147 |
|* 2 | TABLE ACCESS FULL | PRODUCT | 3 | 50000 | 1979 (1)| 50000 |00:00:00.46 | 10011 | 7147 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(DECODE(BITAND("STATES",1),1,'Y','N')='Y')
select /*+ gather_plan_statistics */ product_key from product where ri_flag = 'Y';
no rows selected
select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last'));
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------
| 1 | PARTITION RANGE ALL | | 1 | 1 | 4 (0)| 0 |00:00:00.01 | 5 |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| PRODUCT | 3 | 1 | 4 (0)| 0 |00:00:00.01 | 5 |
| 3 | BITMAP CONVERSION TO ROWIDS | | 3 | | | 0 |00:00:00.01 | 5 |
|* 4 | BITMAP INDEX SINGLE VALUE | PRODUCT_RI_IX | 3 | | | 0 |00:00:00.01 | 5 |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("RI_FLAG"='Y')
|
Here, we see the optimizer has performed partition pruning on the "total_cost" virtual column.
select count(*) from product where total_cost > 10000;
COUNT(*)
----------
19885
select * from table(dbms_xplan.display_cursor);
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 788 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 13 | | | | |
| 2 | PARTITION RANGE SINGLE| | 19885 | 252K| 788 (1)| 00:00:10 | 3 | 3 |
|* 3 | TABLE ACCESS FULL | PRODUCT | 19885 | 252K| 788 (1)| 00:00:10 | 3 | 3 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COST"*(1+"MARKUP_PERCENT"/100)>10000)
|
Virtual columns are a useful new feature in Oracle 11g. This functionality offers data modelers an optimal solution for exposing logical attributes. Understanding how to use features of the database is essential to delivering a data warehouse design that is both fast and simple.
About the author
Jeff McCormick is an Architecture Director at a major health care company and President of the Connecticut Oracle User Group. Jeff has worked in IT for 20 years as a system, storage and database architect/administrator and has over 15 years of experience with DB2, Sybase, SQL Server and Oracle relational database technology. He holds several certifications including Oracle Database 10g Administrator Certified Professional, Microsoft Certified Product (SQL Server) and Certified Sybase Professional Database Administrator. Jeff is a frequent speaker and has authored several papers on data and database architecture.
 |

|
Rate this Tip
|
To rate tips, you must be a member of SearchOracle.com. Register now
to start rating these tips. Log in if you are already a member.
|


');
// -->
DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
|
 |
 |
|
 |
 |
 |
 |
| TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of . |
|
| | |
All Rights Reserved, , TechTarget |
|
|
|
|
| |