Home > Oracle Tips > Data Warehousing & Business Intelligence > Tip: Exposing logical attributes with virtual columns
Oracle Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING & BUSINESS INTELLIGENCE

Tip: Exposing logical attributes with virtual columns


Jeffrey R. McCormick
05.08.2008
Rating: -4.75- (out of 5)


Oracle tips, scripts, and expert advice
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


Data modelers are often faced with the dilemma of how to model logical attributes in their data warehouse schema design. A new feature in Oracle 11g called virtual columns solves this dilemma by exposing logical attributes while reducing object management and storage consumption. Additionally, virtual columns can be used to improve availability, manageability and performance of the database. Virtual columns assist modelers in delivering simple, intuitive and fast performing designs.

Logical attributes

Logical attributes are commonly referred to as expression or function based results. These attributes fall into two categories: calculated and derived values. Calculated values represent the results of mathematical formulas. The following example adds two columns, resulting in a calculated value:

base salary + commission = total salary

Derived values, on the other hand, are logic based. The same two columns added together can be evaluated, resulting in a derived value:

if base salary + commission > 100000 then 'Highly Compensated' end

The dilemma

Prior to 11g, data modelers either had to expose logical attributes as physical table columns or virtualize the values in views. Both solutions were less than optimal.

Storing logical attributes as physical table columns allowed values to be exposed to users and were indexable. As physical columns, accurate statistics could be collected resulting in better decisions by the optimizer. However, storing the attributes as table columns meant consuming disk space. Conscientious of exponential data growth within the data warehouse, many designers opted to model these attributes in views.

Views achieved the design goal of exposing the logical attributes without consuming physical disk space. A view can be thought of as a stored query with results materialized at SQL execution. For this reason, optimizer statistics cannot be pre-collected on views. The optimizer is left with no other option but to guess at the cost and cardinality of the expression results. Although view columns cannot be indexed or constrained, the introduction of Function Based Indexes (FBIs) in Oracle 8i provided the optimizer with actual statistics for query plan optimization. Views and FBIs seemed a viable solution but meant managing additional database objects which led to data warehouse complexity.

The 11g answer

The introduction of virtual columns in Oracle 11g has provided data modelers with an optimal design for logical attributes. Exposed as table columns, virtual column values are not actually stored on disk but are computed on the fly during SQL execution. They actually share much of the same code path as FBIs but offer visibility. In addition to not being physically stored, virtual columns have other significant advantages over views. Optimizer statistics can be collected on virtual columns and these columns can be indexed, constrained and even used to partition tables. This enables the optimizer to make fact-based intelligent decisions around query access. Virtual columns achieve the design goal of exposing logical attributes without additional storage or object costs.

Show me

Let's demonstrate the use of virtual columns in a fictitious data warehouse design.

1. Create a de-normalized PRODUCT table that you might expect to see in a dimensional Star schema design.

create table product
   (product_key    number(6)      not null
   ,description    varchar2(1000) not null
   ,cost           number (7,0)   not null
   ,markup_percent number (3,0)   not null 
  ,total_cost     number (7,0)   as (cost * (1 + (markup_percent / 100))) virtual not null
                                  check (total_cost < 125001)      
   ,states         number         not null
   ,az_flag        varchar2(1)    as (decode(bitand (states,1),1,'Y','N')) virtual not null
   ,ct_flag        varchar2(1)    as (decode(bitand (states,2),2,'Y','N')) virtual not null
   ,ri_flag        varchar2(1)    as (decode(bitand (states,4),4,'Y','N')) virtual not null          
   )
   partition by range (total_cost)
     (
      partition low_cost_partition      values less than (1000)
     ,partition moderate_cost_partition values less than (10000)
     ,partition high_cost_partition     values less than (MAXVALUE)
     );

Notice the new syntax of the virtual columns in bold print. The calculated attribute "total_cost" has inline NULL and check constraints, and is used as the partitioning column. Each derived state flag attribute is based on the "states" column which reflects the aggregation of all states where the product is sold (see Figure 1). For example, a value of 7 in the "states" column reflects a product sold in ARIZONA, CONNECTICUT and RHODE ISLAND (1 + 2 + 4).

Figure 1 State Assignment
                                     
STATE           STATE NUMBER
------------    ------------
ARIZONA                    1          
CONNECTICUT                2        
RHODE ISLAND               4 

2. Show virtual column expressions are actually stored as metadata in the data dictionary.

select column_name, virtual_column, data_default 
  from user_tab_cols 
  where table_name = 'PRODUCT'
  order by column_id;

COLUMN_NAME                    VIR DATA_DEFAULT
------------------------------ --- ------------------------------------
PRODUCT_KEY                    NO
DESCRIPTION                    NO
COST                           NO
MARKUP_PERCENT                 NO
TOTAL_COST                     YES "COST"*(1+"MARKUP_PERCENT"/100)
STATES                         NO
AZ_FLAG                        YES DECODE(BITAND("STATES",1),1,'Y','N')
CT_FLAG                        YES DECODE(BITAND("STATES",2),2,'Y','N')
RI_FLAG                        YES DECODE(BITAND("STATES",4),4,'Y','N')

3. Load the PRODUCT table.

insert /*+ APPEND */ into product
  (PRODUCT_KEY,DESCRIPTION,COST,MARKUP_PERCENT,STATES)  
  select rownum
        ,rpad(to_char(rownum),1000)
        ,case floor(dbms_random.value(1,6))  
           when  1 then  10
           when  2 then  100
           when  3 then  1000
           when  4 then  10000
           when  5 then  100000
         end  
        ,case floor(dbms_random.value(1,4))  
           when  1 then  5
           when  2 then  10
           when  3 then  25
         end          
        ,case floor(dbms_random.value(1,3))  
           when  1 then  1
           when  2 then  3
         end  
    from  dba_objects 
    where rownum < 50001;
commit;

Storing the odd numbers 1 & 3 in the "states" column will ensure all products are sold in ARIZONA and no products are sold in RHODE ISLAND. We will come back to this in later steps.

4. Create local partitioned indexes on the virtual "state flag" columns.

create bitmap index product_az_ix on product (az_flag) local;
create bitmap index product_ct_ix on product (ct_flag) local;
create bitmap index product_ri_ix on product (ri_flag) local;


select a.index_name, a.index_Type, b.column_Expression
  from user_indexes         a
      ,user_ind_expressions b
  where a.index_name    = b.index_name 
    and a.index_name like 'PRODUCT_%_IX';

INDEX_NAME    INDEX_TYPE            COLUMN_EXPRESSION
------------- --------------------- ------------------------------------
PRODUCT_AZ_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",1),1,'Y','N')
PRODUCT_CT_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",2),2,'Y','N')
PRODUCT_RI_IX FUNCTION-BASED BITMAP DECODE(BITAND("STATES",4),4,'Y','N')

5. Gather Statistics.

execute dbms_stats.gather_table_stats(user,'PRODUCT' ,cascade=>true, estimate_percent=> 100, 
granularity=> 'PARTITION'); select blocks, num_rows, avg_row_len from user_tables where table_name = 'PRODUCT'; BLOCKS NUM_ROWS AVG_ROW_LEN ---------- ---------- ----------- 7284 50000 1024

6. Verify the skewed distribution for ARIZONA and RHODE ISLAND.

select states,count(*) from product group by states order by states;

    STATES   COUNT(*)
---------- ----------
         1      24910
         3      25090

select ' ARIZONA'     ,sum(decode(az_flag,'Y',1,0)) "AZ Yes"
                      ,sum(decode(az_flag,'N',1,0)) "AZ No"
      ,' RHODE ISLAND',sum(decode(ri_flag,'Y',1,0)) "RI Yes"
                      ,sum(decode(ri_flag,'N',1,0)) "RI No"
  from product;

 'ARIZONA     AZ Yes      AZ No 'RHODEISLAND'     RI Yes      RI No
 -------- ---------- ---------- ------------- ---------- ----------
  ARIZONA      50000          0  RHODE ISLAND          0      50000

7. Compare ARIZONA and RHODE ISLAND Optimizer Query Plans.

Here, with statistics collection, the skewed results of the column functions are known by the optimizer. This causes the optimizer to treat the cost of each query differently, resulting in different query plans. Notice the number of estimated rows match the actual number of rows processed.

select /*+ gather_plan_statistics */ product_key from product where az_flag = 'Y';

PRODUCT_KEY
-----------
          1
.
.
.
      49999

50000 rows selected.

select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last'));

---------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   1 |  PARTITION RANGE ALL|         |      1 |  50000 |  1979   (1)|  50000 |00:00:00.61 |   10011 |   7147 |
|*  2 |   TABLE ACCESS FULL | PRODUCT |      3 |  50000 |  1979   (1)|  50000 |00:00:00.46 |   10011 |   7147 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(DECODE(BITAND("STATES",1),1,'Y','N')='Y')


select /*+ gather_plan_statistics */ product_key from product where ri_flag = 'Y';

no rows selected

select * from table(dbms_xplan.display_cursor(null,null,'allstats cost last'));


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name          | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------
|   1 |  PARTITION RANGE ALL               |               |      1 |      1 |     4   (0)|      0 |00:00:00.01 |       5 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| PRODUCT       |      3 |      1 |     4   (0)|      0 |00:00:00.01 |       5 |
|   3 |    BITMAP CONVERSION TO ROWIDS     |               |      3 |        |            |      0 |00:00:00.01 |       5 |
|*  4 |     BITMAP INDEX SINGLE VALUE      | PRODUCT_RI_IX |      3 |        |            |      0 |00:00:00.01 |       5 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("RI_FLAG"='Y')

Here, we see the optimizer has performed partition pruning on the "total_cost" virtual column.

select count(*) from product where total_cost > 10000;

  COUNT(*)
----------
     19885

select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |       |       |   788 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |         |     1 |    13 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         | 19885 |   252K|   788   (1)| 00:00:10 |     3 |     3 |
|*  3 |    TABLE ACCESS FULL    | PRODUCT | 19885 |   252K|   788   (1)| 00:00:10 |     3 |     3 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("COST"*(1+"MARKUP_PERCENT"/100)>10000)

Virtual columns are a useful new feature in Oracle 11g. This functionality offers data modelers an optimal solution for exposing logical attributes. Understanding how to use features of the database is essential to delivering a data warehouse design that is both fast and simple.

About the author

Jeff McCormick is an Architecture Director at a major health care company and President of the Connecticut Oracle User Group. Jeff has worked in IT for 20 years as a system, storage and database architect/administrator and has over 15 years of experience with DB2, Sybase, SQL Server and Oracle relational database technology. He holds several certifications including Oracle Database 10g Administrator Certified Professional, Microsoft Certified Product (SQL Server) and Certified Sybase Professional Database Administrator. Jeff is a frequent speaker and has authored several papers on data and database architecture.

Rate this Tip
To rate tips, you must be a member of SearchOracle.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


RELATED CONTENT
Data Warehousing & Business Intelligence
Top five data management buzzwords
A bit wiser with Oracle technology
RFID: Metadata in motion
Oracle vs. SAP
Core principles of data warehouse design
Learning Guide: Data warehousing and business intelligence
Saving lives with Oracle
Do people really understand data warehousing?
The BI application consolidation challenge
MetaBase scripting for the Oracle data warehousing DBA

Oracle database administration (DBA)
Why am I getting an Oracle error when issuing the RECOVER DATABASE command?
Can I add multiple datafiles and make use of all the temporary datafiles at the same time?
Checkpoint vs. commit process
What's the best tape backup strategy for 10g RAC?
Can I resize the SYSTEM tablespace?
How does an INSERT/UPDATE statement behave when executed compared to a SELECT statement?
How to recover a control file
The USING BACKUP CONTROLFILE clause in Oracle commands
Backup and recovery of Oracle 10g with forgotten admin password
How to use the OPatch Utility in Oracle
Oracle database administration (DBA) Research

Oracle data warehousing
Oracle users prepare for MDM
Data modeling tools no substitute for hard work
eHarmony spurns Microsoft, finds match with Oracle 10g
Separate database for each client in Oracle 10g
Help with data warehouse disaster recovery planning
ORA-600 error and slow server response after upgrade
Extracting data from a specific period
How many disk groups for our data marts?
Data warehouse based on operational data model
RAC node scaling in DSS workloads vs. OLTP

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
multidimensional database  (SearchOracle.com)
operational data store  (SearchOracle.com)
repository  (SearchOracle.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

HomeNewsTopicsTipsAsk the ExpertsWebcastsWhite PapersProductsBlogs
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




All Rights Reserved, Copyright 2003 - 2008, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts