SQL analytics: A self-service model

SQL analytics is one of the best-kept secrets of the Oracle database. It represents the latest advancement in the evolution of delivering powerful analytic capabilities to end users. This article reviews a brief history at attempts to deliver analytic capabilities, relates a real-life story of how SQL analytics dramatically improved one analyst's ability to do his job, and demonstrate SQL analytics capabilities via a fictitious call center application.

Introduction

SQL analytics remains one of the best-kept secrets of the Oracle database. It represents the latest advancement in the evolution of delivering powerful analytic capabilities to end users. How quickly the business will benefit is largely in the hands of database teams and their role as educators. This article will review a brief history at attempts to deliver analytic capabilities, relate a real-life story of how SQL analytics dramatically improved one analyst's ability to do his job, and demonstrate SQL analytics capabilities via a fictitious call center application. In the end, I hope you'll agree understanding and promoting SQL analytics is worthwhile to your organization.

A brief history of analytics

The history of delivering analytic capabilities to the business has seen several generations of change, each advancing the ability of business users to transform data into information through sophisticated analysis.

Let's begin our journey with the business's adoption of the computer. I refer to this first generation of analytic delivery as "IT analytics." This IT-centric model is characterized as non-dynamic, complex and removed from the end user. During this time data was primarily stored in hierarchical or network structures and difficult to access, manipulate and view. Analytic delivery involved business users describing their requirements to IT, who in turn coded 3GL programs which spit out hard-copy text-based reports. The end user would come in each day armed with a 600-page report and a calculator and manually analyze their data. Not surprisingly, users wanted direct access to data and tools rather than static dated reports of yesterday's requirements.

In the 1980s relational technology and the personal computer enabled the next generation of analytic delivery -- what I call "desktop analytics." In this fat client model, relational technology provides easy access to data and PCs offer an environment to analyze and graphically present data locally. The challenge of IT moved from writing reports to creating the infrastructure, regulating data flow and providing facilities, tools and training to empower the end user community. A new class of user was born, the "super user." Super users could copy data to this new client work station and create their own reports in an adhoc environment. A need to replace 3GL programs with sophisticated analytic tools was filled by proprietary tools for the PC. Although a great leap forward, this model had its draw backs. Namely, requirements to extract large volumes of data to local PCs often meant off-hour batch operations and increased network and work station capacity to transfer and process the data. End users were paying a high price for the ability to analyze yesterday's data. The start of the "Information Age" proved to be an expensive and complex advancement in analytic delivery.

The end of the 20th century introduced the next generation of analytic delivery, "enterprise analytics." A distributed model moved computing power back to the server and introduced thin clients, often Web-based, for distribution and presentation. This change was delivered as a complete vendor solution. Proprietary tools and data structures did everything from provision data to handle analytic functions to presentation and delivery. For the first time, the enterprise could share a consistent view of analytics. A distributed model seemed to be the answer. However, this model often required re-modeling data in order to accommodate sophisticated analytic queries. Due to licensing and training, this capability remained in the hands of a few super users. Again, data delays, complexity and expensive licensing tagged this as another expensive advancement in analytic deliver.

Recent advancements in the Oracle database have ushered in the current generation of analytic delivery, "SQL analytics." This self-service model requires only a SQL*Plus client and access to the Oracle database for any user to perform sophisticated and timely data analysis. With analytics built right into the database's SQL language, there is less of a requirement to re-model and access data in proprietary engines and data stores. This capability can be performed on existing transactional or dimensional relational data. No longer a domain of super users, SQL analytics enables all users to realize the full potential of information as a corporate resource.

Our journey ends for now with the current generation of analytic delivery achieving many of the goals of its predecessors, a self-service model delivering easy, timely and powerful access to data at no additional cost. Chances are your organization delivers analytic capabilities through a combination of models, each playing an important role in overall information architecture.

True story

One day several years ago I walked by a coworker's desk which had two PCs. A business analyst was working on one while the other was locked with an hour glass. Knowing our strict computer policy, I couldn't help but ask why this user had two PCs. He said he needed a second PC because the first was tied up around 18 hours a day for his analytic work. He went on to say that each day he would start a multi-step process before he went home and cross his fingers the next day when he came in, hoping data had finished downloading to his PC where he would manually complete several more steps using a combination of expensive business intelligence and desktop tools. For now, with the addition of memory and CPU power to his desktop, he was able to complete his work before the next day's cycle was due to start.

Intrigued, I asked if I could take a look at this classic "desktop analytic" problem and offer suggestions for improvement. After getting a quick explanation of what he was doing, I went to work. A short time later I came back with a single SQL statement (less than 10 lines) that was running in under three minutes. I explained Oracle had native analytic capabilities built right into its SQL. It was available to all users without additional costs or licensing. From a SQL*Plus client he could run SQL on the server, never needing to copy data locally. As I left my breathless coworker, feeling pretty good about myself, it occurred to me that I was partially to blame. Although I knew of this powerful solution, I hadn't educated others.

Call center analytics

To help us understand some of the capabilities of SQL analytics, we'll look at a single key performance indicator (KPI), representative occupancy and a few performance metrics for this common call center business function. In our scenario, management is trying to forecast how an expected increase in call volume will impact staffing requirements. The goal is to reduce representative call occupancy (talk time + non-call work time) rather than hire more staff. With two days having passed and corporate wanting a solution implemented by the tenth of the month, management decides to analyze the average call talk and lag times for each representative and forecast a model to achieve this goal. Fortunately, they've been briefed by the database team on Oracle's SQL analytics capabilities and can perform their analysis with a few simple queries.

Introduction

The sample data reflects a row of data for each call. Rows consist of a unique surrogate key call_activity_id, the name of the handling representative and call start and end times. For this demonstration the day and talk time duration_sec of each call is recorded rather than calculated at query time to simplify the non analytical code of the following queries. For our purposes, it doesn't matter if the data being queried is transactional or analytical in nature. The point is, the data is relational and can be analyzed with out-of-the box SQL.

The following table structure and sample output will be the used in the scenario.

create table call_activity
   (call_activity_id       integer       
   ,representative         varchar2 (15) 
   ,start_time             date
   ,end_time               date
   ,day                    integer
   ,duration_sec           integer 
   );


CALL_ACTIVITY_ID REPRESENTATIVE  START_TIME          END_TIME                   DAY DURATION_SEC
---------------- --------------- ------------------- ------------------- ---------- ------------
         1000001 BRITTANY        12/01/2006 07:56:10 12/01/2006 08:00:05          1          235
         1000002 JAIME           12/01/2006 07:58:10 12/01/2006 08:01:05          1          175
         1000003 REBECCA         12/01/2006 07:59:10 12/01/2006 08:02:05          1          175
         1000004 SHANNON         12/01/2006 07:59:30 12/01/2006 08:01:44          1          134
         1000005 BRITTANY        12/01/2006 08:01:10 12/01/2006 08:10:35          1          565
.
.
.

Note: duration_sec = (end_time - start_time) * 86400
      day          = extract(day from start_time)

Average talk time

First, management determines how long representatives spend on the phone. A SQL analytic dense_rank function helps identify each day's top performers. Sub-totals and totals provided by advanced query group by clauses complete data gathering in one query.

select decode(grouping(day),1,'* All Days *',day)                       as "DAY"
      ,decode(grouping(representative),1,'* ALL REPS *',representative) as "REPRESENTATIVE"
      ,round(avg(duration_sec))                                         as "AVG TALK TIME (sec)"
      ,dense_rank () over (partition by day order by avg(duration_sec)) as "RANK" 
  from call_activity
  group by cube (day,representative)
  order by day desc 
          ,rank; 

         
DAY          REPRESENTATIVE  AVG TALK TIME (sec)       RANK
------------ --------------- ------------------- ----------
2            REBECCA                         195          1
2            JAIME                           198          2
2            BRITTANY                        202          3
2            * ALL REPS *                    227          4
2            SHANNON                         370          5
1            REBECCA                         178          1
1            SHANNON                         194          2
1            JAIME                           207          3
1            * ALL REPS *                    219          4
1            BRITTANY                        353          5
* All Days * REBECCA                         186          1
* All Days * JAIME                           203          2
* All Days * * ALL REPS *                    223          3
* All Days * BRITTANY                        256          4
* All Days * SHANNON                         262          5

Elapsed time between calls

Next, they look back to each representative's previous call (row) in order to subtract the end time from the current row's start time. This will give elapsed time between calls. This is accomplished with a SQL analytic lag function. Again, ranking will determine each day's top performers.

select 
     decode(grouping(day),1,'* All Days *',day)                               as "DAY"
    ,decode(grouping(representative),1,'* ALL REPS *',representative) as "REPRESENTATIVE"
    ,round(avg(avg_call_lag_time_sec))                              as "AVG CALL LAG TIME (sec)"
    ,dense_rank () over (partition by day order by avg(avg_call_lag_time_sec)) as "RANK" 
  from (select day
              ,representative 
              ,to_number(start_time - lag(end_time,1) over (partition by day, representative order by 
  call_activity_id)) * 86400 as avg_call_lag_time_sec
          from call_activity
        )
  group by cube (day,representative)   
  order by day desc
          ,rank;

DAY          REPRESENTATIVE  AVG CALL LAG TIME (sec)       RANK
------------ --------------- ----------------------- ----------
2            BRITTANY                             15          1
2            REBECCA                              43          2
2            * ALL REPS *                         53          3
2            JAIME                                68          4
2            SHANNON                             119          5
1            JAIME                                13          1
1            SHANNON                              44          2
1            * ALL REPS *                         50          3
1            REBECCA                              56          4
1            BRITTANY                            125          5
* All Days * JAIME                                39          1
* All Days * REBECCA                              50          2
* All Days * * ALL REPS *                         51          3
* All Days * BRITTANY                             52          4
* All Days * SHANNON                              71          5

Forecast increased call volumes

Management has determined the only opportunity to improve representative occupancy is to reduce talk time. They believe an average talk time under 175 seconds will meet the expected increased call volume. The next step is to forecast how quickly they can achieve this goal. They have identified the following forecast model rules:

  1. Day 1 and 2 top performer averages already exist.
  2. Day 3 average will be the average of day 1 and 2.

    In the evening of day 3, all representatives will participate in a training program.

  3. Day 4 will be a 5% reduction in the average talk time of days 1-3.
  4. Day 5 forward will be a 2% reduction in the average talk time compared to the previous day.

Forecasting will be accomplished with the SQL analytic model clause. New in Oracle 10g, the model clause creates a multidimensional array from query results. Similar to a spreadsheet, array cells called measures, identified by partitions and dimensions, can be manipulated by applying business rules to generate reports.

select day
      ,round(talk_time) as "AVG TALK TIME (sec)"
  from (select day, avg(duration_sec) as talk_time
          from call_activity
          where representative in ('JAIME','REBECCA')
          group by day
       )  
  model
    keep nav 
    dimension by(day)
    measures (talk_time)
    rules sequential order iterate (999) until (talk_time[iteration_number] < 175) 
       (
        talk_time[iteration_number] = 
        case 
        when iteration_number < 3 then talk_time[cv()]
        when iteration_number = 3 then (talk_time[1] + talk_time[2]) / 2
        when iteration_number = 4 then talk_time[3] * .95
        else round(talk_time [cv()-1] *.98)
        end
        )
   order by day;

       DAY AVG TALK TIME (sec)
---------- -------------------
         0
         1                 193
         2                 196
         3                 194
         4                 185
         5                 181
         6                 177
         7                 173

The results of the forecasting model indicate that by using the two top performer's average talk time as a basis and implementing a training program, the goal of reducing average talk time to less than 175 seconds per call can be achieved by the seventh day of the month. This will be welcome news to corporate.

Conclusion

We've seen the evolution of delivering analytics progress through several generations. Each generation has contributed to the ability of users to perform sophisticated analysis. The latest advancement, SQL analytics, achieves a self-service model delivering powerful built-in capabilities that are free and easy to use. Now that you know, educate yourself further, and tell a friend!

About the author

Jeff McCormick is a Senior Data Architect at a financial services company and Executive Director of the Connecticut Oracle User Group. Jeff has worked in IT for almost 20 years as a system, storage and database architect/administrator and has over 15 years of experience with DB2, Sybase, SQL Server and Oracle relational database technology. He holds several certifications including Oracle Database 10g Administrator Certified Professional, Microsoft Certified Product (SQL Server) and Certified Sybase Professional Database Administrator. Jeff has performed extensive work in the area of high availability and disaster recovery, speaking and authoring several papers on availability architecture.
 

This was first published in March 2007

Dig deeper on Oracle and SQL

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close