Being able to design and produce electronic reports automatically has been an important software success for today's businesses, enabling them to streamline their information gathering and distribution processes.
While the need for quick and easy access to information in data warehouses is important, it is crucial to be able to take the next step and analyze that data for use in real-time strategic business decisions.
Discussions about data warehouse return on investment (ROI) typically evolve from comparing final project cost against time saved in IT labor, to addressing how to save time and make better decisions, to quantifying business decisions more accurately with an OLAP (Online Analytical Processing) system. The end result is that organizations with an existing data mart and an OLAP system in place go on to conduct "advanced analytics" once the data warehouse has been built.
OLAP and business intelligence (BI) are key elements of reporting and offer tremendous value to businesses. Without these systems, many companies risk reduced profitability. While some would say that OLAP applications are commodities that can be trivialized as "count dashboards," others would argue that OLAP is a critical function that cannot be replaced -- even by advanced analytics. OLAP and advanced analytics are complementary and serve the same purpose: to use the data warehouse to increase ROI.
OLAP is often confused with advanced analytics because both are forms of analysis, but "advanced analytics" is a broad marketing term that means different things to different people, even within the software vendor community. For the purpose of this article, data analysis is defined as:
- Conducting mathematical operations on data to support or define a business insight (hypothesis).
- Cross-tab reports, graphs, associations and clustering are examples of exploratory analysis -- what you need to do when looking for a business insight. Inferential statistics and time series are examples of explanatory analysis -- what is needed when supporting a business insight.
Advanced analytics is commonly used as a "catch all" category for techniques with statistical, machine learning or mathematical roots. This includes such activities as descriptive modeling, predictive modeling, structural modeling, forecasting, quality assurance and optimization (whereas data mining concerns descriptive and predictive modeling alone). Figure 1 shows a landscape of the analytics space.
The component terms listed above are described in more detail below:
- Association Output (also known as Market Basket Analysis or Affinity Analysis): the conditional probability of items occurring together in the data. While we can use a SQL query and sub-query to calculate the frequency of a given paired relationship (shampoo and conditioner), the query becomes more complex when we open the analysis to three-way or four-way relationships. Furthermore, association assumes no hypothesis; we do not need to know in advance which products we are analyzing, nor limit ourselves to two-way relationships. All combinations are calculated. This is important because, for a grocery store carrying 25,000 SKUs, writing a query for every combination would mean over one billion queries.
- Clustering: synonymous in marketing with data-driven segmentation. While humans can normally interpret relationships between only two or perhaps three dimensions at a time (in OLAP, a crosstab with an embedded dimension), clustering algorithms let us use many more dimensions to find groups of similar customers. It is customary to use 7 +/- 2 numeric dimensions about a customer to generate segments. Once clusters have been found, an analyst can follow up on the algorithm's output to interpret and characterize the clusters with respect to the original dimensions.
- Predictive Modeling: a variety of techniques, highlighted in Figure 1, each with many variants. For example, regression includes linear, logistic, mixed models and survival regression. In the data mining industry, predictive techniques are known as guided (supervised) learning. This means that we prepare in advance a dataset that represents the event(s) of interest. We also prepare explanatory variables that can be used singly and/or in combination to explain the known outcome(s). It is not uncommon to see models that start with hundreds of explanatory variables, a feat well beyond the capacity of the human brain.
Predictive techniques provide a score (probability or predicted value) for every person in the customer file. This is powerful, as many companies have more than one million individuals or more than 50,000 businesses in their files. These individual scores offer an important lever for future decision making. For example, rather than mailing the entire database, a company can use scores to set contact strategies. One such strategy might be to mail a marketing slick to the top 20 percent of customers (based on individual scores), a post card to the middle 50 percent and nothing to the bottom 30 percent.
- Forecasting and Time Series Analysis: different from traditional reporting because they are not just retrospective, but also use the past to predict the future. Time Series Analysis includes a wide array of exploratory and hypothesis testing methods, while Forecasting specializes in capitalizing on the time dimension of the data (not unlike predictive modeling). Forecasting applications range from SKU-level daily, weekly or monthly demand forecasts to econometric models used to forecast economic phenomena.
- Optimization: crucial in evaluating the trade-offs of complex systems to find the best operating points. Whether it is called "mathematical" or "cross-channel" optimization, the complex systems may themselves be made up of models built with the techniques described above. In the marketing domain, this means varying the channel, offer or size of the incentive communicated in order to maximize profitability within a fixed marketing budget.
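To make the Association Output description concrete, here is a minimal brute-force sketch in Python, assuming a small hypothetical list of baskets (the item names and the `min_support` threshold are illustrative only). It counts every itemset of up to three items and reports, for each rule A -> b, the support (share of baskets containing all the items) and the confidence, which is the conditional probability P(b | A). No hypothesis about which products to compare is supplied; every combination is enumerated. The scale problem is easy to check as well: a 25,000-SKU store has `math.comb(25_000, 2)` = 312,487,500 distinct pairs alone, before any three-way combinations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions: each basket is the set of items in one purchase.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "toothpaste"},
    {"conditioner", "soap"},
    {"shampoo", "conditioner", "toothpaste"},
]


def itemset_counts(baskets, max_size=3):
    """Count how many baskets contain each itemset of 1..max_size items."""
    counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def association_rules(baskets, min_support=0.4, max_size=3):
    """Return (antecedent, consequent, support, confidence) for rules A -> b.

    support    = share of baskets containing A and b together
    confidence = P(b | A) = count(A and b) / count(A)
    """
    n = len(baskets)
    counts = itemset_counts(baskets, max_size)
    rules = []
    for itemset, count in counts.items():
        if len(itemset) < 2 or count / n < min_support:
            continue  # single items and rare combinations produce no rules
        for consequent in itemset:
            antecedent = tuple(item for item in itemset if item != consequent)
            confidence = count / counts[antecedent]
            rules.append((antecedent, consequent, count / n, confidence))
    return rules


if __name__ == "__main__":
    for a, b, support, confidence in association_rules(baskets):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Enumerating every subset grows combinatorially, so production tools typically prune infrequent itemsets early (the Apriori algorithm is the classic approach); this sketch skips pruning for clarity.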
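The Clustering description can likewise be sketched with plain k-means (Lloyd's algorithm), which is one common clustering algorithm, not necessarily the one any particular tool uses. The customer rows, their two dimensions (annual spend, visits per month) and k = 2 are hypothetical; a real segmentation would use the 7 +/- 2 dimensions mentioned above. Each dimension is standardized first so that spend in dollars does not swamp visit counts, and initial centroids are chosen by farthest-first traversal to keep the result deterministic.

```python
import math
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then keep adding the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical customers: (annual spend in dollars, store visits per month).
customers = [(200, 2), (220, 3), (210, 2), (1500, 12), (1600, 11), (1550, 13)]
labels, centroids = kmeans(standardize(customers), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm only produces the groups; as the text notes, an analyst still has to characterize each cluster against the original dimensions (here, "low spend, infrequent" versus "high spend, frequent").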
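For the Predictive Modeling scores and the 20/50/30 mailing strategy described above, a minimal sketch follows. It assumes a logistic-regression model whose intercept, coefficients and variable names (`months_since_purchase`, `orders_last_year`) are invented for illustration; in practice they would be fit on the prepared dataset of known outcomes. Each customer's variables become a response probability, and customers are then ranked and assigned the treatments from the example.

```python
import math
from collections import Counter

# Hypothetical logistic-regression model (coefficients invented for illustration;
# a real model would be fit on historical response data).
INTERCEPT = -3.0
COEFFICIENTS = {"months_since_purchase": -0.2, "orders_last_year": 0.5}


def response_score(customer):
    """Predicted response probability: 1 / (1 + exp(-z)), z linear in the variables."""
    z = INTERCEPT + sum(w * customer[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def assign_treatments(scores, top_share=0.20, middle_share=0.50):
    """Rank customers by score: top 20% get the marketing slick, the next 50%
    a post card, and the bottom 30% no mailing."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = round(len(ranked) * top_share)
    n_middle = round(len(ranked) * middle_share)
    plan = {}
    for rank, customer_id in enumerate(ranked):
        if rank < n_top:
            plan[customer_id] = "marketing slick"
        elif rank < n_top + n_middle:
            plan[customer_id] = "post card"
        else:
            plan[customer_id] = "no mailing"
    return plan


# Hypothetical customer file with ten records.
customer_file = {
    f"cust{i}": {"months_since_purchase": 12 - i, "orders_last_year": i}
    for i in range(10)
}
scores = {cid: response_score(features) for cid, features in customer_file.items()}
plan = assign_treatments(scores)
print(Counter(plan.values()))
```

Ranking by score rather than applying a fixed probability cutoff keeps the mailing volumes at the stated 20/50/30 split regardless of how the scores are distributed.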
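The Forecasting description can be illustrated with Holt's linear exponential smoothing, a simple method that carries both a level and a trend forward from past observations to project future periods. It is one technique among many (ARIMA and seasonal models are common alternatives), and the smoothing parameters `alpha` and `beta` and the weekly demand figures below are hypothetical.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast for h steps ahead = level_T + h * trend_T
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to initialize the trend")
    level, trend = series[0], series[1] - series[0]  # simple initialization
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical weekly unit demand for one SKU.
weekly_demand = [120, 132, 129, 141, 150, 148, 162, 171]
print(holt_forecast(weekly_demand, alpha=0.5, beta=0.3, horizon=4))
```

A larger `alpha` makes the level react faster to recent demand; a larger `beta` does the same for the trend. In practice both would be tuned on held-out history rather than set by hand.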
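Finally, the Optimization description can be framed, under simplifying assumptions, as a multiple-choice knapsack problem. Each customer may receive at most one offer, each offer has an integer cost (say, whole dollars) and an expected profit (for example, a response score times margin), and total cost must stay within the fixed budget. The offer names, costs and profits below are hypothetical. The dynamic program is exact but only practical for small budgets; commercial marketing optimization typically relies on linear or mixed-integer programming solvers instead.

```python
def optimize_offers(customers, budget):
    """Pick at most one offer per customer to maximize total expected profit
    subject to total cost <= budget (multiple-choice knapsack, dynamic programming).

    customers: list of option lists, each option = (offer_name, integer_cost, expected_profit)
    returns:   (best_total_profit, tuple of chosen offer names, "none" where nothing is sent)
    """
    # best[b] = best (profit, choices) over customers seen so far with spend <= b
    best = [(0.0, ())] * (budget + 1)
    for options in customers:
        # Default: send this customer nothing.
        new_best = [(profit, choices + ("none",)) for profit, choices in best]
        for b in range(budget + 1):
            for name, cost, profit in options:
                if cost <= b:
                    prev_profit, prev_choices = best[b - cost]
                    if prev_profit + profit > new_best[b][0]:
                        new_best[b] = (prev_profit + profit, prev_choices + (name,))
        best = new_best
    return best[budget]


# Hypothetical per-customer options: (offer, cost in dollars, expected profit in dollars).
customers = [
    [("slick", 2, 5.0), ("postcard", 1, 2.0)],
    [("slick", 2, 4.0), ("postcard", 1, 3.0)],
    [("slick", 2, 1.5), ("postcard", 1, 0.5)],
]
print(optimize_offers(customers, budget=4))  # (9.0, ('slick', 'slick', 'none'))
```

Lowering the budget to 3 changes the answer to a slick for the first customer and a post card for the second, which is exactly the kind of trade-off between channel, offer and budget that the text describes.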
Recent trends indicate that advanced analytics will continue to become more mainstream, particularly as large numbers of OLAP developers look for the next challenge. In addition, the desire to derive greater returns from data warehouse spending will also cause more organizations to use advanced analytic techniques that truly deliver on the promise to provide greater ROI when intelligently implemented.
Existing information analysis methods are becoming outdated and unable to support sustained growth, requiring companies to adopt new analytic capabilities that give them a competitive edge over rivals. Accurate, in-depth analysis provides the underlying value in helping a company make informed and insightful decisions, ultimately leading toward sustained success.
About the Author
John Wallace is an analytical consultant in the San Francisco office of SAS. He has worked with clients in the automotive, ISP, grocery, retail, PC/server and consumer software industries. He can be reached at firstname.lastname@example.org.