Five myths of data mining

Many successful companies have discovered that the myths that have grown up around data mining are just that: myths. Rather than fall victim to them, the visionaries have gleaned enormous competitive advantage by using data mining to solve complex business problems and reach for profitability.

In fact, it was sophisticated data mining technology that convinced rural Wal-Marts to stock up on a special version of Spam for the hunting season. Go ahead. Laugh. But Spamouflage -- Spam in camouflage cans -- has been a huge success. Much more than a cute idea, Spamouflage has helped Wal-Mart generate additional revenue from its existing customers and is emblematic of just how deeply Wal-Mart understands the people it serves.

What is data mining anyway?

Data mining is a powerful analytical tool that enables business executives to advance from describing historical customer behavior to predicting the future. It finds patterns that unlock the mysteries of customer behavior. These findings can be used to increase revenue, reduce expenses and identify business opportunities, offering new competitive advantages.

Part of the reason myths have developed around data mining is that people are confused about what it is. At its core, data mining is a set of complex mathematical techniques used to discover and interpret previously unknown patterns in detailed data. Since the mid-1980s, when data mining expanded beyond academic, medical and scientific research, it has been applied very effectively in retail, banking, telecommunications, insurance, travel and hospitality.

Because data mining is considered an analytical tool, it is often confused with online analytical processing (OLAP). OLAP is a valuable technique for analyzing business operations to gain a historical perspective on what happened. For example, suppose a marketing manager wants to understand why sales have dropped in a particular region. OLAP tools allow him to ask questions across multiple dimensions, such as sales by store, sales by product and sales over time. By looking at historical data from different views, he can analyze the drivers (store, product or time) that affected sales.
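The kind of question OLAP answers can be sketched in a few lines of Python. This is only an illustration, not how any particular OLAP product works: the sales records, the month labels and the `total_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical sales records: (store, product, month, amount).
SALES = [
    ("North", "Widget", "2024-01", 120.0),
    ("North", "Gadget", "2024-01", 80.0),
    ("North", "Widget", "2024-02", 60.0),
    ("South", "Widget", "2024-01", 100.0),
    ("South", "Gadget", "2024-02", 90.0),
]

# Position of each dimension inside a record.
DIMENSIONS = {"store": 0, "product": 1, "month": 2}


def total_by(records, dimension):
    """Sum the sales amount along one dimension (store, product or month)."""
    idx = DIMENSIONS[dimension]
    totals = defaultdict(float)
    for row in records:
        totals[row[idx]] += row[3]
    return dict(totals)


if __name__ == "__main__":
    # The same history viewed three ways, as the marketing manager would.
    for dim in DIMENSIONS:
        print(dim, total_by(SALES, dim))
```

Each view only summarizes what has already happened; nothing here says anything about next month.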

Data mining tackles a different class of problems. It can be used to predict future events, such as next month's sales given planned promotions, or which type of customer is most likely to respond to a promotion.
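To make the contrast concrete, here is a deliberately naive sketch of the second question: which type of customer is most likely to respond. It estimates a response rate per customer type from past promotions and picks the highest. Real data mining models are far more sophisticated; the history, the customer types and the function names below are hypothetical.

```python
from collections import Counter

# Hypothetical promotion history: (customer_type, responded_to_promotion).
HISTORY = [
    ("hunter", True), ("hunter", True), ("hunter", False),
    ("commuter", False), ("commuter", True),
    ("commuter", False), ("commuter", False),
]


def response_rates(history):
    """Estimate, per customer type, the share of past promotions answered."""
    sent = Counter()
    responded = Counter()
    for customer_type, did_respond in history:
        sent[customer_type] += 1
        if did_respond:
            responded[customer_type] += 1
    return {ctype: responded[ctype] / sent[ctype] for ctype in sent}


def most_likely_responder(history):
    """Return the customer type with the highest estimated response rate."""
    rates = response_rates(history)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    print(response_rates(HISTORY))
    print(most_likely_responder(HISTORY))
```

The output is a forward-looking guess about behavior rather than a summary of past sales, which is the distinction the article draws between data mining and OLAP.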

The way a number of companies are already using it dispels the five key myths about data mining.

Myth one: Data mining provides instant crystal-ball predictions

Data mining is neither a crystal ball nor a technology where answers magically appear after pushing a single button. It's a multi-step process that includes: defining the business problem, exploring and conditioning data, developing the model, and deploying the knowledge gained. Typically, companies spend the bulk of their time preprocessing and conditioning the data to make sure it is clean, consistent, and combined properly to deliver business intelligence on which they can rely. Data mining is all about the data -- successful data mining requires data that accurately reflects the business.
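The conditioning step, where most of the time goes, might look something like the following sketch. It assumes three simple rules (drop records with no spend value, normalize names, keep the most recent record per customer); the record layout and the `condition` function are hypothetical, and a real project would have many more rules.

```python
# Hypothetical raw customer records, as they might arrive from source systems.
RAW = [
    {"id": 1, "name": " alice smith ", "spend": "120.50"},
    {"id": 2, "name": "BOB JONES", "spend": None},   # missing spend value
    {"id": 1, "name": "Alice Smith", "spend": "130"},  # later duplicate of id 1
    {"id": 3, "name": "carol lee", "spend": "75"},
]


def condition(records):
    """Clean raw records so they are consistent and combined properly.

    - rows with no spend value are dropped
    - names are trimmed and title-cased
    - spend is converted to a number
    - only the last record seen for each customer id is kept
    """
    cleaned = {}
    for rec in records:
        if rec.get("spend") is None:
            continue
        cleaned[rec["id"]] = {
            "id": rec["id"],
            "name": rec["name"].strip().title(),
            "spend": float(rec["spend"]),
        }
    return list(cleaned.values())


if __name__ == "__main__":
    for row in condition(RAW):
        print(row)
```

Only after data has been through this kind of treatment does it make sense to build a model on it.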

Companies must understand where the power of data mining lies: in tackling specific business challenges that are predictive or descriptive in nature. These might include:

  • Segmenting customers
  • Predicting customer propensity to buy
  • Detecting fraud
  • Optimizing supply and distribution channels

Companies that understand the process are seeing real results. A Midwest-based healthcare provider identifies high-risk patients and responds with case-management programs that maintain the quality of care and manage risk. A South American telecommunications firm anticipates and prevents the loss of high-value customers by analyzing phone usage, services purchased and service-quality ratings to identify the patterns that lead to customer attrition. A U.S.-based insurance company relies on the timeliness of its data mining solution to anticipate and quickly detect fraud and then take immediate action to minimize costs.

Myth two: Data mining is not yet viable for business application

Data mining is a viable technology and highly prized for its business results. The myth tends to be perpetuated by those who need to explain why they are not yet using the process, and it revolves around two related statements. The first says, "Huge databases can't be mined effectively." The second says, "Data mining can't be done in the data warehouse engine." Both statements were once true; it was also once true that airplanes couldn't get off the ground.

Let's deal with the two statements together. Because today's databases are so large, companies are concerned that the extra IT architecture needed for data mining projects will add enormous costs and that the data processing for each project will simply take too long. But some of today's databases use parallel technology, which enables in-database mining. By mining within the database, companies can eliminate data movement, leverage the performance of parallel processing, minimize data redundancy, and eliminate the cost of creating and maintaining an entirely new and redundant database dedicated to data mining. In-database mining done with parallel processing translates into viable data mining technology.
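The data-movement point can be illustrated with a small sketch using Python's built-in `sqlite3` module. SQLite is not a parallel data warehouse, so this shows only the difference between extracting every row and letting the database do the aggregation; the `calls` table and both function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table of hypothetical call records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (customer_id INTEGER, minutes REAL)")
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?)",
        [(1, 10.0), (1, 20.0), (2, 5.0), (2, 5.0), (2, 20.0)],
    )
    return conn


def average_minutes_extract(conn):
    """Extract approach: move every row out of the database, then aggregate."""
    rows = conn.execute("SELECT customer_id, minutes FROM calls").fetchall()
    sums, counts = {}, {}
    for customer_id, minutes in rows:
        sums[customer_id] = sums.get(customer_id, 0.0) + minutes
        counts[customer_id] = counts.get(customer_id, 0) + 1
    return {cid: sums[cid] / counts[cid] for cid in sums}


def average_minutes_in_db(conn):
    """In-database approach: the engine aggregates; only summary rows move."""
    rows = conn.execute(
        "SELECT customer_id, AVG(minutes) FROM calls GROUP BY customer_id"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(average_minutes_extract(conn))
    print(average_minutes_in_db(conn))
    conn.close()
```

Both functions return the same averages, but the second one transfers one row per customer instead of one row per call. On a large warehouse, that difference is where the savings in data movement come from.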

For example, a packaged goods manufacturer used data mining to run a customer loyalty program that helped its retail partners monitor promotion effectiveness and analyze shoppers' market baskets. Initially, this analysis was an effective means of encouraging partners to promote the company's products. However, the volume of data that needed to be processed grew so large that the service became too costly to offer to the retail partners. Although the analysis was done on a powerful server, it took the five analytic applications more than 312 hours to process the data.

Before terminating this valuable service, the company explored in-database data mining techniques. It loaded the data into a centralized data warehouse, then converted all five analyses into a SQL (Structured Query Language) program that ran in the database, leveraging its parallel processing power. By converting to in-database data mining, the company was able to reduce execution time from more than 312 hours to 12 hours, saving the customer loyalty program.
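The article does not show the company's SQL, but one common market basket question, which products are bought together, can be expressed as a single query. The sketch below is an assumption about what such a query could look like, run here against SQLite for convenience; the `basket_items` table, its columns and the `product_pairs` function are hypothetical, and it assumes each product appears at most once per basket.

```python
import sqlite3

# Self-join: pair up products that share a basket, then count the baskets.
# The "a.product < b.product" condition keeps each pair once, in one order.
PAIR_COUNT_SQL = """
SELECT a.product AS item_a, b.product AS item_b, COUNT(*) AS baskets
FROM basket_items AS a
JOIN basket_items AS b
  ON a.basket_id = b.basket_id AND a.product < b.product
GROUP BY a.product, b.product
ORDER BY baskets DESC, item_a, item_b
"""


def product_pairs(items):
    """Count how many baskets contain each pair of products, inside the database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE basket_items (basket_id INTEGER, product TEXT)")
        conn.executemany("INSERT INTO basket_items VALUES (?, ?)", items)
        return conn.execute(PAIR_COUNT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    demo = [
        (1, "bread"), (1, "butter"), (1, "jam"),
        (2, "bread"), (2, "butter"),
        (3, "bread"), (3, "jam"),
    ]
    for row in product_pairs(demo):
        print(row)
```

As for the numbers quoted above, going from more than 312 hours to 12 hours is a reduction by a factor of at least 312 / 12 = 26.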

Myth three: Data mining requires a separate, dedicated database

Data mining vendors typically claim that you need an expensive, dedicated database, data mart or analytic server to mine data, because the data must be pulled into a proprietary format for efficient processing. These data marts are not only costly to purchase and maintain, but also require data extraction for each separate data mining project, an expensive and time-consuming process.

Advances in database technology mean that data mining no longer needs to be done in a separate data mart. In fact, effective data mining requires an enterprise-wide data warehouse, which in total cost of investment is considerably less expensive than employing separate data marts.

Here's why. As companies implement data mining projects across the enterprise, the number of users leveraging the data mining models continues to grow, as does the need to access large data infrastructures. A cutting-edge enterprise data warehouse not only efficiently stores all enterprise data and eliminates the need for most other data marts or warehouses, but also provides an ideal foundation for data mining projects. That foundation is a single enterprise-wide repository of data that provides a consistent and current view of the customer. And by incorporating data mining extensions within the data warehouse, companies can reduce costs in two additional ways. First, there is no need to purchase and maintain additional hardware dedicated solely to data mining. Second, companies minimize the need to move data in and out of the warehouse for data mining projects, which, as already noted, is a labor- and resource-intensive process.

Myth four: Only Ph.D.s can do data mining

Some consider data mining to be so complex that there must be at least three Ph.D.s to make it happen: one in statistics or quantitative methods, one in business who understands the customer, and one in computer science.

The truth is that successful projects have been completed with nary a Ph.D. in sight. For example, Teradata recently completed a project with a South American telecommunications company in which it successfully tracked customer behavior changes, helping the company retain 98 percent of its high-value customers during deregulation. A multi-disciplinary team, working collaboratively, completed the initiative.

Data mining is a collaborative effort among knowledgeable personnel in all three areas. Business people must guide the project by creating a set of specific business questions and then must interpret the patterns that emerge. Analytic modelers, with an understanding of data mining techniques, statistics and tools, must build a reliable model. IT personnel provide insight into processing and understanding the data, as well as key technical support.

Myth five: Data mining is for large companies with lots of customer data

The plain fact is that if a company, large or small, has data that accurately reflects the business or its customers, it can build models against that data that lend insights into important business challenges. The amount of customer data a company possesses has never been the issue.

For example, Midwest Card Services (MCS) provides telemarketing, ATM management, debit cards, and specialized financial services to some 200,000 clients. The company used a centralized database to better understand its customer base, segment customers effectively and understand their patterns and preferences. This has allowed MCS to improve its own underwriting and provide its clients with comprehensive reports on their portfolios.

Seize the day

Here's the bottom line: Data mining is no longer slow or expensive or too complicated to work. The technology and the business know-how exist to put in place an efficient, cost-effective process. Companies of various sizes are among those putting the old myths to the test and proving that data mining is essential to thrive in today's hotly competitive, customer-focused business world.

About the Author

Arlene Zaima is the Data Mining Marketing Manager for Teradata, a division of NCR. She has been with Teradata's Data Mining team for the last 7 years as a Marketing Manager and Product Manager and has more than 15 years of experience with data warehousing solutions. Zaima is responsible for the development and execution of worldwide marketing initiatives for the Teradata data mining solutions and has positioned Teradata as the leader in "In-Database" data mining technology. She frequently speaks at Data Mining and Data Warehousing conferences and has also published articles on Data Mining. For additional information on Teradata, visit

