It's been 20 years since the term "data warehousing" was first coined. Since then, it has evolved into a mainstream activity for Fortune 1000 companies, with many undertaking one or more data warehousing projects over the last decade. But, even after 20 years in the marketplace, it's evident that data warehousing is still misunderstood.
The goal of data warehousing is to create "one version of the truth." But in these attempts, many companies often create a proliferation of data silos instead. Here are some classic examples:
Companies implement a plethora of databases under the data warehousing umbrella -- many of them inconsistent or not integrated with each other. According to a 2004 The Data Warehousing Institute survey, "On average, organizations have two data warehouses, six independent data marts, 4.5 operational data stores and 28.5 spreadmarts."
Many ERP reporting and business intelligence (BI) efforts within companies continue to be siloed -- organizationally and technically. This is true despite the fact that ERP vendors have embraced data warehousing and have built data warehousing modules as part of their product offerings. Unfortunately, these modules are often implemented separately from the corporate data warehouses companies have already built.
Corporate performance management (CPM) presents a perfect business justification to implement a comprehensive, consistent and integrated data warehousing effort across an enterprise. However, a CPM project often implements a prebuilt solution from a BI vendor that is, just like the ERP one above, separate from the company's corporate data warehouse. This approach produces short-term, quick business wins. That's OK for the short term , but in the long term it just expands the silos and data inconsistency across a company.
Just because a company's data warehousing efforts have resulted in more data silos doesn't mean that data warehousing is a poor choice. What it might mean is that people simply do not understand or cannot get company-wide buy-in about how to do data warehousing right.
Why is there such a large gap between the goal of data warehousing and the reality of what a company has actually implemented? Quite often, it's confusion.
People often confuse a "data warehouse" with "data warehousing ." Data warehousing encompasses a complete architecture and process; it's not just having a single data warehouse . Data warehousing is the transformation of data to information, thereby enabling the business to examine its operations and performance. This task is accomplished by the staging and transformation of data from data sources, enabling the business to access and analyze information. The data stores may be persistent (stored on disk) or transient (using disk or memory). In addition, the workflow usually involves multiple data stores to support the staging and transformation of data into information such as operational data stores, data warehouses, data marts, online analytical processing cubes, files such as a flat file (comma-separated values extract, for example), XML data and even spreadsheets.
Any time you get data from the operational systems to perform reporting and analysis you are performing a data warehousing process. In the old days, it was called decision support; now the term is business intelligence. Data warehousing is what lies beneath the graphs and pivots presented by BI. In fact, BI is merely the presentation layer of the data warehousing architecture.
Too often data warehousing is associated merely with a data warehouse rather than the entire architecture and process. The problem is, when you narrow your focus to a single database, you lose the entire context of the staging of data. Data quality, consistency and integrity -- not to mention being able to audit the data trail -- is only achieved when the entire data staging (or data warehousing) architecture is considered.
A data warehousing program is much more than a data warehouse. With a single, narrow DW focus, separate efforts for ERP, CPM and BI recreate the DW architecture and create their own data silos. Ideally, an overall architectural view would let all these efforts leverage each other's work and reuse tools, code, processes, data and standards. A company would be able to implement these systems more economically, with a higher ROI, lower overall operating and maintenance costs, but also strive toward the "single version of the truth."
About the author
Rick Sherman has more than 18 years of business intelligence and data warehousing experience, having worked on more than 50 implementations as an independent consultant and as a director/practice leader at a Big Five accounting firm. He founded Athena IT Solutions, a Stow, Mass.-based business intelligence consulting firm. He can be reached at firstname.lastname@example.org.