Thursday, October 16, 2008

8. Data Warehouse Architecture

Data warehouses are almost universally designed as multidimensional databases, or, in relational database terms, as star schemas. The problem with both of these terms is that they are overly technical terms to the targeted users of business intelligence. This is unfortunate because the underlying concepts that they represent are relatively easy to understand for the technically naïve and very empowering for those willing to grasp their essence.

In an attempt to make the architecture of the standard data warehouse comprehensible to a wider audience, I submit the following:

1. A data warehouse is not really a multidimensional structure in any true geometric sense. Rather, it is simply a collection of facts wherein each fact is associated with a set of objects that related to it. True, it can be implemented as an array of many dimensions, but more generally, it is just a collection of facts and their material associations.

2. The star schema is a pattern of database design that has a core table of “facts,” or events that are historically recorded with their relations to business objects. Each category of business object (i.e. customer, product, business unit) is stored in a separate table that is connected to the fact table and is referred to as a dimension. The dimensional tables have no geometric significance, but rather, they represent collections of objects to which the facts relate to. Although a star schema can be thought of as a dimensional space, with each category of objects representing a dimension, it can also be simply thought of as a representation of historical events with each event being defined by a set of factors, each of which is represented by one of the dimensional tables.

3. Per the previous two observations, although the star schema is represented diagrammatically as a star-like structure of tables, with the fact table in the center surrounded by tables that represent the factors (dimensions) which define the facts, these same diagrams can be represented, with equal accuracy, as a table that is simply related to a set of factors. Above are two alternative diagrams for the same data warehouse. The implementation of the data warehouse remains the same, only the geometry of the graphic has changed. As shown in the above diagram, a star schema can be viewed as a table of facts and a set of tables that are used to factor the facts.


Tuesday, October 14, 2008

7. Unified BI

For a large corporation to have a reliable business intelligence (BI) initiative it needs to have a way of integrating information from various parts of the company. The company needs what is called a “single version of the truth.” Various information architectures have been designed to support the idea of a unified BI, however, none have proven to be a real solution to a growing problem. The question to be answered in this post is:

How can a company unify its various data warehouses, CRM databases, and other elements of a BI environment?

There is only one solution and, luckily, it is a simple one. The unifying principle of BI is to have universal object identifiers. The company unifies its BI environment by providing unique identities for each and every significant object in the BI environment. Allow the various business units within the company to evolve their own data mart with their own architectures as long as they provide the same unique, company-wide, identification for each object in their database.

By objects, I mean those identifiable entities that are relevant to the business, including the customers, business units, and products that are involved in the day-to-day transactions of business. Different departments of the company, each trying to satisfy its own business needs, may keep different data about these business objects, but the data can be integrated and shared among the departments as long as the customers, business units, and products have a common company-wide identity throughout the company.

The primary class of objects that is of interest to the company’s business, however, is the transaction itself. Each commercial transaction, representing the sale of item, the payment of an expense, or the trading of assets, is an object that must have a unique identity, produced as it occurs to the business. With each transaction being assigned a unique identifier at the time of its creation (i.e. a point of sale), the information in a marketing data warehouse can be integrated and analyzed with related information in the finance, sales, and customer relations departments.

Universal object identifiers are the key to a unified BI, providing the company with a version of the facts that transcends the boundaries of various departmental boundaries. From a single identity of for each fact comes a single version of the truth.