Thursday, October 16, 2008

8. Data Warehouse Architecture

Data warehouses are almost universally designed as multidimensional databases, or, in relational database terms, as star schemas. The problem with both of these terms is that they are overly technical terms to the targeted users of business intelligence. This is unfortunate because the underlying concepts that they represent are relatively easy to understand for the technically naïve and very empowering for those willing to grasp their essence.

In an attempt to make the architecture of the standard data warehouse comprehensible to a wider audience, I submit the following:

1. A data warehouse is not really a multidimensional structure in any true geometric sense. Rather, it is simply a collection of facts wherein each fact is associated with a set of objects that related to it. True, it can be implemented as an array of many dimensions, but more generally, it is just a collection of facts and their material associations.

2. The star schema is a pattern of database design that has a core table of “facts,” or events that are historically recorded with their relations to business objects. Each category of business object (i.e. customer, product, business unit) is stored in a separate table that is connected to the fact table and is referred to as a dimension. The dimensional tables have no geometric significance, but rather, they represent collections of objects to which the facts relate to. Although a star schema can be thought of as a dimensional space, with each category of objects representing a dimension, it can also be simply thought of as a representation of historical events with each event being defined by a set of factors, each of which is represented by one of the dimensional tables.

3. Per the previous two observations, although the star schema is represented diagrammatically as a star-like structure of tables, with the fact table in the center surrounded by tables that represent the factors (dimensions) which define the facts, these same diagrams can be represented, with equal accuracy, as a table that is simply related to a set of factors. Above are two alternative diagrams for the same data warehouse. The implementation of the data warehouse remains the same, only the geometry of the graphic has changed. As shown in the above diagram, a star schema can be viewed as a table of facts and a set of tables that are used to factor the facts.


No comments: