Friday, November 7, 2008

The Universal OLAP Query

One of the impediments to providing Business Intelligence to management is the technical expertise required to make a request to the data warehouse. A busy executive cannot be expected to be able to use a database language like SQL and must therefore depend upon standard rigid reports from OLAP reporting tools. In this post, I will demonstrate how easy it is to provide a technically untrained executive with a graphical tool that will allow him to generate millions of combinations of queries to standard data warehouse.

The challenge appears difficult because, not only is SQL a technically complex language based upon mathematical set theory, but the SQL queries that go to a database designed as a data warehouse are particularly complex to code. However, within these challenges lies the very solution to providing real data warehouse power and flexibility to the non-technical user.

Every request for data from an orthodox data warehouse is complex but exactly the same in form—only the parameters change in a statement that must have the same components in the same places to utilize the analytical power of the data warehouse. The vehicle must be the same while the options differ depending upon the specific questions asked by the analyst. Let’s dissect the standard data warehouse query and factor out the generalities from the particulars of a specific query and thereby determine what are the parameters are to a “Universal OLAP Query.”

First of all, data warehouses have a particular architecture to their data that remains the same whether we are using a specialized “multidimensional database” or a standard relational database designed to implement a multidimensional “star schema.” This architecture provides the analyst with a set of facts that can be grouped and summed according to the dimensions of the facts. For example, where a company keeps a data warehouse of retail sales, the design of the data warehouse would appear as follows where the blocks are tables (a single “fact” table that is linked to four “dimension tables” that contain the sets of objects that the retail sales are related to):

Although the dimensions are specific to this data warehouse, the structure of a fact table attached to a set of dimensions remains the same in any orthodox data warehouse, accounting for its description as a “multidimensional database” or a “star schema.” Much of the complexity of an OLAP data query is a result of the joins that must occur between the Fact table and one or more of the dimensional tables. The solution to creating a universal OLAP query is automate this join process and provide the non-technical analyst with a user interface that allows him to simply select the facts and dimension that those facts will be rolled up by. Know this, we can see that the star schema query will always appear in the following form:

select Fact, Dimension Criteria
from Fact Table, Dimensional Tables
where Join Criteria, Slicing Criteria
group by Dimensional Criteria

This structure remains the same for each OLAP query, the things that change from query to query are the underlined terms that we can turn into parameters to our universal OLAP query generator. Reviewing these terms in the above example gives us the following:

This reduces the syntax of SQL (or some other technical language) to a set of parameters that can be chosen from a graphical user interface that uses drag-and-drop or some other selection process for our non-technical user, however, there are a lot of parameters here and we can reduce them somewhat. The identity of the Fact Table can be programmatically inferred from the selection of Facts, eliminating it as a parameter. Likewise, the Dimension Tables and the Join Criteria can be inferred from Dimensional Criteria and the Slicing Criteria (We assume that the Universal Query Generator has been initialized with the metadata that describes the data warehouse). This reduces the input to the generator as follows:

A user interface that has been configured with the description of the data warehouse can then provide a simple wizard that allows the user to select the facts that he wants rolled up (summed, averaged, etc.), the attributes of the dimensions by which the data will be rolled up (for example, customer age group, product color, time periods, etc.), and the limits of the data (slicing criteria). The Facts and Dimensional Criteria are simple selection operations, while the slicing criteria can involve simple to more involved logic as data is sliced with “and” and “or” conditions. However, the whole thing can be done with an intuitive user interface that requires no technical language skills.

No comments: