Thursday, November 13, 2008

Business Intelligence as a Service

“There is a service waiting to be born.”

Peter Drucker, The Daily Drucker


How does Web 2.0 fit into the growing world of Business Intelligence? This post attempts to answer this question by making two points:

  1. Business Intelligence as a Service - For most companies Business Intelligence needs to be provided as a service, with the technical aspects outsourced to experts.
  2. Web 2.0 - The best way to provide Business Intelligence as services is through secured internet pages using Web 2.0 technology.

The combination of these two factors makes Business Intelligence (or BI) a perfect opportunity for a small group of very skilled people that are able to provide what most businesses cannot provide on their own.

1. Business Intelligence as a Service

Business is an information intensive activity with a powerful competitive advantage going to those companies can best determine their customer’s needs and the means of satisfying those needs. Business Intelligence is the application of data analysis to finding the business factors that provide this competitive advantage. Business Intelligence offers insightful and timely information to a company’s strategic decision-makers. Patterns and correlations between customers, products, markets, and other information can be provided to executives that will help them allocate the company’s resources to where they can be most productive for both the company and the economy within which it functions.

BI, however, requires specialized knowledge that does not exist within most companies, even including those companies that can afford their own sizeable IT staffs. How to structure a data warehouse, create data mining models, or using statistical techniques to classify customers or predict sales are skills that are rare in the typical IT staff and require a large amount of specialized training to grow within the company. To date, most BI ventures have either failed completely or have proven to be disappointing in the amount of information produced and the expenses incurred.

In addition to the inherent difficulty in producing a BI staff, there is the problem of scale that limits the development of BI to only those companies that are large enough to allocate an enormous budget to produce an automated extension to their enterprise decision makers. While the international conglomerate can afford to throw millions of dollars at a BI effort, the vast majority of businesses will find an in-house BI effort prohibitive.

According to Adam Smith, the “wealth of nations” was primarily a resultt of a division of labor that allowed the baker to bake, the painter to paint, and the pin-maker to produce pins. The production of Business Intelligence, while requiring a commitment from the business man, requires an expertise that is affordable only to the man of leisure, the corporate executive with an infinite budget, or the business person who is resourceful enough to allocate the effort to an expert service. As smaller firms began to see the gains made by their bigger competitors, they will need to turn to a firm that can provide BI as a service.


2.
Business Intelligence and Web 2.0

A service company needs to deliver a service, and in the case of Business Intelligence, the service to be delivered is information. The consumer needs to receive information in a manner that it will provide him insights to his business. In addition, the consumer needs to deliver raw data, representing the activities of the company, to the service-provider to be stored in such a manner that it can easily be turned into intelligence. If Business Intelligence is to be provided as a service, the service needs to be able to communicate data to and from the service consumer and it must be able to present the data in a manner that is user-friendly, efficient, and intuitive.


The communications requirement immediately suggests the internet. By way of the internet, data can be both uploaded to or downloaded from a service-provider’s database. There is no need for a dedicated network; the user only needs to access the database through an internet address and whatever security is needed to protect the data’s privacy. The URL is the application-download location, and the browser is the application platform. With the internet, the complete process of storing and reading data in an outsourced database is easily resolved. Users and developers have embraced the web because it offers centralized management and delivery, no local installation, instant updates to both functionality and content, and elements such as shareable bookmarks.


The problem with the internet is the classic render-call-render pattern that dominates most web applications. The traditional way of providing an interactive web site is to enter data in a page and wait for the page’s host to return a new updated page. This lacks the smooth interaction that a user has with an application that is running on his local machine. On his local machine he is able to enter data and have the program respond without rewriting the screen; the application is working in an interactive manner directly with the user and there is no delay time while a page and its bulky graphics are retransmitted across the internet. Conventional web applications solve the problem of data communications but leave the problem of a clumsy interface that interrupts the users thought and confronts him with interactive ambiguity.


This brings us to Web 2.0 and its ability to turn an internet browser into a virtual desktop. Web 2.0 teams the experience of interactivity with the underlying connectivity that comes from the internet. According to Tim O’Reilly, “Web 2.0 is the business revolution in the computer industry caused by the move to the internet as a platform, and an attempt to understand the rules for success on that new platform.”


At the center of the Web 2.0 phenomena are the interactive technologies AJAX, Adobe Flash, Flex, and Java. AJAX, for example, has prompted the development of websites that mimic desktop applications. Within a page based upon AJAX, there is embedded JavaScript code that can actually operate as a full-blown application, giving the user the same experience that he gets from word processing and spreadsheet applications that sit on his own machine.


With the combination of expert Business Intelligence designers combined with the technology of Web 2.0, there is a service out there waiting to be born.

Friday, November 7, 2008

The Universal OLAP Query

One of the impediments to providing Business Intelligence to management is the technical expertise required to make a request to the data warehouse. A busy executive cannot be expected to be able to use a database language like SQL and must therefore depend upon standard rigid reports from OLAP reporting tools. In this post, I will demonstrate how easy it is to provide a technically untrained executive with a graphical tool that will allow him to generate millions of combinations of queries to standard data warehouse.

The challenge appears difficult because, not only is SQL a technically complex language based upon mathematical set theory, but the SQL queries that go to a database designed as a data warehouse are particularly complex to code. However, within these challenges lies the very solution to providing real data warehouse power and flexibility to the non-technical user.

Every request for data from an orthodox data warehouse is complex but exactly the same in form—only the parameters change in a statement that must have the same components in the same places to utilize the analytical power of the data warehouse. The vehicle must be the same while the options differ depending upon the specific questions asked by the analyst. Let’s dissect the standard data warehouse query and factor out the generalities from the particulars of a specific query and thereby determine what are the parameters are to a “Universal OLAP Query.”

First of all, data warehouses have a particular architecture to their data that remains the same whether we are using a specialized “multidimensional database” or a standard relational database designed to implement a multidimensional “star schema.” This architecture provides the analyst with a set of facts that can be grouped and summed according to the dimensions of the facts. For example, where a company keeps a data warehouse of retail sales, the design of the data warehouse would appear as follows where the blocks are tables (a single “fact” table that is linked to four “dimension tables” that contain the sets of objects that the retail sales are related to):

Although the dimensions are specific to this data warehouse, the structure of a fact table attached to a set of dimensions remains the same in any orthodox data warehouse, accounting for its description as a “multidimensional database” or a “star schema.” Much of the complexity of an OLAP data query is a result of the joins that must occur between the Fact table and one or more of the dimensional tables. The solution to creating a universal OLAP query is automate this join process and provide the non-technical analyst with a user interface that allows him to simply select the facts and dimension that those facts will be rolled up by. Know this, we can see that the star schema query will always appear in the following form:

select Fact, Dimension Criteria
from Fact Table, Dimensional Tables
where Join Criteria, Slicing Criteria
group by Dimensional Criteria

This structure remains the same for each OLAP query, the things that change from query to query are the underlined terms that we can turn into parameters to our universal OLAP query generator. Reviewing these terms in the above example gives us the following:

This reduces the syntax of SQL (or some other technical language) to a set of parameters that can be chosen from a graphical user interface that uses drag-and-drop or some other selection process for our non-technical user, however, there are a lot of parameters here and we can reduce them somewhat. The identity of the Fact Table can be programmatically inferred from the selection of Facts, eliminating it as a parameter. Likewise, the Dimension Tables and the Join Criteria can be inferred from Dimensional Criteria and the Slicing Criteria (We assume that the Universal Query Generator has been initialized with the metadata that describes the data warehouse). This reduces the input to the generator as follows:

A user interface that has been configured with the description of the data warehouse can then provide a simple wizard that allows the user to select the facts that he wants rolled up (summed, averaged, etc.), the attributes of the dimensions by which the data will be rolled up (for example, customer age group, product color, time periods, etc.), and the limits of the data (slicing criteria). The Facts and Dimensional Criteria are simple selection operations, while the slicing criteria can involve simple to more involved logic as data is sliced with “and” and “or” conditions. However, the whole thing can be done with an intuitive user interface that requires no technical language skills.

Thursday, November 6, 2008

Drucker on Integrating Decision Support Data

“…most businesses have two information systems. One is organized around the data stream; the other, far older one, around the accounting system. The accounting system, though, is a five-hundred-year-old information system that is in terrible shape. The changes we will see in information technologies over the next twenty years are nothing compared to the changes we will see in accounting.”

Peter Drucker, Management in the Next Society

The heart of Business Intelligence is the data warehouse, with its historical record of the business events. It typically includes data about customers, products, business units, and other classes of objects that are critical to the company’s decision-making. Unfortunately, it has not included the most critical dimension of the business event—the movement of financial resources that occur during that business event.

This data, historically the most important of all business information, has been sequestered from the world of Business Intelligence by accountants. The accountants use this data to produce reports that satisfy government regulations while providing almost no strategic assistance to the company’s management. According to Peter Drucker, decision support will have to include the transactional data that is produced by the accounting department.

“At present, these two systems [data stream and accounting] are separate. They will not be separate for our grandchildren’s generation. Today’s CEOs still depend on the accounting model. I don’t know of a single business that bases its decisions on the data processing stream. Everyone bases their decisions on the accounting model, even though most of us have learned how easy this model is to manipulate.”

Peter Drucker, Management in the Next Society

Drucker is right; what we call Business Intelligence will have to be integrated with accounting data. It simply has to; the competitive advantages of doing it are too great. More importantly, the competitive disadvantages of not doing it will make it essential.

Thursday, October 16, 2008

8. Data Warehouse Architecture

Data warehouses are almost universally designed as multidimensional databases, or, in relational database terms, as star schemas. The problem with both of these terms is that they are overly technical terms to the targeted users of business intelligence. This is unfortunate because the underlying concepts that they represent are relatively easy to understand for the technically naïve and very empowering for those willing to grasp their essence.

In an attempt to make the architecture of the standard data warehouse comprehensible to a wider audience, I submit the following:

1. A data warehouse is not really a multidimensional structure in any true geometric sense. Rather, it is simply a collection of facts wherein each fact is associated with a set of objects that related to it. True, it can be implemented as an array of many dimensions, but more generally, it is just a collection of facts and their material associations.

2. The star schema is a pattern of database design that has a core table of “facts,” or events that are historically recorded with their relations to business objects. Each category of business object (i.e. customer, product, business unit) is stored in a separate table that is connected to the fact table and is referred to as a dimension. The dimensional tables have no geometric significance, but rather, they represent collections of objects to which the facts relate to. Although a star schema can be thought of as a dimensional space, with each category of objects representing a dimension, it can also be simply thought of as a representation of historical events with each event being defined by a set of factors, each of which is represented by one of the dimensional tables.

3. Per the previous two observations, although the star schema is represented diagrammatically as a star-like structure of tables, with the fact table in the center surrounded by tables that represent the factors (dimensions) which define the facts, these same diagrams can be represented, with equal accuracy, as a table that is simply related to a set of factors. Above are two alternative diagrams for the same data warehouse. The implementation of the data warehouse remains the same, only the geometry of the graphic has changed. As shown in the above diagram, a star schema can be viewed as a table of facts and a set of tables that are used to factor the facts.


Tuesday, October 14, 2008

7. Unified BI

For a large corporation to have a reliable business intelligence (BI) initiative it needs to have a way of integrating information from various parts of the company. The company needs what is called a “single version of the truth.” Various information architectures have been designed to support the idea of a unified BI, however, none have proven to be a real solution to a growing problem. The question to be answered in this post is:

How can a company unify its various data warehouses, CRM databases, and other elements of a BI environment?

There is only one solution and, luckily, it is a simple one. The unifying principle of BI is to have universal object identifiers. The company unifies its BI environment by providing unique identities for each and every significant object in the BI environment. Allow the various business units within the company to evolve their own data mart with their own architectures as long as they provide the same unique, company-wide, identification for each object in their database.

By objects, I mean those identifiable entities that are relevant to the business, including the customers, business units, and products that are involved in the day-to-day transactions of business. Different departments of the company, each trying to satisfy its own business needs, may keep different data about these business objects, but the data can be integrated and shared among the departments as long as the customers, business units, and products have a common company-wide identity throughout the company.

The primary class of objects that is of interest to the company’s business, however, is the transaction itself. Each commercial transaction, representing the sale of item, the payment of an expense, or the trading of assets, is an object that must have a unique identity, produced as it occurs to the business. With each transaction being assigned a unique identifier at the time of its creation (i.e. a point of sale), the information in a marketing data warehouse can be integrated and analyzed with related information in the finance, sales, and customer relations departments.

Universal object identifiers are the key to a unified BI, providing the company with a version of the facts that transcends the boundaries of various departmental boundaries. From a single identity of for each fact comes a single version of the truth.

Friday, July 11, 2008

6. Virtual Data and Algorithms

There is no problem in all mathematics that cannot be solved by direct counting.
Ernst Mach

In the two previous posts, virtual data was described as data that was produced by an algorithm from already stored data. It is produced upon demand by the computer and its programs as opposed to being stored and maintained within the computer’s database.

Virtual data is typically the output or finished product of an algorithm that runs within an application in the computer. The computer executes the algorithm upon the user’s request, and real stored data is read and converted by the algorithm into new “virtual” data that is not directly kept in memory.

This application of an algorithm is always deterministic and therefore one algorithm along with one set of input from stored data should provide only one set of unique information. To produce another piece of virtual data, either the algorithm or the input data (or the combination of both) must change. Some variations upon this model can be realized by parameterizing the algorithm to allow it to operate with user’s options – essentially, to make it a slightly different algorithm.

The sums that are produced from a data warehouse are a particularly powerful illustration of producing virtual data from a parameterized algorithm working upon stored data. The algorithm can be considered a particular type of database query, perhaps written in SQL, and the parameters are the dimensions by which the algorithm summarizes the data. This is the most powerful form of virtual data because the dimensions can be combined in thousands upon thousands of different combinations, producing an endless variety of virtual data. The SQL query in a data warehouse can be parameterized into being essentially millions of different algorithms.

Most of the numbers that run our businesses are virtual data that can be produced by combining and summarizing the company’s simple financial transactions. The data warehouse is the factory that can produce this virtual data, and millions of variations on it, through its powerful ability to be parameterized by dimensions.

5. Virtual Data and BI

Knowledge is one of the scarcest of all resources.

Thomas Sowell

Virtual data is the reason why the star schema architecture of the BI data warehouse is such a powerful means of producing new information. Virtual data is the information that is produced on demand rather than stored in a database (see previous post). As the power of the computer creates a greater demand for virtual data, the star schema will become a greater factor in maintaining a company’s competitive advantage.

The star schema data warehouse stores primitive data, representing individual events or transactions, in a form that relates those events to the real-world objects that are they are related to. Each dimension of the star schema’s “multidimensional” architecture represents one of these real-world objects. Customers, business units, dates, and financial accounts are examples of typical dimensions stored in an effective data warehouse.

The many dimensions of a company’s data warehouse allow the user to summarize the events of the company’s history by the real-world objects that the company is related to. The company’s revenue can be summarized by business unit, customer type, product, geographical location, or any other factor that the company has deemed relevant to its analysis.

The factors (real-world objects) can be combined and correlated and the number of combinations and correlations are nearly infinite in number based upon the arithmetic of combinations (a later post).

As the typical twenty-first century business becomes increasingly information-based, its data warehouse will become a more critical factor in its ability to generate new information. The data warehouse will be able to produce more information because information will become increasingly virtual, produced on demand from the primitive atoms that represent the grist for the company’s analytical mill.

See Banking the Past.

Thursday, July 10, 2008

4. Virtual Data

The smallest operations can now afford financial control programs that account
for their finances with greater speed and sophistication that even the largest
corporations could have achieved through their production hierarchies a few
decades ago.

James Dale Davidson and Lord William Rees-Mogg
Business intelligence will become increasingly based upon “virtual data” as we proceed into the twenty-first century. Virtual data is the information that is produced by a computer from more primitive data stored in the computers’ databases.

Examples of powerful information that will soon be virtual data are the monetary amounts that move and shake our financial markets. Earnings, income, and liquidity are the critical numbers that give us a measure of the success and economic viability of a company. These numbers, typically represented as a few data points for each quarter of a year of business, are really the sums of millions of individual transactions that the company has incurred during that time period. They have been traditionally computed and stored each quarter and become the primary financial data of the company as the records of the transactions themselves are relegated to the information background.

However, these pieces of summary data will become virtual, or produced on demand, rather than primary data, for the simple reason that the production and reproduction of the data is extremely cheap and accurate with the advent of the modern computer.

The product of arithmetic operations, particularly where large volumes of data are concerned, has historically been turned into stored data because of the cost involved in manually performing these operations. However, the arithmetic that humans have produced laboriously and erratically can now be performed perfectly and effortlessly by a computer.

The inexpensiveness of performing arithmetic on a computer is the essential factor in determining when information becomes virtual as opposed to being kept as stored data. It is just a matter of fundamental economics – if it costs nothing to reproduce the data, it has no value as stored information and can just as well be reproduced on demand.

This virtualization of critical data is the key behind the increasing power of business intelligence and the star schema architecture that defines a data warehouse. The arithmetic sums that are produced by an OLAP data warehouse are produced on the fly from underlying primitive information (typically, atomic financial transactions). By “virtualizing” these arithmetic sums from the underlying atomic data, we are able to use the same underlying atomic data to produce other arithmetic sums. This is basically how we use a data warehouse: we slice the data along combinations of dimensions to produce, for example, not only earnings, but earnings by business unit, store type, or customer demographics.

We are now in the era of the ubiquitous computation engine and the cost of performing computations is approaching zero, minimizing the need to store and save the result and maximizing the value of primitive atomic data. See Banking the Past.

Thursday, July 3, 2008

3. Process-Oriented Financial Analysis

The Way [Tao] is that towards which all things flow.
Tao Te Ching, Chapter 62

Financial analysis in the twenty-first century will move from the study of static things towards the study of change and flow. This movement will follow the pattern of science as science became mathematically more sophisticated.

The ancient Greeks regarded science as a study of things. Things, in their most essential state, unaffected by changes that are temporary, were what the Greeks measured and wrote down for posterity. Stones and buildings were measured with a geometry that the Greeks became expert at and had an almost religious reverence for. Geometry, the mathematics of the Greeks, was a perfect tool for measuring things – things that were constant and unchanging.

Living things were grouped by the Greeks into species, families, and other categories according to their most essential characteristics. Just as the Pythagoreans revered geometry, Aristotle venerated categories, the catalogue of all of nature into groups.

In the same way that today’s financial information is given meaning by placing things (transactions) into categories (ledger accounts), Aristotle’s science was a primitive matter of placing everything in it proper category and then measuring the things found in those categories. For Aristotle, as it is for today’s modern financial analysis, there was no attempt to find cause and effect or correlations and indications; thins just have their proper category and follow the order that comes with those categories.

In science, however, the seventeenth century gave us Galileo Galilei, Isaac Newton, Gottfried Leibniz, and a revolution in science away from the study of change and towards the study of process. The product of this revolution was a major contributor to the modern industrial societies that we live in now. This same revolution is slowly occurring in the always conservative practice of financial record keeping (i.e. accounting and bookkeeping). The invention of the Cash Flow Statement in the 1970’s is the first major step in this direction. The use of data warehouses and more modern data analysis techniques with the computer now allow us to make financial analysis the dynamic study of movements, changes, processes, and the general flow of financial resources.

The coming years will bring us the merger of Business Intelligence (BI) and financial record-keeping into what will truly be “the language of business.” See p. 36, Banking the Past.

Thursday, June 26, 2008

2. The Information Age

The logical instrument which Aristotle used for the analysis of actual fact into more abstract elements was that of classification into species and genera.
Alfred North Whitehead

Financial analysis is an information-based undertaking. It determines the value of a company by the data that measures the company’s assets, income, and cash flow. The quality of the financial analysis is a product of the quantity, quality, and timeliness of the information that is used in making the analysis. Better and more current information makes better financial analysis possible and leads to better investment decisions.

In the age of computers, the volume of data and the tools used to analyze data have grown enormously. Companies, both large and small, are now capable of generating huge amounts of data from automated cash registers and scanning devices and inexpensive databases are available that allow that data to be searched, selected, and summed into digestible morsels of wisdom. The future of financial analysis lies in the ability to analyze the large volumes of data that are now being produced by the ubiquitous personal computer. See The Tao of Financial Information.

1. Information and Decisions

Knowledge is never perfect, and the longer the time between a decision and its consequences, the wider the gray area of uncertainty.
Thomas Sowell

The trading advantage that comes from information is a result of the information’s timeliness as well as its volume. If a party to a trade has as much information as the other trading partners, but his information came after the terms of the trade had been agreed upon, he bargained with a substantial disadvantage and most likely did not get as much profit from the trade as he should have. If an investor has several investment opportunities that are alternatives to each other, he needs to have information prior to making his investment. To have value as intellectual capital, information must not only be relevant and reliable – it must also be timely. See Banking the Past.