Friday, July 11, 2008

6. Virtual Data and Algorithms

There is no problem in all mathematics that cannot be solved by direct counting.
Ernst Mach

In the two previous posts, virtual data was described as data produced by an algorithm from data already stored. It is generated on demand by the computer and its programs rather than being stored and maintained within the computer’s database.

Virtual data is typically the output, the finished product, of an algorithm that runs within an application on the computer. The computer executes the algorithm at the user’s request: real, stored data is read and transformed by the algorithm into new “virtual” data that is never itself written to the database.

The application of an algorithm is deterministic: one algorithm applied to one set of stored input data yields exactly one result. To produce a different piece of virtual data, the algorithm, the input data, or both must change. Variations on this model can be achieved by parameterizing the algorithm so that it accepts user-supplied options – essentially making it a slightly different algorithm.

The sums produced from a data warehouse are a particularly powerful illustration of virtual data: a parameterized algorithm working upon stored data. The algorithm can be considered a particular type of database query, perhaps written in SQL, and the parameters are the dimensions by which the algorithm summarizes the data. This is the most powerful form of virtual data because the dimensions can be combined in thousands upon thousands of ways, producing a nearly endless variety of virtual data. A single SQL query template in a data warehouse can, through its parameters, become what are effectively millions of different algorithms.
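
To make this concrete, here is a minimal sketch, assuming a hypothetical table sales_transactions holding one row per transaction with its dimension attributes (quarter, customer_type, region) already attached. The query template is the one algorithm; the grouping columns are its parameters:

    -- One query template, "parameterized" by its grouping columns.
    -- Summarized by quarter:
    SELECT quarter, SUM(amount) AS total_sales
    FROM sales_transactions
    GROUP BY quarter;

    -- The same template with different parameters: by customer
    -- type and region -- effectively a different algorithm.
    SELECT customer_type, region, SUM(amount) AS total_sales
    FROM sales_transactions
    GROUP BY customer_type, region;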

Most of the numbers that run our businesses are virtual data that can be produced by combining and summarizing the company’s simple financial transactions. The data warehouse is the factory that can produce this virtual data, and millions of variations on it, through its powerful ability to be parameterized by dimensions.

5. Virtual Data and BI

Knowledge is one of the scarcest of all resources.
Thomas Sowell

Virtual data is the reason why the star schema architecture of the BI data warehouse is such a powerful means of producing new information. Virtual data is the information that is produced on demand rather than stored in a database (see previous post). As the power of the computer creates a greater demand for virtual data, the star schema will become a greater factor in maintaining a company’s competitive advantage.

The star schema data warehouse stores primitive data, representing individual events or transactions, in a form that links those events to the real-world objects they involve. Each dimension of the star schema’s “multidimensional” architecture represents one of these real-world objects. Customers, business units, dates, and financial accounts are examples of typical dimensions stored in an effective data warehouse.
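
As a sketch (with hypothetical table and column names, not the schema of any particular product), a minimal star schema for sales might look like this:

    -- Dimension tables: one row per real-world object.
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name VARCHAR(100),
        customer_type VARCHAR(30)
    );

    CREATE TABLE dim_business_unit (
        unit_key  INTEGER PRIMARY KEY,
        unit_name VARCHAR(50),
        region    VARCHAR(30)
    );

    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        quarter       CHAR(6)   -- e.g. '2008Q3'
    );

    -- Fact table: one row per primitive event (transaction),
    -- pointing outward at the dimensions like a star.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer,
        unit_key     INTEGER REFERENCES dim_business_unit,
        date_key     INTEGER REFERENCES dim_date,
        amount       DECIMAL(12,2)
    );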

The many dimensions of a company’s data warehouse allow the user to summarize the events of the company’s history by the real-world objects the company deals with. The company’s revenue can be summarized by business unit, customer type, product, geographical location, or any other factor the company has deemed relevant to its analysis.
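
Against the sketch above, summarizing revenue by business unit and customer type is a single query; swapping in other dimensions gives any of the other summaries:

    SELECT u.unit_name,
           c.customer_type,
           SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_business_unit u ON f.unit_key = u.unit_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY u.unit_name, c.customer_type;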

These factors (real-world objects) can be combined and correlated, and the number of possible combinations and correlations is nearly unlimited, a consequence of the arithmetic of combinations (the subject of a later post).
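
A back-of-the-envelope version of that arithmetic, anticipating the later post: with n dimensions, each subset of dimensions defines a distinct summary, so the subsets alone give

    C(n,0) + C(n,1) + ... + C(n,n) = 2^n
    e.g. n = 20 dimensions  =>  2^20 = 1,048,576 distinct groupings

and that is before any filtering within a dimension is considered.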

As the typical twenty-first century business becomes increasingly information-based, its data warehouse will become a more critical factor in its ability to generate new information. The data warehouse will be able to produce more information because information will become increasingly virtual, produced on demand from the primitive atoms that represent the grist for the company’s analytical mill.

See Banking the Past.

Thursday, July 10, 2008

4. Virtual Data

The smallest operations can now afford financial control programs that account for their finances with greater speed and sophistication than even the largest corporations could have achieved through their production hierarchies a few decades ago.

James Dale Davidson and Lord William Rees-Mogg

Business intelligence will become increasingly based upon “virtual data” as we proceed into the twenty-first century. Virtual data is information produced by a computer from more primitive data stored in the computer’s databases.

Examples of powerful information that will soon be virtual data are the monetary amounts that move and shake our financial markets. Earnings, income, and liquidity are the critical numbers that give us a measure of the success and economic viability of a company. These numbers, typically represented as a few data points per quarter of the business year, are really the sums of millions of individual transactions that the company has incurred during that period. Traditionally they have been computed and stored each quarter, becoming the company’s primary financial data while the records of the transactions themselves are relegated to the information background.
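
A minimal sketch of the idea, assuming a hypothetical financial_transactions table with one row per transaction: the quarterly figure need never be stored, because it can be recomputed from the transactions whenever someone asks for it:

    -- Quarterly totals produced on demand from the primitive
    -- transactions, rather than kept as stored summary data.
    SELECT quarter, SUM(amount) AS quarterly_total
    FROM financial_transactions
    GROUP BY quarter
    ORDER BY quarter;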

However, these pieces of summary data will become virtual, produced on demand rather than kept as primary data, for the simple reason that with the modern computer their production and reproduction are extremely cheap and accurate.

The products of arithmetic operations, particularly over large volumes of data, have historically been turned into stored data because of the cost of performing those operations by hand. The arithmetic that humans once produced laboriously and erratically can now be performed perfectly and effortlessly by a computer.

The inexpensiveness of performing arithmetic on a computer is the essential factor in determining when information becomes virtual as opposed to being kept as stored data. It is just a matter of fundamental economics – if it costs nothing to reproduce the data, it has no value as stored information and can just as well be reproduced on demand.

This virtualization of critical data is the key behind the increasing power of business intelligence and the star schema architecture that defines a data warehouse. The arithmetic sums that are produced by an OLAP data warehouse are produced on the fly from underlying primitive information (typically, atomic financial transactions). By “virtualizing” these arithmetic sums from the underlying atomic data, we are able to use the same underlying atomic data to produce other arithmetic sums. This is basically how we use a data warehouse: we slice the data along combinations of dimensions to produce, for example, not only earnings, but earnings by business unit, store type, or customer demographics.
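
Standard SQL even offers an operator for producing many of these slices in one pass. A sketch with the same hypothetical names (CUBE is part of the SQL standard and is supported by the major warehouse databases):

    -- One query produces earnings overall, by business unit,
    -- by store type, and by both together: every combination
    -- of the listed dimensions.
    SELECT business_unit, store_type, SUM(amount) AS earnings
    FROM financial_transactions
    GROUP BY CUBE (business_unit, store_type);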

We are now in the era of the ubiquitous computation engine, and the cost of performing computations is approaching zero, minimizing the need to store results and maximizing the value of primitive atomic data. See Banking the Past.

Thursday, July 3, 2008

3. Process-Oriented Financial Analysis

The Way [Tao] is that towards which all things flow.
Tao Te Ching, Chapter 62

Financial analysis in the twenty-first century will move from the study of static things towards the study of change and flow. This movement will follow the pattern that science itself followed as it became mathematically more sophisticated.

The ancient Greeks regarded science as a study of things. Things, in their most essential state, unaffected by changes that are temporary, were what the Greeks measured and wrote down for posterity. Stones and buildings were measured with a geometry that the Greeks became expert at and had an almost religious reverence for. Geometry, the mathematics of the Greeks, was a perfect tool for measuring things – things that were constant and unchanging.

Living things were grouped by the Greeks into species, families, and other categories according to their most essential characteristics. Just as the Pythagoreans revered geometry, Aristotle venerated categories, the catalogue of all of nature into groups.

In the same way that today’s financial information is given meaning by placing things (transactions) into categories (ledger accounts), Aristotle’s science was largely a matter of placing everything in its proper category and then measuring the things found in those categories. For Aristotle, as for today’s financial analysis, there was no attempt to find cause and effect or correlations and indications; things simply have their proper category and follow the order that comes with those categories.

In science, however, the seventeenth century gave us Galileo Galilei, Isaac Newton, Gottfried Leibniz, and a revolution away from the study of static things and towards the study of change and process. This revolution was a major contributor to the modern industrial societies we live in now. The same revolution is slowly occurring in the always conservative practice of financial record keeping (i.e., accounting and bookkeeping). The invention of the Cash Flow Statement in the 1970s was the first major step in this direction. Data warehouses and more modern computer-based data analysis techniques now allow us to make financial analysis the dynamic study of movements, changes, processes, and the general flow of financial resources.

The coming years will bring us the merger of Business Intelligence (BI) and financial record-keeping into what will truly be “the language of business.” See p. 36, Banking the Past.