Friday, July 11, 2008

6. Virtual Data and Algorithms

There is no problem in all mathematics that cannot be solved by direct counting.
Ernst Mach

In the two previous posts, virtual data was described as data produced by an algorithm from data already stored. It is produced on demand by the computer and its programs rather than being stored and maintained within the computer’s database.

Virtual data is typically the output, or finished product, of an algorithm that runs within an application on the computer. The computer executes the algorithm at the user’s request, and real stored data is read and converted by the algorithm into new “virtual” data that is not itself kept in storage.

This application of an algorithm is always deterministic, and therefore one algorithm combined with one set of input from stored data should produce only one unique set of information. To produce a different piece of virtual data, either the algorithm or the input data (or both) must change. Variations on this model can be realized by parameterizing the algorithm to accept user-supplied options – essentially, making it a slightly different algorithm.

The sums produced from a data warehouse are a particularly powerful illustration of virtual data generated by a parameterized algorithm working upon stored data. The algorithm can be considered a particular type of database query, perhaps written in SQL, and the parameters are the dimensions by which the algorithm summarizes the data. This is the most powerful form of virtual data because the dimensions can be combined in thousands upon thousands of ways, producing an endless variety of virtual data. The SQL query in a data warehouse can be parameterized into what are effectively millions of different algorithms.
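As a rough sketch of this idea (the table and column names here – sales_fact, date_dim, customer_dim – are hypothetical, not drawn from any particular warehouse), such a summary query might look like the following; swapping the GROUP BY columns for other dimensions effectively makes it a different algorithm, producing different virtual data from the same stored transactions.

-- A minimal sketch against a hypothetical star schema.
-- Changing the GROUP BY columns changes which virtual data is produced.
SELECT d.calendar_year,
       c.customer_type,
       SUM(f.sale_amount) AS total_revenue
FROM sales_fact f
JOIN date_dim d ON f.date_key = d.date_key
JOIN customer_dim c ON f.customer_key = c.customer_key
GROUP BY d.calendar_year, c.customer_type;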

Most of the numbers that run our businesses are virtual data that can be produced by combining and summarizing the company’s simple financial transactions. The data warehouse is the factory that can produce this virtual data, and millions of variations on it, through its powerful ability to be parameterized by dimensions.

5. Virtual Data and BI

Knowledge is one of the scarcest of all resources.

Thomas Sowell

Virtual data is the reason why the star schema architecture of the BI data warehouse is such a powerful means of producing new information. Virtual data is the information that is produced on demand rather than stored in a database (see previous post). As the power of the computer creates a greater demand for virtual data, the star schema will become a greater factor in maintaining a company’s competitive advantage.

The star schema data warehouse stores primitive data, representing individual events or transactions, in a form that relates those events to the real-world objects they involve. Each dimension of the star schema’s “multidimensional” architecture represents one of these real-world objects. Customers, business units, dates, and financial accounts are examples of typical dimensions stored in an effective data warehouse.
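As an illustration (these CREATE TABLE statements are a hypothetical sketch, not a prescription for any particular warehouse), a minimal star schema along these lines might be defined as follows, with one row per real-world object in each dimension table and one row per transaction in the fact table:

-- Hypothetical dimension tables: one row per real-world object.
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_type VARCHAR(30)
);

CREATE TABLE date_dim (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    calendar_year INTEGER
);

-- Hypothetical fact table: one row per primitive event (transaction),
-- with foreign keys relating the event to its real-world objects.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim (date_key),
    customer_key INTEGER REFERENCES customer_dim (customer_key),
    sale_amount  DECIMAL(12, 2)
);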

The many dimensions of a company’s data warehouse allow the user to summarize the events of the company’s history by the real-world objects that the company is related to. The company’s revenue can be summarized by business unit, customer type, product, geographical location, or any other factor that the company has deemed relevant to its analysis.

These factors (real-world objects) can be combined and correlated, and the number of possible combinations and correlations is nearly limitless, based upon the arithmetic of combinations (a later post).
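To make the arithmetic concrete with a small illustration: if a warehouse exposes n independent dimensions, there are 2^n possible subsets of dimensions to summarize by, so even ten dimensions yield 2^10 = 1,024 distinct groupings – before counting the members and hierarchies within each dimension, which multiply that figure into the millions.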

As the typical twenty-first century business becomes increasingly information-based, its data warehouse will become a more critical factor in its ability to generate new information. The data warehouse will be able to produce more information because information will become increasingly virtual, produced on demand from the primitive atoms that represent the grist for the company’s analytical mill.

See Banking the Past.

Thursday, July 10, 2008

4. Virtual Data

The smallest operations can now afford financial control programs that account for their finances with greater speed and sophistication than even the largest corporations could have achieved through their production hierarchies a few decades ago.

James Dale Davidson and Lord William Rees-Mogg

Business intelligence will become increasingly based upon “virtual data” as we proceed into the twenty-first century. Virtual data is the information that is produced by a computer from more primitive data stored in the computer’s databases.

Examples of powerful information that will soon be virtual data are the monetary amounts that move and shake our financial markets. Earnings, income, and liquidity are the critical numbers that give us a measure of the success and economic viability of a company. These numbers, typically represented as a few data points for each quarter of a year of business, are really the sums of millions of individual transactions that the company has incurred during that time period. They have been traditionally computed and stored each quarter and become the primary financial data of the company as the records of the transactions themselves are relegated to the information background.

However, these pieces of summary data will become virtual – produced on demand – rather than primary data, for the simple reason that, with the advent of the modern computer, producing and reproducing them is extremely cheap and accurate.

The product of arithmetic operations, particularly where large volumes of data are concerned, has historically been turned into stored data because of the cost involved in manually performing these operations. However, the arithmetic that humans have produced laboriously and erratically can now be performed perfectly and effortlessly by a computer.

The inexpensiveness of performing arithmetic on a computer is the essential factor in determining when information becomes virtual as opposed to being kept as stored data. It is just a matter of fundamental economics – if it costs nothing to reproduce the data, it has no value as stored information and can just as well be reproduced on demand.

This virtualization of critical data is the key behind the increasing power of business intelligence and the star schema architecture that defines a data warehouse. The arithmetic sums that are produced by an OLAP data warehouse are produced on the fly from underlying primitive information (typically, atomic financial transactions). By “virtualizing” these arithmetic sums from the underlying atomic data, we are able to use the same underlying atomic data to produce other arithmetic sums. This is basically how we use a data warehouse: we slice the data along combinations of dimensions to produce, for example, not only earnings, but earnings by business unit, store type, or customer demographics.
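As a sketch (reusing the hypothetical sales_fact, date_dim, and customer_dim names from the sketches above), the same atomic rows can be sliced along different dimensions simply by changing the grouping; most modern SQL engines also support GROUPING SETS, which produces several such slices in a single pass:

-- Two different slices of the same atomic transactions, produced in one pass.
SELECT d.calendar_year,
       c.customer_type,
       SUM(f.sale_amount) AS earnings
FROM sales_fact f
JOIN date_dim d ON f.date_key = d.date_key
JOIN customer_dim c ON f.customer_key = c.customer_key
GROUP BY GROUPING SETS ((d.calendar_year), (c.customer_type));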

We are now in the era of the ubiquitous computation engine, and the cost of performing computations is approaching zero, minimizing the need to store the results and maximizing the value of primitive atomic data. See Banking the Past.