Friday, July 11, 2008

6. Virtual Data and Algorithms

There is no problem in all mathematics that cannot be solved by direct counting.
Ernst Mach

In the two previous posts, virtual data was described as data produced by an algorithm from data that is already stored. It is produced on demand by the computer and its programs, rather than being stored and maintained in the computer's database.

Virtual data is typically the output, or finished product, of an algorithm that runs within an application on the computer. The computer executes the algorithm at the user's request: real stored data is read and transformed by the algorithm into new “virtual” data that is never itself kept in storage.
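As a minimal sketch of this idea, assuming a hypothetical `Transaction` record and an `account_balance` helper (neither comes from any particular system), the balance of an account can be treated as virtual data. Only the individual transactions are stored, and the balance is computed from them each time it is requested.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical stored record: one row of "real" data in the database.
    account: str
    amount: Decimal  # positive for deposits, negative for withdrawals


# The stored data: only the individual transactions are kept.
STORED_TRANSACTIONS = [
    Transaction("A-100", Decimal("250.00")),
    Transaction("A-100", Decimal("-75.50")),
    Transaction("B-200", Decimal("1000.00")),
    Transaction("A-100", Decimal("20.00")),
]


def account_balance(transactions, account):
    """Derive a balance on demand; the balance itself is never stored."""
    return sum(
        (t.amount for t in transactions if t.account == account),
        Decimal("0"),
    )


print(account_balance(STORED_TRANSACTIONS, "A-100"))  # 194.50
```

Storing the balance as well would mean keeping it consistent with every new transaction; computing it on demand avoids that maintenance at the cost of doing the arithmetic each time.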

Applying an algorithm in this way is deterministic, so one algorithm applied to one set of stored input data always produces one and the same result. To produce a different piece of virtual data, the algorithm, the input data, or both must change. Variations on this model can be achieved by parameterizing the algorithm so that it runs with the user's chosen options, which in effect turns it into a slightly different algorithm.
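The following sketch illustrates both points, assuming a hypothetical `net_flow` function over made-up (date, amount-in-cents) rows. The date range is the user's parameter: each choice of range behaves like a slightly different algorithm, while running it twice with the same range on the same stored data always gives the same answer.

```python
from datetime import date

# Hypothetical stored data: (posting date, amount in cents).
STORED = [
    (date(2008, 6, 3), 12_000),
    (date(2008, 6, 17), -4_500),
    (date(2008, 7, 1), 30_000),
    (date(2008, 7, 9), -2_000),
]


def net_flow(rows, start, end):
    """Sum the amounts posted between start and end, inclusive.

    The (start, end) pair is the user-supplied parameter; each choice
    effectively selects a slightly different algorithm.
    """
    return sum(amount for posted, amount in rows if start <= posted <= end)


june = net_flow(STORED, date(2008, 6, 1), date(2008, 6, 30))  # 7500
july = net_flow(STORED, date(2008, 7, 1), date(2008, 7, 31))  # 28000
```

Changing the parameter yields different virtual data from the same stored rows; changing neither the parameter nor the rows cannot.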

The sums produced by a data warehouse are a particularly powerful illustration of virtual data produced by a parameterized algorithm working on stored data. The algorithm can be thought of as a particular type of database query, perhaps written in SQL, and the parameters are the dimensions by which the algorithm summarizes the data. This is the most powerful form of virtual data because the dimensions can be combined in thousands upon thousands of ways, producing an almost endless variety of virtual data. With n dimensions there are 2^n possible choices of which dimensions to group by, so a warehouse with only 20 dimensions already offers more than a million (2^20 = 1,048,576) distinct summaries, before even counting filters on particular dimension values. In this way a single SQL query in a data warehouse can be parameterized into what is, in effect, millions of different algorithms.
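A minimal sketch of such a parameterized summary query, using Python's built-in `sqlite3` module with a hypothetical `sales` table and made-up dimensions (`region`, `product`, `month`). The chosen dimensions become the `GROUP BY` clause. Because SQL placeholders can bind values but not column names, the sketch checks each dimension against a whitelist before placing it in the query text.

```python
import sqlite3

# Hypothetical dimensions of an illustrative fact table.
DIMENSIONS = ("region", "product", "month")


def build_warehouse():
    """Create a tiny in-memory fact table standing in for a data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, month TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("East", "Widget", "2008-06", 100),
            ("East", "Gadget", "2008-06", 40),
            ("West", "Widget", "2008-06", 70),
            ("West", "Widget", "2008-07", 90),
            ("East", "Widget", "2008-07", 60),
        ],
    )
    return conn


def summarize(conn, dims):
    """Run one summary query, grouped by the chosen dimensions."""
    # Column names cannot be bound as SQL parameters, so validate them
    # against the whitelist before splicing them into the query text.
    for d in dims:
        if d not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {d}")
    if not dims:
        return conn.execute("SELECT SUM(amount) FROM sales").fetchall()
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


conn = build_warehouse()
print(summarize(conn, ["region"]))  # [('East', 200), ('West', 160)]
print(summarize(conn, ["product", "month"]))
```

With three dimensions this one function can answer 2^3 = 8 different grouping questions; each added dimension doubles that number.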

Most of the numbers that run our businesses are virtual data that can be produced by combining and summarizing the company’s simple financial transactions. The data warehouse is the factory that produces this virtual data, and millions of variations on it, through its ability to be parameterized by dimensions.
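To make the "millions of variations" concrete, here is a sketch in plain Python, assuming hypothetical transactions tagged with `region`, `account`, and `month`. It produces one summary for every subset of the dimensions, which is the idea behind an OLAP "cube". The names `summarize` and `cube` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical simple financial transactions, each tagged with dimension values.
TRANSACTIONS = [
    {"region": "East", "account": "Sales", "month": "2008-06", "amount": 500},
    {"region": "East", "account": "Travel", "month": "2008-06", "amount": -120},
    {"region": "West", "account": "Sales", "month": "2008-07", "amount": 300},
    {"region": "West", "account": "Travel", "month": "2008-07", "amount": -80},
]
DIMENSIONS = ("region", "account", "month")


def summarize(rows, dims):
    """Total the amounts for every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)


def cube(rows, dimensions):
    """Produce one summary per subset of dimensions: 2**len(dimensions) views."""
    return {
        dims: summarize(rows, dims)
        for r in range(len(dimensions) + 1)
        for dims in combinations(dimensions, r)
    }


views = cube(TRANSACTIONS, DIMENSIONS)
print(len(views))           # 8
print(views[("region",)])   # {('East',): 380, ('West',): 220}
print(views[()])            # {(): 600}
```

None of these eight views is stored in the transaction list; every one is derived from the same four rows, and only the choice of dimensions differs.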
