Tuesday, January 13, 2009

Financial Information as a Pattern


Art is the imposing of a pattern on experience, and our aesthetic enjoyment is recognition of the pattern.
Alfred North Whitehead
Much of today’s well-crafted software is designed with what are called “design patterns.” Inspired by the architectural writings of Christopher Alexander, software design patterns make computer programs more flexible and adaptable to change. In a program composed of thousands of interdependent lines of code, a small modification to a single line may force changes throughout the rest of the program; software designed with patterns, by contrast, allows chunks of code, or modules, to be modified or even replaced without requiring changes to the remainder of the program. Modules become interchangeable parts—solutions that can be swapped for other solutions as the program evolves and meets new challenges.

A number of common software design patterns have been cataloged into a “language of patterns” shared by developers around the world, with the same patterns solving similar problems in applications as varied as defense systems and retail sales. While the primary advantage of these design patterns is that they make software modules independent of each other, there are many different patterns, each solving a different problem that arises throughout the software industry.

Occasionally, a particular design pattern fits a certain problem space so well that it has the potential to dominate that space. An example is the application of the so-called “Composite Pattern” to the area of financial information. The Composite Pattern, in a very simple manifestation, generates from the primitive data of recorded bookkeeping entries all of the information that is produced by a fully-staffed corporate financial department. It simplifies the whole process of accounting to the point where, once the pattern is implemented in the corporate software, any layman is able to generate the high-level information produced by that financial department.

How this is done is detailed in my book, The Tao of Financial Information; however, a brief introduction can be presented here. First, let’s take a look at what the Composite Pattern is. According to Wikipedia:
The intent of composite is to compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat individual objects and compositions uniformly.
(see http://en.wikipedia.org/wiki/Composite_pattern)
Briefly stated, the Composite Pattern is a hierarchical tree of objects in which each higher object is “composed” of the parts represented below it in the hierarchy. For example, as shown in the diagram above, the concept of Income that is so important to the business world is a composite of its parts, Revenue and Expenses.

What is so valuable about this rather obvious observation is that the balance of the higher object, Income, is the sum of the balances of its parts, Revenue and Expenses. The parts can simply be summed because Revenue and Expenses carry opposing balances (Revenue typically has a credit balance and Expenses typically have a debit balance, as reflected by their numeric signs).

In the same manner, we can place all of the major concepts of the financial solution space into an expanded form of this hierarchical tree, with each higher-level concept having a balance that is the sum of its parts. This reduces financial information to a single pattern, or data structure, of computer science, allowing a simple program to generate all the financial numbers automatically as the objects in the tree are updated in real time.
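A minimal sketch of this composite tree in code may make the idea concrete. The class design and the account names below are my own illustrative assumptions, not the implementation from the book; the point is only that a composite's balance is always the signed sum of its parts:

```python
class Account:
    """Leaf node: a primitive bookkeeping account with a signed balance.

    By convention here, credit balances are positive and debit balances
    are negative, so a parent can simply sum its children.
    """
    def __init__(self, name, balance=0.0):
        self.name = name
        self._balance = balance

    def balance(self):
        return self._balance


class CompositeAccount(Account):
    """Composite node: its balance is the sum of the balances of its parts."""
    def __init__(self, name, parts=()):
        super().__init__(name)
        self.parts = list(parts)

    def balance(self):
        return sum(part.balance() for part in self.parts)


# Income is a composite of Revenue (credit, +) and Expenses (debit, -).
revenue = Account("Revenue", 1000.0)
expenses = Account("Expenses", -600.0)
income = CompositeAccount("Income", [revenue, expenses])
print(income.balance())  # 400.0
```

Because clients call `balance()` uniformly on leaves and composites alike, the same few lines roll up an entire chart of accounts as the leaf objects are updated.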

Thursday, November 13, 2008

Business Intelligence as a Service

“There is a service waiting to be born.”

Peter Drucker, The Daily Drucker


How does Web 2.0 fit into the growing world of Business Intelligence? This post attempts to answer this question by making two points:

  1. Business Intelligence as a Service - For most companies, Business Intelligence needs to be provided as a service, with the technical aspects outsourced to experts.
  2. Web 2.0 - The best way to provide Business Intelligence as a service is through secured internet pages using Web 2.0 technology.

The combination of these two factors makes Business Intelligence (or BI) a perfect opportunity for a small group of very skilled people who can provide what most businesses cannot provide on their own.

1. Business Intelligence as a Service

Business is an information-intensive activity, with a powerful competitive advantage going to those companies that can best determine their customers’ needs and the means of satisfying those needs. Business Intelligence is the application of data analysis to finding the business factors that provide this competitive advantage. It offers insightful and timely information to a company’s strategic decision-makers: patterns and correlations between customers, products, markets, and other information can be provided to executives to help them allocate the company’s resources where they will be most productive, both for the company and for the economy within which it functions.

BI, however, requires specialized knowledge that does not exist within most companies, even those that can afford their own sizeable IT staffs. Structuring a data warehouse, creating data mining models, and using statistical techniques to classify customers or predict sales are skills that are rare in the typical IT staff and require a large amount of specialized training to develop within the company. To date, most BI ventures have either failed completely or have proven disappointing in both the information produced and the expense incurred.

In addition to the inherent difficulty of producing a BI staff, there is a problem of scale that limits the development of BI to those companies large enough to allocate an enormous budget to building an automated extension of their enterprise decision-making. While an international conglomerate can afford to throw millions of dollars at a BI effort, the vast majority of businesses will find an in-house BI effort prohibitive.

According to Adam Smith, the “wealth of nations” was primarily a result of a division of labor that allowed the baker to bake, the painter to paint, and the pin-maker to produce pins. The production of Business Intelligence, while requiring a commitment from the businessman, demands an expertise affordable only to the man of leisure, the corporate executive with an unlimited budget, or the business person resourceful enough to allocate the effort to an expert service. As smaller firms begin to see the gains made by their bigger competitors, they will need to turn to a firm that can provide BI as a service.


2. Business Intelligence and Web 2.0

A service company needs to deliver a service, and in the case of Business Intelligence, the service to be delivered is information. The consumer needs to receive information in a manner that provides insights into his business. In addition, the consumer needs to deliver raw data, representing the activities of the company, to the service provider, to be stored in such a manner that it can easily be turned into intelligence. If Business Intelligence is to be provided as a service, the service must be able to communicate data to and from the service consumer, and it must be able to present that data in a manner that is user-friendly, efficient, and intuitive.


The communications requirement immediately suggests the internet. By way of the internet, data can be both uploaded to and downloaded from a service provider’s database. There is no need for a dedicated network; the user only needs to access the database through an internet address and whatever security is needed to protect the data’s privacy. The URL is the application-download location, and the browser is the application platform. With the internet, the complete process of storing and reading data in an outsourced database is easily resolved. Users and developers have embraced the web because it offers centralized management and delivery, no local installation, instant updates to both functionality and content, and elements such as shareable bookmarks.


The problem with the internet is the classic render-call-render pattern that dominates most web applications. The traditional way of providing an interactive web site is to enter data on a page and wait for the page’s host to return a new, updated page. This lacks the smooth interaction a user has with an application running on his local machine, where he can enter data and have the program respond without redrawing the screen; the application works interactively, directly with the user, with no delay while a page and its bulky graphics are retransmitted across the internet. Conventional web applications solve the problem of data communications but leave the problem of a clumsy interface that interrupts the user’s thought and confronts him with interactive ambiguity.


This brings us to Web 2.0 and its ability to turn an internet browser into a virtual desktop. Web 2.0 teams the experience of interactivity with the underlying connectivity that comes from the internet. According to Tim O’Reilly, “Web 2.0 is the business revolution in the computer industry caused by the move to the internet as a platform, and an attempt to understand the rules for success on that new platform.”


At the center of the Web 2.0 phenomenon are the interactive technologies AJAX, Adobe Flash, Flex, and Java. AJAX, for example, has prompted the development of websites that mimic desktop applications. Within a page based upon AJAX is embedded JavaScript code that can operate as a full-blown application, giving the user the same experience he gets from the word processing and spreadsheet applications that sit on his own machine.


With expert Business Intelligence designers combined with the technology of Web 2.0, there is a service out there waiting to be born.

Friday, November 7, 2008

The Universal OLAP Query

One of the impediments to providing Business Intelligence to management is the technical expertise required to make a request to the data warehouse. A busy executive cannot be expected to use a database language like SQL and must therefore depend upon rigid standard reports from OLAP reporting tools. In this post, I will demonstrate how easy it is to provide a technically untrained executive with a graphical tool that will allow him to generate millions of combinations of queries to a standard data warehouse.

The challenge appears difficult because, not only is SQL a technically complex language based upon mathematical set theory, but the SQL queries that go to a database designed as a data warehouse are particularly complex to code. However, within these challenges lies the very solution to providing real data warehouse power and flexibility to the non-technical user.

Every request for data from an orthodox data warehouse is complex but exactly the same in form—only the parameters change in a statement that must have the same components in the same places to utilize the analytical power of the data warehouse. The vehicle is the same while the options differ, depending upon the specific questions asked by the analyst. Let’s dissect the standard data warehouse query, factor out the generalities from the particulars of a specific query, and thereby determine what the parameters are to a “Universal OLAP Query.”

First of all, data warehouses have a particular architecture to their data that remains the same whether we are using a specialized “multidimensional database” or a standard relational database designed to implement a multidimensional “star schema.” This architecture provides the analyst with a set of facts that can be grouped and summed according to the dimensions of the facts. For example, where a company keeps a data warehouse of retail sales, the design of the data warehouse would appear as follows where the blocks are tables (a single “fact” table that is linked to four “dimension tables” that contain the sets of objects that the retail sales are related to):
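As a stand-in for the diagram, the design can be sketched as table definitions. The table and column names below are my own illustrative assumptions for a retail-sales warehouse (one fact table linked to four dimension tables), shown as SQL run through Python’s sqlite3 module:

```python
import sqlite3

# Illustrative retail-sales star schema: one fact table, four dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, age_group TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, color TEXT);
CREATE TABLE dim_store    (store_id    INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- The single fact table records each sale with a link to every dimension.
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    quantity    INTEGER,
    amount      REAL
);
""")
```

Each row of `fact_sales` is a recorded sale; the dimension tables hold the sets of objects (customers, products, stores, dates) that the sale relates to.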

Although the dimensions are specific to this data warehouse, the structure of a fact table attached to a set of dimensions remains the same in any orthodox data warehouse, accounting for its description as a “multidimensional database” or a “star schema.” Much of the complexity of an OLAP data query is a result of the joins that must occur between the fact table and one or more of the dimensional tables. The solution to creating a universal OLAP query is to automate this join process and provide the non-technical analyst with a user interface that allows him to simply select the facts and the dimensions those facts will be rolled up by. Knowing this, we can see that the star schema query will always appear in the following form:

select Facts, Dimensional Criteria
from Fact Table, Dimensional Tables
where Join Criteria, Slicing Criteria
group by Dimensional Criteria

This structure remains the same for each OLAP query; the things that change from query to query are the terms that we can turn into parameters to our universal OLAP query generator: the Facts, the Fact Table, the Dimensional Tables, the Join Criteria, the Slicing Criteria, and the Dimensional Criteria.

This reduces the syntax of SQL (or some other technical language) to a set of parameters that can be chosen from a graphical user interface using drag-and-drop or some other selection process for our non-technical user. There are a lot of parameters here, however, and we can reduce them somewhat. The identity of the Fact Table can be programmatically inferred from the selection of Facts, eliminating it as a parameter. Likewise, the Dimensional Tables and the Join Criteria can be inferred from the Dimensional Criteria and the Slicing Criteria (we assume that the universal query generator has been initialized with the metadata that describes the data warehouse). This reduces the input to the generator to just the Facts, the Dimensional Criteria, and the Slicing Criteria.

A user interface that has been configured with the description of the data warehouse can then provide a simple wizard that allows the user to select the facts that he wants rolled up (summed, averaged, etc.), the attributes of the dimensions by which the data will be rolled up (for example, customer age group, product color, or time period), and the limits of the data (the slicing criteria). The Facts and Dimensional Criteria are simple selection operations, while the slicing criteria can involve anything from simple to more involved logic as data is sliced with “and” and “or” conditions. The whole thing, however, can be done with an intuitive user interface that requires no technical language skills.
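A rough sketch of how such a generator might work follows. The metadata, table names, and column names below are hypothetical, not drawn from any particular warehouse; the point is that once the metadata is in place, a few lines of code can assemble the full star-schema query from just the three remaining parameters:

```python
# Metadata describing a hypothetical retail star schema. Each dimensional
# attribute maps to its dimension table and the join back to the fact table.
METADATA = {
    "fact_table": "fact_sales",
    "facts": {"amount": "fact_sales.amount", "quantity": "fact_sales.quantity"},
    "dimensions": {
        "age_group": ("dim_customer", "fact_sales.customer_id = dim_customer.customer_id"),
        "color":     ("dim_product",  "fact_sales.product_id = dim_product.product_id"),
        "year":      ("dim_date",     "fact_sales.date_id = dim_date.date_id"),
    },
}

def olap_query(facts, dimensional_criteria, slicing_criteria=None):
    """Build the standard star-schema query; only the parameters change."""
    meta = METADATA
    # Infer the dimension tables and join criteria from the chosen criteria.
    dims = {d: meta["dimensions"][d] for d in dimensional_criteria}
    tables = [meta["fact_table"]] + [table for table, _ in dims.values()]
    joins = [join for _, join in dims.values()]
    select = [f"SUM({meta['facts'][f]}) AS {f}" for f in facts] + list(dimensional_criteria)
    where = joins + (list(slicing_criteria) if slicing_criteria else [])
    sql = f"SELECT {', '.join(select)}\nFROM {', '.join(tables)}"
    if where:
        sql += "\nWHERE " + " AND ".join(where)
    sql += "\nGROUP BY " + ", ".join(dimensional_criteria)
    return sql

print(olap_query(["amount"], ["age_group", "year"], ["dim_date.year = 2008"]))
```

A wizard front-end would simply collect the three argument lists; every combination of facts, dimensions, and slices yields a different, but always well-formed, warehouse query.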

Thursday, November 6, 2008

Drucker on Integrating Decision Support Data

“…most businesses have two information systems. One is organized around the data stream; the other, far older one, around the accounting system. The accounting system, though, is a five-hundred-year-old information system that is in terrible shape. The changes we will see in information technologies over the next twenty years are nothing compared to the changes we will see in accounting.”

Peter Drucker, Management in the Next Society

The heart of Business Intelligence is the data warehouse, with its historical record of business events. It typically includes data about customers, products, business units, and other classes of objects that are critical to the company’s decision-making. Unfortunately, it has not included the most critical dimension of the business event—the movement of financial resources that occurs during that event.

This data, historically the most important of all business information, has been sequestered from the world of Business Intelligence by accountants. The accountants use this data to produce reports that satisfy government regulations while providing almost no strategic assistance to the company’s management. According to Peter Drucker, decision support will have to include the transactional data that is produced by the accounting department.

“At present, these two systems [data stream and accounting] are separate. They will not be separate for our grandchildren’s generation. Today’s CEOs still depend on the accounting model. I don’t know of a single business that bases its decisions on the data processing stream. Everyone bases their decisions on the accounting model, even though most of us have learned how easy this model is to manipulate.”

Peter Drucker, Management in the Next Society

Drucker is right; what we call Business Intelligence will have to be integrated with accounting data. It simply has to; the competitive advantages of doing it are too great. More importantly, the competitive disadvantages of not doing it will make it essential.

Thursday, October 16, 2008

8. Data Warehouse Architecture

Data warehouses are almost universally designed as multidimensional databases or, in relational database terms, as star schemas. The problem with both of these terms is that they are overly technical for the targeted users of business intelligence. This is unfortunate because the underlying concepts they represent are relatively easy for the technically naïve to understand and very empowering for those willing to grasp their essence.

In an attempt to make the architecture of the standard data warehouse comprehensible to a wider audience, I submit the following:

1. A data warehouse is not really a multidimensional structure in any true geometric sense. Rather, it is simply a collection of facts wherein each fact is associated with a set of objects that relate to it. True, it can be implemented as an array of many dimensions, but more generally, it is just a collection of facts and their material associations.

2. The star schema is a pattern of database design that has a core table of “facts,” or events, that are historically recorded with their relations to business objects. Each category of business object (e.g., customer, product, business unit) is stored in a separate table that is connected to the fact table and is referred to as a dimension. The dimensional tables have no geometric significance; rather, they represent collections of objects to which the facts relate. Although a star schema can be thought of as a dimensional space, with each category of objects representing a dimension, it can also be thought of simply as a representation of historical events, with each event defined by a set of factors, each of which is represented by one of the dimensional tables.

3. Per the previous two observations, although the star schema is represented diagrammatically as a star-like structure of tables, with the fact table in the center surrounded by tables that represent the factors (dimensions) which define the facts, the same schema can be represented, with equal accuracy, as a fact table simply related to a set of factors. Above are two alternative diagrams for the same data warehouse; the implementation remains the same, only the geometry of the graphic has changed. As the diagrams show, a star schema can be viewed as a table of facts and a set of tables that are used to factor those facts.


Tuesday, October 14, 2008

7. Unified BI

For a large corporation to have a reliable business intelligence (BI) initiative, it needs a way of integrating information from the various parts of the company. The company needs what is called a “single version of the truth.” Various information architectures have been designed to support the idea of unified BI; however, none has proven to be a real solution to a growing problem. The question to be answered in this post is:

How can a company unify its various data warehouses, CRM databases, and other elements of a BI environment?

There is only one solution and, luckily, it is a simple one. The unifying principle of BI is universal object identifiers. The company unifies its BI environment by providing a unique identity for each and every significant object in that environment. Allow the various business units within the company to evolve their own data marts with their own architectures, as long as they provide the same unique, company-wide identification for each object in their databases.

By objects, I mean those identifiable entities that are relevant to the business, including the customers, business units, and products that are involved in the day-to-day transactions of business. Different departments of the company, each trying to satisfy its own business needs, may keep different data about these business objects, but the data can be integrated and shared among the departments as long as the customers, business units, and products have a common company-wide identity throughout the company.

The primary class of objects of interest to the company’s business, however, is the transaction itself. Each commercial transaction, representing the sale of an item, the payment of an expense, or the trading of assets, is an object that must have a unique identity, produced as the transaction occurs. With each transaction assigned a unique identifier at the time of its creation (i.e., at the point of sale), the information in a marketing data warehouse can be integrated and analyzed with related information in the finance, sales, and customer relations departments.

Universal object identifiers are the key to unified BI, providing the company with a version of the facts that transcends departmental boundaries. From a single identity for each fact comes a single version of the truth.
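One simple way to mint such identifiers is a UUID assigned at the moment the transaction is created. The sketch below is illustrative (the record fields are my own assumptions), using Python’s standard uuid module:

```python
import uuid

def new_transaction(sale_record):
    """Assign a company-wide unique identifier as the transaction occurs.

    Every departmental data store keeps this same id, so the marketing
    warehouse, the finance ledger, and the CRM database can all be
    joined on it later.
    """
    record = dict(sale_record)
    record["transaction_id"] = str(uuid.uuid4())
    return record

# A hypothetical point-of-sale event, identified at creation time.
sale = new_transaction({"product": "widget", "amount": 19.95})
```

Because the id is minted once, at the source, no department ever has to reconcile conflicting local keys for the same event.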

Friday, July 11, 2008

6. Virtual Data and Algorithms

There is no problem in all mathematics that cannot be solved by direct counting.
Ernst Mach

In the two previous posts, virtual data was described as data produced by an algorithm from already stored data. It is produced on demand by the computer and its programs, as opposed to being stored and maintained within the computer’s database.

Virtual data is typically the output, or finished product, of an algorithm that runs within an application in the computer. The computer executes the algorithm upon the user’s request, and real stored data is read and converted by the algorithm into new “virtual” data that is not itself kept in the database.

This application of an algorithm is always deterministic, and therefore one algorithm along with one set of inputs from stored data should produce only one set of information. To produce another piece of virtual data, either the algorithm or the input data (or both) must change. Some variations upon this model can be realized by parameterizing the algorithm to allow it to operate with user options – essentially, making it a slightly different algorithm.

The sums produced from a data warehouse are a particularly powerful illustration of virtual data generated by a parameterized algorithm working upon stored data. The algorithm can be considered a particular type of database query, perhaps written in SQL, and the parameters are the dimensions by which the algorithm summarizes the data. This is the most powerful form of virtual data because the dimensions can be combined in thousands upon thousands of different combinations, producing an endless variety of virtual data. The SQL query in a data warehouse can be parameterized into what is essentially millions of different algorithms.

Most of the numbers that run our businesses are virtual data that can be produced by combining and summarizing the company’s simple financial transactions. The data warehouse is the factory that can produce this virtual data, and millions of variations on it, through its powerful ability to be parameterized by dimensions.
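A tiny sketch makes the idea concrete. The stored transactions below are made-up illustrative values; a single summing algorithm, parameterized by dimension, produces different virtual data on demand without any of the sums being stored:

```python
# Stored data: a few primitive transactions (illustrative values only).
transactions = [
    {"region": "east", "product": "A", "amount": 100.0},
    {"region": "east", "product": "B", "amount": 250.0},
    {"region": "west", "product": "A", "amount": 175.0},
]

def rollup(rows, dimension):
    """One deterministic algorithm; the dimension is its parameter.

    The returned sums are virtual data: computed on request from the
    stored rows, never kept in the database themselves.
    """
    totals = {}
    for row in rows:
        key = row[dimension]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals

print(rollup(transactions, "region"))   # {'east': 350.0, 'west': 175.0}
print(rollup(transactions, "product"))  # {'A': 275.0, 'B': 250.0}
```

Changing the parameter changes the virtual data produced, while the stored transactions remain untouched – the same leverage a data warehouse gets from its dimensions, at toy scale.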