|Covering Business Intelligence, Integration & Analytics||Advanced Search|
"Secure, accurate, consistent, transparent and scalable," if you're a fan of Jeopardy you might respond, "What is Sarbanes-Oxley?" But the right answer is, "What words would never be used to describe multi-user spreadsheet applications?"
Sarbanes-Oxley specifies stiff penalties for public company executives that inaccurately report business performance data. Executives are now required to certify their organization's compliance with standards ensuring the accuracy, consistency and transparency of public financial disclosures. It stands to reason that these executives will soon expect the same level of compliance for all internal reporting, analysis and planning systems. This will be increasingly difficult in organizations that have become reliant on large-scale spreadsheet applications.
Spreadsheets result in a proliferation of desktop databases. Security is near impossible when portions of sensitive data are replicated across hundreds of desktops and laptops. Spreadsheets are prone to error, largely undocumented and seldom rigorously tested. Linking large numbers of individual spreadsheets is difficult. Transparency, the ability to drill from summary performance levels to detailed levels, is also very difficult. As data volumes continue to grow, spreadsheets fail to provide the capacity for meaningful analysis. Relying on spreadsheets is putting careers in jeopardy.
Breaking the Spreadsheet Habit
It is easy to be critical of spreadsheets, but the fact remains that users rely on them. Spreadsheets provide users with predictive and interactive what-if analysis that is essential in predicting and assessing the impact of planned decisions. Spreadsheets are used to continuously update the tactical plans, forecasts and budgets that are essential for implementing decisions.
Users will resist all attempts to eliminate spreadsheets, taking away a basic freedom. What's required is a better spreadsheet, one that facilitates centralized administration but still provides users with flexibility to direct their analysis within appropriate bounds.
Breaking the spreadsheet habit requires delivering more functionality along with better controls. The familiar row-and-column data display of the spreadsheet is critical but should be accessible via a secured Internet connection. Access to all relevant performance data should be available, without the requirement to download sensitive data to the users PC. The result is an on-demand spreadsheet.
Users must be able to perform what-if analysis in developing plans, forecasts and budgets. However, these performance targets should be easily consolidated for collaboration, approval and communications steps that are part of a business process. Rather than simply providing users with a desktop tool, the goal is to integrate spreadsheet-like analytics into business processes. Eliminating desktop databases, not analytic functionality, is the requirement.
The Foundation is the Data Warehouse
Earlier, it was noted that the words, "secure, accurate, consistent, transparent and scalable" would never be used to describe large-scale spreadsheet applications, but these are words that do apply to data warehouses. The enterprise data warehouse provides "one version of the truth," which is the "consistency" that is a requirement for Sarbanes-Oxley compliance. The proliferation of spreadsheets has resulted in thousands of individual interpretations and a lack of consistency.
"Secure, accurate and scalable" are also words that describe data warehouses. "Transparency" is another requirement, which in data warehousing terminology is the ability to report and analyze data across dimension and the ability to drill up/down dimensional hierarchies. By their nature, data warehouses provide transparency, allowing users to move easily form summary levels to granular levels.
As data warehousing proponents, we understand the issues related to data integration. But, most of our energies have focused on consolidating and aligning data from multiple operational systems. It is time to focus on integrating data generated within users' applications. This is the closed-loop decision support that is described as business performance management (BPM), continuous planning, forecasting and budgeting linked to the data warehouse.
The challenge that must now be solved is maintaining all of the characteristics of the data warehouse, when users are interactively analyzing data and creating new sources of data that must be added to the data warehouse. In other words, applications must maintain the referential integrity of the data warehouse.
Spatial Referential Integrity
The power of the spreadsheet is its ability to maintain referential integrity based on spatial relationships - the cell addresses that are the result of row/column locations such as "A1" or "BB258." Formulas are defined using these spatial references, and the integrity is maintained as new row and columns are inserted into the spreadsheet application. This provides users considerable flexibility in expanding their analysis. However, all of the data and the formulas must reside within the spreadsheet and, therefore, the user is responsible for managing cell contents, a desktop database.
Database Referential Integrity
Managing referential integrity within the data warehouse is critical as well. As data warehouses are updated with the most current data or new data sources are added the integrity of the data warehouse must be maintained. Applications that rely on the data warehouse, define formulas based on the location of data within the data warehouse.
Unlike spreadsheets, applications and their underlying computational logic are easily shared among large user communities because referential integrity is maintained at the database layer rather than the application layer.
It is not a significant challenge to add data generated by applications to the data warehouse if the data generated by the application is consolidated and written to the data warehouse as a scheduled update. This data is simply another data source that is integrated within the dimensional hierarchies of the data warehouse. However, it is impossible to support even a modest number of users interactively analyzing data (changing inputs for what-if analysis) within a relational database management system. The need to maintain database referential integrity with each new input would not be possible with relational database management software.
Data Marts and Data Caches
The most popular way to overcome the limitations of the data warehouse is to create a data mart, a specialized database sometimes referred to as a multidimensional cube. These specialized database management systems are an integral part of the application layer and as a result can support iterative read/write interactions required to develop analysis and planning applications. Like the data warehouse, data marts are persistent data stores with centralized administration, certainly a better solution than user-administered desktop databases.
Because data marts are persistent data stores, they also are redundant if sourced from the data warehouse. The integrity of data stored in redundant databases is an administrative challenge. Remember the goal is to provide secure, accurate, consistent, transparent and scalable with one version of the truth.
Plans, forecasts and budgets stored in these data marts often never find their way back into the data warehouse. Furthermore, it is not unusual for data marts to contain performance data that is not in the data warehouse. Without care, data marts quickly turn into the application silos that data warehouses were intended to replace.
An alternative to creating a persistent data mart is to temporarily cache data to support read/write access and write the results created in the cache back to the data warehouse as a scheduled update. This links the application to the data warehouse. One approach to caching data for interactive analysis involves the creation and combination of temporary caches to provide shared workspaces for collaboration and individual workspaces to support users' analysis. The key difference between this approach and data marts is that the data created by the application - plans, forecasts and budgets - are written and maintained in the data warehouse. Data created in the cache is added to the data warehouse as a scheduled update consistent with a defined business process.
The essential analytic requirement of an on-demand spreadsheet is the ability to support what-if analysis that is common to spreadsheet applications. But unlike spreadsheet applications, formulas for computing these measures should be centrally maintained and applied consistently across all applications. So, in addition to a centrally managed data warehouse, the applications link to a common analytic library.
Users also must be able to extend their analysis, initiated at one level in a dimensional hierarchy, by drilling up/down dimensional hierarchies. This requires that data entered at one level be automatically propagated at every level in the multiple dimensional hierarchies. This is the equivalent functionality of a spreadsheet capable of managing hundreds of millions of rows.
Maintaining referential integrity of unit volume measures within a multidimensional database is a first order problem. A change such as adding 100 units to the forecast of a product in the Eastern region, means that 100 units also has to be added to the total U.S. forecast and 100 units must be distributed among the territories and markets within the Eastern region. Within a multidimensional database a single change often requires thousands of adjustments to maintain referential integrity.
The second order problem is the need to integrate volume forecasting and financial modeling. This is providing users with the capability to simultaneously model volume and financial variables. Many financial variables, such as product price, are reported as weighted averages at every level above the lowest level in the multidimensional hierarchy. For example, a price of $1.97 reported at the U.S. level is the weighted average of prices in stores. The same is true for variable costs and marketing spending programs that are volume related. Allowing users to input a financial variable at any level in the multidimensional hierarchy adds to the complexity of maintaining referential integrity.
The third order problem is encountered when users are provided with the ability to lock values within the hierarchy. Cell locking is important in performing what-if analysis to optimize performance within boundaries of known constraints - market growth constraints, capacity constraints, product discontinuances, etc. By introducing locks into the analysis, users can model the effects of cannibalization on both volume performance and financial performance.
These analytic capabilities are essential and are complicated by the scale of the data warehouse.
Content and Application Interface
"Content" is the collection of scorecards, alerts and reports. The requirement at the content level is to integrate standard reporting with a spreadsheet-like interface to support interactive analysis and the sharing of information developed within the application. In this case, users need the ability to build their analysis from the pieces - stored data, their inputs and resulting computed measures. Content is the available data that can be included in the analysis at the user's discretion, rather than a predefined report format. For example, the user may wish to include fuel costs in forecasting profitability. If fuel costs are not in the database, the user should be able to enter this data and model the impact of higher fuel costs.
The requirement for the application interface is it must be accessible from an Internet browser. There should be no requirement to download data to users' desktops to support flexible user-initiated analysis.
Spreadsheet applications and desktop databases jeopardize organizations' ability to provide accurate, consistent and transparent business performance management applications. This represents both a significant opportunity and a challenge for data warehousing. Data warehouses are chartered with providing one version of the truth. The new measure of success for data warehousing should be the reduction in the number of desktop and laptop databases that are required to support analysis.
Maintaining referential integrity of the data can be addressed either by creating data marts or by caching data temporarily. In either case, security, accuracy, consistency transparency and scalability are the critical requirements to avoid spreadsheet jeopardy. This is not a game that managers want to play in an era of increasing regulations and scrutiny.
Richard Tanler is the founder of eiVia Inc., the leader in providing technology that the company refers to as an ON-Demand Spreadsheet. Tanler was the founder of Information Advantage, Inc. a leader in the business intelligence market (acquired by Computer Associates) and is the author of the Intranet Data Warehouse (John Wiley & Sons) and numerous articles on business intelligence and data warehousing. You can contact him at email@example.com.