My book describes how the convergence of new web-based architectures, advanced object-oriented methodologies, and powerful computing architectures can create business results for your organization. However, even with these enabling technologies, the construction of the data warehouse remains a challenging undertaking. Success requires both a capable team and a group of users willing to change their daily activities. (For Oracle and Business Objects users, this text is a must-have; see the next-to-last paragraph of my comments.)
Success for your organization means improving the quality of what your team does. To improve your project's productivity and success, I describe how advanced visualization and modeling capabilities from object-oriented analysis can be used to design the components of a data warehouse. In addition, I use the Unified Modeling Language (UML) to detail the steps of the data warehouse method (DWM) for both data modeling and data acquisition. In my chapter on Design, I show how UML can address many dimensional modeling issues that were impossible to solve with the entity relationship diagram (ERD). The DWM presents a complete solution that utilizes the Oracle 8.0 RDBMS as the data source. The DDL and sample data for the problem are included on the CD.
The DWM improves the success of the project by incrementally breaking the line of business into cycles that implement business models. This technique incorporates scalable design techniques, including data partitions, delivers short-term business results, and ensures that the project cycle built today will be reusable by those next waiting to roll out of the factory. By building focused, business-model-based data marts at three- to six-month intervals, the DWM reduces the time required to deliver business results for your organization.
The data warehouse project can be very risky. According to the META Group, after one year more than 50% of data warehouse projects have failed to achieve their objectives. Another study of large corporations attempting to construct large-scale data warehouses reports that more than 80% of all data warehouse projects fail to meet organizational objectives, with a significant portion failing completely. The process of acquiring data from operational systems, transforming it, and loading it into the data warehouse can be a fundamental cause of a project's failure. The prevalent use of pre-defined star schemas, or 'by the book' solutions, may delude the project team into thinking that the organization's operational systems will easily support the data model. I have often found that project teams don't even attempt to load operational data until late in the project. Until this loading takes place, the project team cannot truly evaluate how well the business rules model of the operational system matches the design.
In the DWM, data acquisition is a critical component of the process. Very little has been written on this topic. Therefore, at each step in the process, I show how early data prototypes and extracts from operational systems are critical to the success of the organization. The team that applies the object-oriented analysis in this book will improve the efficiency and effectiveness of the integration between the data warehouse and the operational systems. This method eliminates serious project risk by moving operational data from the source system(s) very early in the project cycle.
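As a minimal sketch of what an early trial load can reveal, the Python below checks a small operational extract against a star-schema design before any real loading code exists. It is an illustration only, not the method from the book or its CD: the function `trial_load`, the `LoadReport` class, and the column names (`product_code`, `amount`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoadReport:
    """Outcome of a trial load: rows accepted and rows the design rejects."""
    loaded: int = 0
    rejected: list = field(default_factory=list)  # (row number, reason)


def trial_load(extract_rows, required_columns, dimension_keys):
    """Check extract rows against the design (hypothetical helper).

    required_columns: columns the fact table design expects to be populated.
    dimension_keys: maps a fact column to the set of keys that exist in
    the corresponding dimension table.
    """
    report = LoadReport()
    for number, row in enumerate(extract_rows, start=1):
        # A blank or absent value means the source system does not
        # reliably supply what the design assumes.
        missing = [c for c in required_columns if row.get(c) in (None, "")]
        if missing:
            report.rejected.append((number, "missing " + ", ".join(missing)))
            continue
        # A key with no dimension row means the business rules of the
        # source system and the design disagree.
        orphans = [c for c, keys in dimension_keys.items()
                   if row.get(c) not in keys]
        if orphans:
            report.rejected.append((number, "unknown " + ", ".join(orphans)))
            continue
        report.loaded += 1
    return report


# Assumed sample extract from an operational order system.
extract = [
    {"product_code": "P1", "amount": 10.0},
    {"product_code": "P2", "amount": None},
    {"product_code": "P9", "amount": 5.0},
]
report = trial_load(extract, ["product_code", "amount"],
                    {"product_code": {"P1", "P2"}})
print(report.loaded, report.rejected)
```

Running a check of this kind on the first extract, rather than late in the cycle, shows which design assumptions the operational data does not support while they are still cheap to change.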
Eventually, every data warehouse manager must seek out ways of improving the performance of the mature environment. In later chapters of the text, I develop the concept of aggregate management. Through this approach, pre-summarized subsets of fact tables are precisely configured to dramatically improve the performance of the managed query environment. The CD included with this text contains executable source for an 'Aggregate Wizard'. This program merges the semantic and CASE repositories to provide an important service to the users: a high-performing, highly available data warehouse. It works specifically with the Oracle and Business Objects environment and utilizes data structures that can be incorporated into the universes with the @aggregate_aware function. I have included the source of the processes and the data schema so that readers may develop their own aggregate solutions.
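The idea of answering a query from a pre-summarized table can be sketched in a few lines of Python with the standard `sqlite3` module. This is an assumption-laden illustration, not the Aggregate Wizard from the CD and not the Business Objects function: the tables `sales_fact` and `sales_agg_month_product`, the `AGGREGATES` list, and the helpers `pick_table` and `total_by` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    day TEXT, month TEXT, product TEXT, region TEXT, amount REAL);
INSERT INTO sales_fact VALUES
    ('1998-01-05', '1998-01', 'widget', 'east', 100.0),
    ('1998-01-06', '1998-01', 'widget', 'west',  50.0),
    ('1998-02-10', '1998-02', 'gadget', 'east',  75.0),
    ('1998-02-11', '1998-02', 'widget', 'east',  25.0);
-- Pre-summarized subset of the fact table: one row per month and product.
CREATE TABLE sales_agg_month_product AS
    SELECT month, product, SUM(amount) AS amount
    FROM sales_fact
    GROUP BY month, product;
""")

# Candidate tables, most summarized first, with the dimensions each can answer.
AGGREGATES = [
    ("sales_agg_month_product", {"month", "product"}),
    ("sales_fact", {"day", "month", "product", "region"}),
]


def pick_table(dimensions):
    """Return the most summarized table that covers every requested dimension."""
    needed = set(dimensions)
    if not needed:
        raise ValueError("at least one dimension is required")
    for table, available in AGGREGATES:
        if needed <= available:
            return table
    raise ValueError("no table covers: " + ", ".join(sorted(needed)))


def total_by(dimensions):
    """Sum amount grouped by the given dimensions, using the smallest table."""
    table = pick_table(dimensions)  # also restricts names to known columns
    cols = ", ".join(dimensions)
    sql = (f"SELECT {cols}, SUM(amount) FROM {table} "
           f"GROUP BY {cols} ORDER BY {cols}")
    return table, conn.execute(sql).fetchall()


print(total_by(["month"]))   # answered from the aggregate table
print(total_by(["region"]))  # falls back to the detailed fact table
```

A query by month never touches the detailed rows, while a query by region falls back to the fact table because the aggregate does not carry that dimension. Deciding which aggregates to build, and keeping them current, is the management problem the text describes.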
One of the key reasons that I wrote this text is that I have observed an evolution in the use of the CASE, semantic, and administration repositories. To me, this suggests that new types of methodologies will emerge from higher abstractions in the construction of an IT infrastructure. There are very few in this profession who would build systems the same way they did 10 years ago. The current IT infrastructure has been built with neither "architectural" nor business concepts. Additionally, operational systems often lack the current principles of management and industrial engineering. Most systems bear the outward manifestations of outmoded practices. Businesses trying to operate in this environment are finding that consolidations and mergers are reaching the limits of productivity gains. It will soon be time for information technology to be a fresh source of productivity gains. In a closed loop, new methods, in combination with the data warehouse method, will have the ability to deploy new systems that marry existing business models with the fine-tuning of a data warehouse analysis. To survive today's changing, chaotic environment, elements of the new operational systems will need to enable a 'zero cost' deployment of new business rules.
From the Inside Flap
FOREWORD
Truisms abound in information technology. The project is perpetually over budget, behind schedule, mired in requirements, and a victim of politics. In response, I have developed my own truism: six of the right people can do more than sixty. I have worked on many IT projects, large and small, and have experienced both truisms. I have found that quality is the single point of failure for any project, where quality is the gap between what a team could achieve and what it did achieve.
The theme of this text is quality processes for the data warehouse. I view this opportunity as a chance to state not what is but what could be. Methodology should control the process of creation. It should be planned with project management and created with discipline.
The data warehouse project must contribute to the performance of an organization. This performance should be measurable. The time to integrate for its own sake has passed. It is time for each organization to examine its IT development processes and find which contribute to the health of the organization and which do not.
For the manager or executive, the methodology that I describe is constructed to meet the strategic objectives of your organization and create a measurable result. Most IT objectives are accomplished in cycles. I suggest that you use the project cycle to achieve business results for your organization.
Not every project cycle will be strictly for new developments. Your infrastructure must be maintained. Upward-moving events and technology require re-hosting and re-scripting operational systems. The Intel-based workstation and operating system have a very short productive life. Because most organizations have not made this distinction, a challenge for senior management is to separate the infrastructure projects from the business results projects.
The recent business process reengineering fad, while somewhat defunct, was useful in pointing out the age of the processes that are ingrained in today's operational systems. Many legacy systems were implemented by moving paper-based processes onto databases and screens. Somewhere in the jumble of prompts and fields resides the business knowledge of the enterprise. With or without the enabling technology of data warehousing, managers make decisions that keep the business afloat. Or not.
Many technical books are written to fill a basic human need: to make the impenetrable understandable. With good prose, just about any technical topic can be illuminated. From subatomic physics to the World Wide Web, there are books that beautifully explain their topics; however, these are not likely to make the reader a physicist or even a database designer.
Building the data warehouse requires a broad range of technical disciplines. Adding maturity and capability to the data warehouse team requires stretching the team's capabilities and challenging each member to grow. Despite the best efforts of the self-help guides, the construction of the data warehouse remains a challenging undertaking. Success requires both a capable team and a group of users willing to change their daily activities.
At the heart of the management environment of the data warehouse is the discipline of quality. Taken in isolation, quality is the gap between capability and performance. Quality is either high, with a minimal gap, or low, with larger gaps. A quality data warehouse serves the strategic intent of the organization, is created with the best available data, and is achieved at an optimal rate. Both the data available and the rate of implementation are highly dependent on your organization. If your organization has older, less integrated systems and less technical acumen, you can still achieve a quality data warehouse by promoting consistent methods in its creation.
Collectively, the methods that I discuss in this book enable the implementation and maintenance of a quality environment. It is intuitive that the strategic directions taken in the early phase of a project will sway the technical architecture and ultimately the quality and the performance of the system. Beyond a discussion of the activities and personnel that create the data warehouse, there are technical design approaches that should be taken in order to create a high-performance data warehouse. Since my audience in this text is the project implementers, I will need to be very explicit in my description of these technical implementations.
In choosing to describe the specific nature of integrated environments, I can focus on how the environment can be integrated and managed to provide a true solution. My discussions include UNIX servers, relational database management systems (RDBMS), and several managed query environments. I hope that by shifting the focus from a generic treatment to specific solutions, a model of the characteristics and capabilities of the quality data warehouse environment will emerge.
Reuse is an object-oriented concept that makes the efforts of one project available to another; however, it is the use of the product, not its features, that promotes this. Often, I find today's corporate environment a dizzying array of software products with similar capabilities, many of which are object-oriented (OO). On more than one occasion, I have been astonished to discover multiple computer-aided software engineering (CASE) tools, multiple user interfaces, and even multiple online analytical processing (OLAP) tools and managed query environments (MQE) on the same IT shop floor.
For the past decade, enterprise products, including CASE and RDBMS products, have been sold as a method of unifying operational systems across lines of business. The vendors' ubiquitous argument has been that the legacy of the mainframe is a series of non-communicating, outdated systems. The parallel component of the marketing assault is the position that their tools have