-
Marketplace
-
Channel Resources
Articles from this Site
What are your views on the advantages and/or disadvantages ETL tools and data modeling versus code?
Where is the best place for a BI application to reside?
Are there best practices or tools for creating a data dictionary?
When do you use a star schema and when to use a snowflake schema?
What are dimension mapping and fact mapping?
White Papers
Data Warehouses: What are they and how will they benefit your organization?
Advances in Data Warehouse Performance
Books
How is an information warehouse project different than a typical project for a transaction system?
Q: How is an information warehouse project different than a typical project for a transaction system. Additionally, how would the required skill set for an information warehouse architect differ from an enterprise data architect?
Larissa Moss' Answer:
A typical project for a transaction system has well defined requirements, a specific set of users, requires traditional skills and is usually delivered on known and proven technology platforms. Therefore, it can be organized and managed with a traditional waterfall methodology (although in my opinion all projects should convert to agile methodologies). On data warehouse projects, the requirements are usually a "moving target," the scope is usually too large for the deadline, data integration and data quality prove more difficult than expected, the staff often lacks business intelligence (BI) skills, communication between staff members takes too long, the tools do not live up to their expectations, and the roles and responsibilities assigned in a traditional way seem to result in too much rework. Therefore, a traditional methodology and a traditional project management approach does not work for controlling data warehouse (DW) project activities. An agile and spiral methodology is needed that is based on "extreme scoping," i.e., delivering an application in a series of short releases while "refactoring" the application deliverables (similar to the XP approach). This type of methodology requires a different project organization, such as a small self-organizing core team, shifted and shared roles and responsibilities, direct end-user involvement, etc. (Reference: Business Intelligence Roadmap by Larissa Moss and Shaku Atre, Addison Wesley, 2003).
Now to the second question. The term "architect" means different things to different people. Since the late 70s/early 80s, when data modelers started to use that term, data architects, especially enterprise data architects, were data administrators (DA, also known as IRM, DRM, EIM) who were formally trained in entity-relationship modeling, normalization, and data administration (DA) disciplines (taxonomy, generalization and specialization, abstraction, semantics, data lineage, naming standards, standards for business rules and data definitions, etc.). In companies that still adhere to the DA principles and DA disciplines (Reference: DAMA International, www.dama.org and books by Michael Brackett) or enterprise architecture (EA) or the Zachman Framework (Reference: John Zachman, www.zifa.com), enterprise data architects model the data from a business perspective (not a database design perspective). The term data warehouse architect (and sometimes the term "data architect," without the "warehouse" to make it really confusing) is used by database administrators or database designers, who create the schemas for DW databases. These schemas can be more or less denormalized "relational" structures (based on the original business view of an entity-relationship model) or, in most cases, are multidimensional structures, such as star schemas or snowflakes. In either case, these physical data models (aka logical database design schemas) are representations of database structures and not highly normalized business models.
Clay Rehm's Answer:
The end product of a typical transaction system is either a system of computer screens or batch reports (or both). The end product of a typical data warehouse is data! This means the users of a data warehouse must be much more involved with the end product including how they will write queries now and in the future. In a way, the end product of a data warehouse is much harder to visualize for most business people, and extra care is needed to educate people on data and concepts instead of how to navigate through a series of screens.
This means a DW project relies on people skills much more so. This includes most of the development team including the DBA. Where a DBA on a typical transaction project may never see a business user, on a DW project they will. For example, the data design team (data modelers and DBAs) have to be flexible to allow designs that may violate typical normalization forms. They need to understand from a business perspective how to load, store and retrieve the data in the easiest way possible.
The development team has to understand the business and business rules much more since the end users will need help in writing queries against the data warehouse. Not only will they need help, most likely the development team will be writing the initial queries and reports.
Anne Marie Smith's Answer:
Transaction processing systems are designed to capture business transactions, and the data relevant to them. The TP system is concerned with adding, updating, and deleting data, the same data, millions of times. Their primary function can be thought of as getting transaction data into the information system quickly and accurately for clerical usage. The purpose of the data warehouse is to provide information for managerial decision making that is distinct from, and does not interfere with the performance requirements of the OLTP systems. This is usually done by re-engineering transaction system data around business subjects (for example Inventory), and making it easily available to the managerial community in a separate database. The role of an architect in a data warehouse project requires that they understand the architectures of the source systems, the needs for extraction, transformation and loading of relevant data from the sources to the target, the architectures of a data warehouse that would support the business' needs for analytical data, and whether further architectures are needed for this particular data warehouse (operational data stores, data marts, etc.). Therefore, the skill sets for an information warehouse (or data warehouse) architect would include the skills for a traditional data architect with an understanding of the enterprise's data delivery needs. Many organizations use an enterprise data architect as their data warehouse architect, since an EDA has a higher level of skills and knowledge than a traditional data architect.
Adrienne Tannenbaum's Answer:
Transaction systems are more concerned with the support of a process or processes, and the data is an "also ran." Information warehouses on the other hand are more concerned with the data that results from these processes, with their creation processes being "also rans" to some degree.
When information is being organized for a transaction system, its format and database design are typically set up to efficiently result from or go into the transaction system, the transactions determine what is required. When information is organized in a warehouse, it is organized to be a fuel for reporting and analysis. Some of the warehouse data is already summarized, some represents a different perspective than that of its original source. Any warehouse-related "processing" of information data happens during the data's travel into the warehouse and is targeted toward the information's usage in combination with another set or sets of related information.
There are some similarities in the skills required to design and develop both types of information stores (requirements gathering, systems analysis, logical and physical data modeling, systems development). What varies between the two worlds is "perspective." Requirements gathering for a warehouse focuses almost entirely on the reporting needs of the target user community. Database design for a warehouse is dimensional. Data warehouse architectures include reporting and BI tools ... hope this helps.
Anne Marie Smith is a highly acclaimed author and speaker in the fields of data stewardship, data governance, data warehousing, data modeling and metadata management. She holds a doctorate in Management Information Systems and has taught at LaSalle University. Smith serves on the board of directors of DAMA International and is an expert advisor to DM Review's Ask the Experts. Smith is the director of education at EWSolutions, a GSA schedule partner and systems integrator dedicated to providing companies and government agencies with best-in-class business intelligence solutions using data warehousing, enterprise architecture and managed metadata environment technologies (www.EWSolutions.com). She may be reached directly via email at AMSmith@EWSolutions.com.
Larissa Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She has more than 20 years of IT experience with information asset management. Moss is coauthor of three books: Data Warehouse Project Management (Addison-Wesley, 2000), Impossible Data Warehouse Situations (Addison-Wesley, 2002) and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision- Support Applications (Addison-Wesley, 2003). Moss can be reached at methodfocus@earthlink.net.
Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, metadata and systems analysis, design and development. He has worked in multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and is a co-author of the book, Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP), a certified Project Management Professional (PMP), holds a Bachelors of Science degree in Computer Science and a Masters Degree in Software Engineering from Carroll College. He can be reached at clay.rehm@rehmtech.com.
Adrienne Tannenbaum is president of Database Design Solutions, Inc. (www.dbdsolutions.com), a New Jersey-based consulting firm specializing in the revitalization of corporate data. The firm focuses on data issues within large organizations and supports all data reconstruction efforts with a solid meta data backbone. Tannenbaum is the author of two popular meta data-focused books: Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand (2001, Addison Wesley) and Implementing a Corporate Repository (1994, Wiley).
For more information on related topics, visit the following channels:


