FREE DM Review Site Registration!
Sign-up today and access DM Review on the Web!

Your FREE registration entitles you to:

FREE email newsletters

FREE access to all DM Review content

FREE access to web seminars, resource portals, our white paper library and more!

   

Is it possible to have transactional databases replicated as a DW, provided the data is clean and similar to the DW structure? What are the pros and cons of this approach?

Question: We need a justification for having a DW? Is it possible to have transactional databases replicated as a DW, provided the data is clean and similar to the DW structure? What are the pros and cons of this approach?

Adrienne Tannenbaum's Answer: What you are describing is known as a data mart. These are often the best way to go for a few reasons:

  1. Targeted users are usually quite familiar with the data since they have been working with it in a "production mode" for quite some time.
  2. Data marts are substantially easier and less costly to develop since there is no data translation or integration involved.
  3. The information within the data mart would represent the same "time" component as the transactions themselves (daily, hourly, whatever) without the need to develop serious "snapshots."

However, data marts may not be the way to go if:

  1. Users outside of the immediate business group will be using the data as a way to supplement their own work (the sales people also want access to production data).
  2. There is a need to combine this data with data from other applications. Plenty of organizations get themselves into quite a mess by creating the "data mart spider web." In this scenario, each user group has their own data mart, developed based upon reasons mentioned above. They then realize the need to combine this data with other data, so the extract and load (no translation) begins, usually going into another place ... do this a couple of times and you are worse off than when you started.
  3. The time variables associated with the specific transactions does not equate to those of user requirements. For example, in a retail application, transactions occur every split second. It may only be feasible to capture daily summaries. If this data is needed on a more granular perspective (hourly) a "translate" function may be required. It then becomes practical to do a cost benefit to determine if the "translate" function can be more widely deployed - against other data sources - in order to widely expand the information scope of the "data mart."

There are many other pros and cons also.

Evan Levy's Answer: Is it possible? Yes, it's possible. However, it's not typically the most efficient or appropriate way of building a data warehouse.

The databases in transactional systems are typically designed and structured to support the gathering and processing of individual business transactions. The databases associated with a data warehouse are typically structured to support business (and historical) analysis.

While it is feasible to run queries against a database containing transactional structures, we often find the structures themselves can't easily support the business-oriented analysis (and related queries). It's also important to realize that OLTP system data structures aren't storage or processing efficient to support analytical processing. In fact, most OLTP systems limit the amount of history they store in order to maintain transaction processing efficiency.

The "pros" of this approach will focus on a more simplified ETL process. The amount of processing associated with moving data from source to target will be limited data cleansing and transformation.

The "cons" of this approach is increased complexity for query support and processing. This approach typically requires each query to contain a high quantity of table joins, filtering, and transformation and data conversation. We also find that user training and self-sufficiency is more complex and time-consuming when the tables reflect operational structures instead of business-oriented structures.

Sid Adelman's Answer: I think you have two questions here. Let me first address the question of justification. I assume you are talking about having to cost justify the DW. This cannot be done on a generic basis, you have to be looking at the application such as inventory or fraud analysis. Some of these applications can easily show a substantial return on investment (ROI) while for others the benefits are intangible. The easiest approach is case studies of others in your industry that include their ROI.

Now for question two - while it is possible to replicate the transactional system, you don't want to take this approach and here's why:

  • Not all the data in the transactional system is relevant,
  • The transactional system has some cryptic codes that might be difficult for the users to understand,
  • The data in the operational systems have not been cleansed sufficiently,
  • The data in the operational systems have not been integrated.

Larissa Moss' Answer: A DW is not a "carbon copy" of any operational (transactional) system. It is an integrated decision-support environment specifically designed for analytical and reporting purposes. That means that data is selected from multiple transactional systems. The data is then cleansed, deduped, integrated, enhanced and rearranged in a staging area before it is loaded into DW databases that are specifically designed to support specific analytical and reporting access requirements. In other words, databases are designed, optimized, indexed and tuned differently for a DW than they are for transactional systems.


Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or visit his Web site at www.sidadelman.com.

Larissa Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She has more than 20 years of IT experience with information asset management. Moss is coauthor of three books: Data Warehouse Project Management (Addison-Wesley, 2000), Impossible Data Warehouse Situations (Addison-Wesley, 2002) and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision- Support Applications (Addison-Wesley, 2003). Moss can be reached at methodfocus@earthlink.net.

Adrienne Tannenbaum is president of Database Design Solutions, Inc. (www.dbdsolutions.com), a New Jersey-based consulting firm specializing in the revitalization of corporate data. The firm focuses on data issues within large organizations and supports all data reconstruction efforts with a solid meta data backbone. Tannenbaum is the author of two popular meta data-focused books: Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand (2001, Addison Wesley) and Implementing a Corporate Repository (1994, Wiley).

Evan Levy is a partner and co-founder of Baseline Consulting Group, a multivendor systems integration and consulting firm. As the partner in charge of Baseline’s largest practice, Levy leads both executives and practitioners in delivering technology solutions that help business users make better decisions. He has led strategic technology implementations at commercial and public sector organizations and advises vendors on their product development and delivery strategies. Levy has been published in a wide array of industry magazines and has lectured on a range of technology delivery experiences at leading conferences and vendor events. He has been a featured speaker at the Marcus Evans Analytical CRM symposium, DCI’s Data Warehousing conference, the CRM Association, DAMA International, the AMA and the Data Warehousing Institute. His current work involves delivering and lecturing extensively on the topic of data integration. You can contact him at evanlevy@baseline-consulting.com.

For more information on related topics, visit the following channels:



Industry Vendors