DM Review | Covering Business Intelligence, Integration & Analytics
Data Integration:
Stopping Your Company’s DRIPs, Part 1

By Greg Mancuso and Al Moreno
Column published in DMReview.com, April 1, 2004

Last month we introduced the concept of DRIP: the common business problem of being "Data Rich and Information Poor." Unless a company was formed only recently and is just now building support systems that follow accepted best practices for both operational IT development and business intelligence implementation, it has most likely grown its data environment the same way as everyone else. Historically, business solutions evolved at different points in a business's IT development cycle. Each business need was analyzed, designed and modeled with its own operational system. As new needs were identified, there may have been a cursory review of the existing systems, but in most cases IT developed a new standalone operational system. Couple this with the recent rash of mergers and acquisitions, and IT organizations are left with silos of data that do not communicate with, or relate to, each other. Often the merging or acquired companies have incompatible hardware and software infrastructures. This predominance of data silos leads directly to DRIP and to the difficulty companies now have in using their abundant data sources to make solid, informed decisions. Beginning with this month's column, we will focus on tactics to alleviate this problem.

Many of the techniques we will present are recommended for any IT initiative, regardless of its business requirements. One of the worst, and most common, mistakes made when undertaking a technology initiative is to jump straight to the tactical aspects of the project without taking business processes and requirements into account. Often, IT organizations receive a set of business requirements from their business constituency and rapidly move to "I know how to fix this" mode, rather than taking a step back to review the current request in light of the total corporate IT infrastructure and its stated direction. This lack of strategic planning and validation is one of the leading causes of DRIP.

Understanding the application you are trying to create, and its implications, is essential to overall project success. Strategic planning for a BI initiative, for example, is somewhat more complex than for standard operational IT projects. The topic of planning could fill volumes, so this series will focus on the more tactical DRIP-mitigation techniques. Specifically, we are going to focus on:

  • Gap Analysis - The practice of reviewing business requirements, translating those into data requirements and examining source systems to determine what exists vs. what is missing and how to make up the difference.
  • Systems Audit - Reviewing and "re-analyzing" existing systems to ensure they are still functioning as originally deployed, and assessing the underlying data to ensure IT and the business users actually understand the meaning of the data maintained by the system.
  • Master Organizational Plan - The tactical outcome of a strategic plan that reworks the current systems to ensure they fit into the long-term vision for the organization. This plan also identifies personnel and/or process changes required to allow the current applications to work within the planned environment, and it dictates intervals for reassessing the systems, recognizing that a company's IT environment evolves as the business changes.

This month we cover the first item, the gap analysis, in detail, walking through the steps that go into ensuring a successful outcome for any enterprise undertaking one.

The Gap Analysis

The gap analysis involves several steps that occur throughout the implementation of any application. It begins with the all-important requirements meetings. This series of meetings requires an individual or group of individuals to gather the business information requirements from all of the users who will have a stake in the application. During the process, the business users' requests for information are evaluated and refined, and possible candidate source systems that may contain the data required to satisfy the requirements are identified. Finally, the business requirements are documented and translated into data requirements that are mapped to the candidate source data with the target data elements modeled for the new application.

Business Data Requirements - A key function of the implementation team is to work with the business users to ensure all the data requirements are met and any missing elements identified and planned for. During these meetings, the implementation team should quantify the information necessary to satisfy each of the requirements brought up in the discussions. During the requirements gathering, the following questions should be considered:

  • What are the common ad hoc requests? How long does it take to produce them?
  • Which source systems are used for frequently requested information?
  • Is there data used in the decision making process that resides in personal repositories, such as Excel or Access? Are there any personal business rules applied to the data when it is entered or manipulated on the desktops?

As the final outcome of this step, the implementation team will have identified the data elements needed to support the users' requirements, based on available or obtainable data from the current source systems. This will form the basis for the implementation's data model - either a new database or modifications to an existing repository. In parallel with the data modeling effort, the data requirements will drive completion of the source data identification process.

Source Data Identification - Once the required data has been identified and validated, it is time to identify the candidate sources for the data. It is likely that the data will reside in several source systems within the organization ranging from enterprise-wide relational databases to personal data contained in end-user Excel spreadsheets. Regardless of the source, the data must be identified and analyzed for accuracy, completeness and viability. The following questions should be kept in mind when reviewing the sources:

  • How do the source systems relate to each other? Do any feed each other?
  • Are there data elements in the source systems left unpopulated or unvalidated?
  • Do any data elements carry business rules other than those identified in the data dictionary or application documentation, and is there any information implied by an element's contents (e.g., the first character of a free-form field identifies whether the remaining characters are a person's name or a company name)?
  • Is there a current published data dictionary and are lookup tables available?

The end result of this lengthy but necessary process is a solid understanding of the possible data sources, and how they relate to one another and to the data requirements. Concurrently, at the end of this process, the data model should be complete, and it will be safe to proceed to the final stage of the gap analysis: source-to-target data mapping.
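Parts of this source review can be automated. The sketch below, a minimal illustration rather than a prescribed tool, assumes a candidate source has been exported to a CSV extract (the file and column names used in the usage comment are invented) and reports how often each field is actually populated:

```python
# Sketch of a source-system profiling pass over a CSV extract.
import csv

def profile_source(path):
    """Return, per column, the fraction of rows in which the column
    is populated (non-blank). Largely empty columns are candidates
    for the "unpopulated or unvalidated" elements discussed above."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    total = len(rows)
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        populated = sum(1 for r in rows if (r[col] or "").strip())
        report[col] = populated / total if total else 0.0
    return report

# Usage sketch: flag columns populated in fewer than half the records.
# report = profile_source("orders_extract.csv")   # hypothetical file
# sparse = [c for c, rate in report.items() if rate < 0.5]
```

Population rates like these also feed the later mapping discussion, since a sparsely populated element is a weak candidate source.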

Source-to-Target Data Mapping - Once the data requirements have been gathered and the data model complete, the data mapping exercise starts. This process has two major functions. First is the actual mapping exercise, but more importantly, the second is resolution of any issues that are identified during the mapping. To facilitate the mapping, create a spreadsheet with the following columns:

  • Target Table
  • Target Column
  • Data Type/Length
  • Target Data Element Description
  • Source System
  • Source Table
  • Source Column
  • Data Transformations (if required)

When mapping each of the target data elements, be sure to include all required data in the table, even if no source has been identified. Also, it is quite likely that many of the target elements will have multiple candidate sources. Be sure to record all possible sources.
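The mapping spreadsheet can just as easily be kept as a small script-generated CSV under version control. The sketch below is one possible layout, not the authors' prescribed format; the table, column and system names are invented examples. Note the second record: a required element with no identified source stays in the table, flagged by its empty source fields, exactly as recommended above.

```python
# Sketch: record the source-to-target mapping as CSV rows.
import csv

MAPPING_COLUMNS = [
    "Target Table", "Target Column", "Data Type/Length",
    "Target Data Element Description", "Source System",
    "Source Table", "Source Column", "Data Transformations",
]

mappings = [
    {   # a fully mapped element (all names hypothetical)
        "Target Table": "DIM_CUSTOMER",
        "Target Column": "CUSTOMER_NAME",
        "Data Type/Length": "VARCHAR(100)",
        "Target Data Element Description": "Full customer name",
        "Source System": "OrderEntry",
        "Source Table": "CUST",
        "Source Column": "CUST_NM",
        "Data Transformations": "TRIM; title-case",
    },
    {   # required element with no source identified yet: keep the row,
        # leave the source fields blank, and raise it with the business
        "Target Table": "DIM_CUSTOMER",
        "Target Column": "SIC_CODE",
        "Data Type/Length": "CHAR(4)",
        "Target Data Element Description": "Industry classification",
        "Source System": "",
        "Source Table": "",
        "Source Column": "",
        "Data Transformations": "",
    },
]

def write_mapping(path, rows):
    """Write the mapping records with the spreadsheet's column headers."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=MAPPING_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Elements with multiple candidate sources can simply be recorded as multiple rows sharing the same target table and column.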

Once all of the target elements have been mapped, it will be necessary to meet with the business community to present the results and to resolve any issues identified. When the issues revolve around multiple sources, the results of the data source analysis need to be used to drive the resolution since this information can help determine which source is the most valid (i.e., validated, commonly populated, source for other systems, etc.). When the conversation turns to required data elements that cannot be mapped to source systems, resolution becomes much more of a business decision. It is always possible that new personal data repositories will be identified during this conversation since business users don't always think about all of the various ways they use company data in the course of their jobs. More commonly, target elements are identified that simply have no source within the organization today. In these cases it comes down to four questions that need to be answered by the business users:

  • Can the required information be derived from current data by applying transformations or calculations or concatenations of existing elements?
  • Can the information be built from existing data systems or data entry processes that can be modified to secure this information? In many cases, a source system, or the process used to gather the input information, can be modified fairly easily and provide the required data. For example, the order entry process could be modified to require the entry of a Social Security Number, rather than allowing it to be optional and, therefore, rarely populated.
  • Can the information be secured from an outside source? This is often the hardest decision because it is also the most costly in terms of actual money spent. An example is SIC code information. Companies that have been doing business for years are only now getting into detailed sales analysis; previously they were concerned only with operational information about inventory and sales dollars. Now they need to delve deeper into their client base to understand how sales break down by industry (SIC, D&B (Dun & Bradstreet), etc.) or by other demographic (income, education, etc.) and firmographic (revenue, population, locations, etc.) indicators. Much of this information is available for purchase from third-party data providers such as credit bureaus, D&B and the SEC.
  • Can we live without the information? The final question is do we really need this information to satisfy the business questions? If the answer is no, and the users are comfortable making decisions without the requested information, then the data element is removed from the list of required data. More often, however, the data is required, and the company needs to determine the most cost effective way to secure the required information.
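For the first of the questions above, derivation from existing elements is often the cheapest answer. The functions below are hypothetical illustrations (the field names are invented, not taken from any system described in the column) of a concatenated element and a calculated element:

```python
# Illustrative derivations of missing target elements from existing data.

def derive_full_name(first, last):
    """Concatenate two existing source elements into the required one,
    normalizing whitespace and case along the way."""
    return f"{last.strip().upper()}, {first.strip().upper()}"

def derive_order_margin(sale_price, unit_cost, qty):
    """A calculated element: gross margin built from existing numeric
    fields rather than sourced or purchased."""
    return (sale_price - unit_cost) * qty
```

Any such derivation is itself a business rule and belongs in the "Data Transformations" column of the mapping spreadsheet so it is not rediscovered later.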

As the detailed work involved suggests, a gap analysis is an intricate step of any implementation, but that does not change the fact that it must be done if the application is to succeed. The gap analysis is a technique applicable to any IT implementation, and it is especially valuable when designing and implementing data-intensive applications such as business intelligence/business performance management systems and data warehouses. The project manager should plan for this exercise as part of any project. While it may seem to add time and overhead, the effort and expense will ensure that the resulting implementation meets the end users' expectations.



Greg Mancuso and Al Moreno are principals with Sinecon, a business intelligence consultancy specializing in data integration and BI/DW solution architecture design. Together they have more than 29 years of data warehouse and business intelligence experience and have implemented many large-scale solutions in both the U.S. and European markets. They may be reached at gmancuso@sinecon-llc.com or amoreno@sinecon-llc.com.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.