Data Warehousing Lessons Learned:
Managing Meta Data Risks
Collecting, administering and leveraging meta data presents challenges and risks that must be surfaced and managed to avoid unpleasant surprises in the areas of data warehousing, data administration and the system development life cycle at large. Absent careful planning, the surprises can overwhelm the benefits of any meta data initiative. Without accurate, current, high-quality meta data, development teams in both data warehousing and transactional systems are on a slope of diminishing returns, working harder and harder to maintain many-to-many interfaces. Meta data affects system analysis, version and change control, system interoperability, intersystem visibility and transparency, and related factors at an enterprise level.
The number one risk to meta data projects is that the team will end up with documentation, not actionable insight into how diverse IT systems interoperate. Of course, system documentation is generally a useful and, at times, an essential IT artifact. However, it is subject to a number of well-known shortcomings. When produced in the form of electronic documents, it is an idle wheel. Documentation does not automate or move any part of the development or maintenance process. Documents are often obsolete the very day they are published. A document does not know whether it is obsolete or not, and a labor-intensive, manual effort is required to keep documents current. Inaccurate (outdated) information is often worse than no information at all because it is misleading. In contrast, a data modeling tool whose models can be used to produce database data definition language (DDL), or can be imported into an ETL (extract, transform and load) or query-and-reporting tool, is a mechanism that enables meta data-driven design or maintenance. Because the meta data interchange between tools is incomplete, some manual labor will be required to manage the risk of a meta data idle wheel. Teamwork and discipline remain essential to managing this situation.
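As a rough illustration of the difference between a document and a mechanism, the sketch below renders DDL directly from a small model definition, so the model and the database definition cannot silently drift apart. The `Column` and `Table` classes, the `to_ddl` function and the sample `customer` table are hypothetical names invented for this example; they do not represent any particular modeling tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical model element: one column of a table.
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    # Hypothetical model element: a table and its ordered columns.
    name: str
    columns: tuple


def to_ddl(table):
    """Render a CREATE TABLE statement from the model definition."""
    lines = []
    for col in table.columns:
        null_clause = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {col.sql_type}{null_clause}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table.name} (\n{body}\n);"


# Illustrative model; the table and column names are assumptions.
customer = Table(
    "customer",
    (
        Column("customer_id", "INTEGER", nullable=False),
        Column("customer_name", "VARCHAR(100)"),
    ),
)

print(to_ddl(customer))
```

Because the DDL is generated rather than written by hand, a change to the model shows up in the next generated script instead of waiting for someone to update a document.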
The next most common risk to meta data is a function of slippery semantics - ambiguity in data definitions and lack of transparency in the meanings of design terms. The risk is that impact analysis will be prevented or rendered inaccurate. For example, a comprehensive data dictionary is an important subset of meta data. When the data elements are arrayed in data models that, in turn, are mapped to application processes or code, the configuration can be a powerful and productivity-enhancing enabler of impact analysis. The risk arises as data is collected from multiple systems, including legacy systems, enterprise resource planning (ERP) systems and packages ("black boxes") of diverse kinds. The entities of "customer," "product," "supplier," etc. can be implemented in a variety of ways. Even assuming the meaning of the data element is the same, slippery semantics can occur: customer_abc and customer_123 will not match when the automated process of impact analysis runs. A process of rationalization is required. Someone (usually a data administrator) must undertake an analysis and establish a connection such as an alias, synonym or other semantic marker within the meta data tool. Absent such a commitment (and cost!) to rationalizing what is captured, meta data administration will risk missing many of the benefits of impact analysis and the implied productivity improvements. In the worst case, the developers completely overlook the coupling between systems and will be surprised when processes do not function as designed. The most likely outcome is that extra expenses will be incurred as the search for interconnections is performed by visual inspection and program archaeology. If the resulting links are not captured in a central repository or small set of federated repositories, then this work will have to be performed again and again as the system requires maintenance. Rationalization is an important follow-on task after collecting meta data about system internals.
It is also a significant meta data cost driver: the manual effort scales at least linearly with the number of data elements and the complexity of interfaces. The one thing sure to be more costly than undertaking rationalization is neglecting it.
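The effect of rationalization on impact analysis can be sketched in a few lines. The example below is a minimal illustration, not a description of any meta data tool: the alias table, the system names and the `impact_of` function are all assumptions made up for the example, with customer_abc and customer_123 borrowed from the text above.

```python
# Hypothetical alias table maintained by a data administrator:
# local element name -> rationalized (canonical) name.
ALIASES = {
    "customer_abc": "customer",
    "customer_123": "customer",
    "cust_no": "customer",
}

# Hypothetical usage records harvested from source systems:
# (system name, element name as it appears in that system).
USAGES = [
    ("billing", "customer_abc"),
    ("erp", "customer_123"),
    ("legacy_orders", "cust_no"),
    ("erp", "supplier_id"),
]


def canonical(name, aliases):
    """Return the rationalized name, or the name itself if no alias exists."""
    return aliases.get(name, name)


def impact_of(element, usages, aliases):
    """List the systems that use the same element under any known alias."""
    target = canonical(element, aliases)
    return sorted(
        {system for system, local in usages if canonical(local, aliases) == target}
    )


# Without rationalization, only exact name matches are found.
print(impact_of("customer_abc", USAGES, {}))       # ['billing']

# With the alias table, the coupling to other systems becomes visible.
print(impact_of("customer_abc", USAGES, ALIASES))  # ['billing', 'erp', 'legacy_orders']
```

The alias table is the part that costs manual effort, and it grows with every data element that has to be matched. Once captured in a repository, though, it is reused on every subsequent analysis instead of being rediscovered by inspection.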
Mistaking a meta data tool for a solution is a risk to which the serial tool buyer is exposed. Yes, technologies and products are required, but they tend to be most useful within a well-defined framework for meta data management that harnesses technology as an enabler and productivity enhancer rather than as the goal itself. Meta data addresses a special case of the challenge of modern commercial computing at large - to get disparate, heterogeneous, mutually incompatible systems to work together in the service of the business user. This should temper our expectations for a quick fix or single-point solution, including a tool; but it should also encourage our efforts because grand challenges tend to call forth significant resources and commitments. Like any really tough problem, the solution will require both step-by-step progress and breakthroughs that redefine the limits of the possible. In the meantime, teamwork and team discipline - the IT staff following a well-defined and agreed process in maintaining meta data - will remain on the critical path to reducing the risk of surprises due to obsolete meta data.
Lou Agosta is the lead industry analyst at Forrester Research, Inc. in data warehousing, data quality and predictive analytics (data mining), and the author of The Essential Guide to Data Warehousing (Prentice Hall PTR, 2000). Please send comments or questions to email@example.com.