Meta Data Management in the Enterprise
The modern enterprise is awash in data. Data is everywhere and, in most cases, the same data exists in multiple locations. An enterprise should be able to identify the source, lineage, semantics and access paths of its data. Meta data, or what is commonly called "data about data," is the key to this information. Surprisingly, most enterprises do not seem to have a coherent strategy toward meta data. Different parts of the organization acquire disparate toolsets to maintain their data, and each toolset has its own meta data. Consequently, what we see in a typical enterprise is what we call "meta data islands": pockets of information that cannot be linked with each other. To address this problem, some organizations start huge meta data integration projects in which significant amounts of money and time are spent trying to integrate these pockets of meta data. However, most of these projects do not seem to follow a structured approach, which results in a lower ROI.
This article outlines an approach to meta data management, including the kind of meta data to collect, the options for modeling it, architecting the right solution and ensuring ease of maintenance in the long run. Most of these practices exist in one form or another in various organizations. This article collects and collates them based on our experiences.
Meta Data Classification
At a very high level, meta data can be classified into two categories.
- Shared meta data
- Unique meta data
Shared meta data elements need to have a consistent definition and consistent semantics across the enterprise. For example, the definition of a Customer entity should be homogeneous throughout the enterprise.
Another way to classify meta data is:
- Business meta data
- Technical meta data
- Process meta data
Business meta data comprises the definitions of entities that are relevant to business users, logical data maps and data warehouse dictionaries. Technical meta data comprises data about physical objects. This includes the physical table and column names, constraints and the physical transformation rules between the various zones. Finally, process meta data consists of process-related statistics such as load statistics, scheduling information and exception-related processing.
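The two classifications above can be recorded side by side for each element. The following is a minimal sketch, not a prescribed design: the class names (`Category`, `Scope`, `MetaDataElement`) and the sample elements are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Business / technical / process classification."""
    BUSINESS = "business"
    TECHNICAL = "technical"
    PROCESS = "process"


class Scope(Enum):
    """Shared / unique classification."""
    SHARED = "shared"
    UNIQUE = "unique"


@dataclass(frozen=True)
class MetaDataElement:
    # Hypothetical record layout; a real catalog would carry more fields.
    name: str
    category: Category
    scope: Scope


# Illustrative elements only; the names are invented for this sketch.
elements = [
    MetaDataElement("Customer definition", Category.BUSINESS, Scope.SHARED),
    MetaDataElement("CUST_DIM.CUST_ID column", Category.TECHNICAL, Scope.UNIQUE),
    MetaDataElement("Nightly load row count", Category.PROCESS, Scope.UNIQUE),
]

# Shared elements are the ones that need an enterprise-wide definition.
shared_names = [e.name for e in elements if e.scope is Scope.SHARED]
print(shared_names)
```

Keeping both classifications on the same record makes it straightforward to answer questions such as "which business definitions must be agreed on across the enterprise?"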
Defining a Meta Data Solution
For any enterprise to have a successful meta data solution, we propose adherence to the following steps.
- Capture meta data requirements.
- Choose an appropriate meta-model.
- Define a high level architecture.
- Implement and maintain the solution.
Capturing Meta Data Requirements
Identifying meta data requirements can unfortunately be a daunting task. The target stakeholders for meta data are widespread and diverse, ranging from end users and analysts to applications and toolsets. The standard requirements-gathering process may not hold up. To address this unique nature of meta data, the following approach is advised.
- Identify stakeholders for each meta data element.
- Classify each meta data element into business, technical or process meta data.
- Based on its usage across processes, classify each meta data element as shared or unique.
The next step is to identify the source of each meta data element. This is usually defined as the "official meta data" or "meta data of record" [1]. The meta data of record is the official version of a particular element when there are multiple sources for it. To label a meta data element as official, it is important to understand the various processes that could create it; this information helps to decide the official source. For example, if a retail enterprise is building an enterprise data warehouse and a customer entity is created in multiple places, such as a customer data warehouse, a CRM system and a merchandising system, it is important to analyze the validity and completeness of each source and to estimate which of these definitions could serve as the official version. In this case there could already be a customer data warehouse (CDW) defining the dimension, and it may be efficient to decide that the data dictionary of the CDW is the official meta data of record. Once this exercise is done for all meta data elements, the organization of meta data requirements can be considered complete.
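One way to make the "validity and completeness" comparison concrete is to score each candidate source. The sketch below is only an illustration under stated assumptions: the required attribute list, the `validated` flag and the rule "prefer validated sources, then the most complete one" are all hypothetical, not a method taken from the references.

```python
from dataclasses import dataclass

# Assumed list of attributes a complete Customer definition should cover.
REQUIRED_ATTRIBUTES = {"customer_id", "name", "address", "segment"}


@dataclass
class CandidateSource:
    system: str
    defined_attributes: set
    validated: bool  # assumption: someone has reviewed the definitions


def completeness(source):
    """Fraction of required attributes that the source defines."""
    covered = source.defined_attributes & REQUIRED_ATTRIBUTES
    return len(covered) / len(REQUIRED_ATTRIBUTES)


def pick_meta_data_of_record(candidates):
    """Prefer validated sources, then the most complete one."""
    return max(candidates, key=lambda s: (s.validated, completeness(s)))


candidates = [
    CandidateSource("CRM", {"customer_id", "name", "address"}, True),
    CandidateSource("CDW", {"customer_id", "name", "address", "segment"}, True),
    CandidateSource("Merchandising", {"customer_id", "name"}, False),
]

official = pick_meta_data_of_record(candidates)
print(official.system, completeness(official))
```

In practice the decision is usually made by people rather than by a score, but writing the criteria down in this form forces them to be explicit.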
Choosing a Meta-Model
Once the meta data requirements are formalized, the next step is to develop a model. It is important to model meta data as it can become an important element for reuse across the enterprise. There are several ways of capturing the meta data model.
- Build a custom data model to capture meta data.
- Leverage existing standard models.
- Leverage the meta data repositories that come with tools as an integration source.
To build a custom meta data model, it is important to capture the correct definition of each entity, its attributes and its relationships to other entities. The model can be developed as an OO model or an E-R model. For those opting for a standard model, a couple are available: the Open Information Model (OIM) and the Common Warehouse Meta-Model (CWM). CWM is a specification that describes meta data interchange among data warehousing, business intelligence, knowledge management and portal technologies. According to the Meta Data Coalition, the OIM is a set of meta data specifications to facilitate sharing and reuse in the application development and data warehousing domains. OIM is described in UML (Unified Modeling Language) and is organized in easy-to-use and easy-to-extend subject areas. The data model is based on industry standards such as UML, XML and SQL.
Choosing the appropriate meta-model can be challenging. While custom meta-models offer a lot of flexibility, creating a robust model at the enterprise level and maintaining it in the long term can be quite cumbersome; without a well-thought-out plan, the effort can fail. Standard models, on the other hand, are quite extensive and cover most enterprise-level requirements, but customizing them to the specific needs of the enterprise can be challenging. For an enterprise with a barrage of toolsets, each with its own meta data, it might be worthwhile to explore the meta-model that one of the vendors provides, although the integration effort could be significant. If an enterprise is just embarking upon a meta data initiative and has no disparate toolsets, a custom meta-model should be beneficial.
Once the modeling of the meta data is complete, it is important to define the repository that will store the data. The repository can be a relational store or an object store.
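As an illustration of a custom E-R style meta-model held in a relational store, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`entity`, `attribute`, `relationship`) and the sample rows are assumptions for this example, not part of OIM, CWM or any vendor model.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    definition TEXT
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
    name         TEXT NOT NULL,
    data_type    TEXT,
    UNIQUE (entity_id, name)
);
CREATE TABLE relationship (
    parent_id INTEGER NOT NULL REFERENCES entity(entity_id),
    child_id  INTEGER NOT NULL REFERENCES entity(entity_id),
    kind      TEXT NOT NULL
);
""")


def add_entity(name, definition):
    """Insert an entity and return its generated key."""
    cur = conn.execute(
        "INSERT INTO entity (name, definition) VALUES (?, ?)",
        (name, definition),
    )
    return cur.lastrowid


# Illustrative content only.
customer_id = add_entity("Customer", "A party that buys goods")
order_id = add_entity("Order", "A request by a customer to buy goods")

conn.executemany(
    "INSERT INTO attribute (entity_id, name, data_type) VALUES (?, ?, ?)",
    [
        (customer_id, "customer_id", "INTEGER"),
        (customer_id, "name", "TEXT"),
        (order_id, "order_id", "INTEGER"),
    ],
)
conn.execute(
    "INSERT INTO relationship (parent_id, child_id, kind) VALUES (?, ?, ?)",
    (customer_id, order_id, "one-to-many"),
)

# Report the attributes captured for one entity.
rows = conn.execute(
    """SELECT a.name FROM attribute a
       JOIN entity e ON e.entity_id = a.entity_id
       WHERE e.name = ? ORDER BY a.name""",
    ("Customer",),
).fetchall()
customer_attributes = [r[0] for r in rows]
print(customer_attributes)
```

The same three concepts (entities, attributes, relationships) could equally be expressed as classes in an object store; the relational form is shown only because it is easy to query for reporting.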
Defining a High-Level Architecture
There are various possible architectures to implement a meta data solution. One of the solutions is to have a centralized meta data repository where all of the meta data resides. A typical architecture is shown in Figure 1.
The meta data is stored in a central repository. Typical elements that would be stored include application meta data, DBMS meta data, business meta data and process meta data. Creation and modification of the meta data entries need to be done through a common interface. The meta-model for this solution can be either a custom developed one or the standard models available. Some of the advantages of this architecture are:
- Maintaining meta data is relatively simple.
- Interoperability between components is facilitated.
- Reporting can be made simpler.
Some enterprises have attempted to create meta data solutions on a very small scale, meaning that different parts of the organization have created their own meta data solutions. Two architectures can address this scenario. The first is known as "meta data exchange." An oft-recommended solution [2] is depicted in Figure 2.
In this example, meta data is extracted from the various existing sources and transformed into a common model that resides in a common repository. To facilitate the interchange of meta data, XML is used as the transmission format. Each application, DBMS or tool communicates with the repository using XML. A parser at the repository end converts the XML into the meta-model format and updates the repository.
The third and final architectural solution is known as the distributed architecture. In this case, it is assumed that parts of the enterprise have spent considerable resources creating local meta data solutions, and an enterprise-wide integration effort is found to be too expensive. As a result, the localized meta data solutions continue to exist and, where appropriate and feasible, meta data is shared across sources.
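The repository-side parsing step might look like the following sketch, which uses Python's standard `xml.etree.ElementTree` module. The XML layout (`metadata`, `element` and `property` tags) is a made-up message format for illustration, and the dictionary stands in for the common repository; neither is prescribed by the architecture described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical message sent by an ETL tool to the repository.
MESSAGE = """
<metadata source="etl_tool">
  <element name="load_customer_dim" category="process">
    <property key="schedule">daily</property>
    <property key="target">CUSTOMER_DIM</property>
  </element>
</metadata>
"""


def parse_message(xml_text):
    """Convert one XML message into records in the common model."""
    root = ET.fromstring(xml_text.strip())
    source = root.get("source")
    records = []
    for el in root.findall("element"):
        properties = {
            p.get("key"): (p.text or "").strip()
            for p in el.findall("property")
        }
        records.append({
            "source": source,
            "name": el.get("name"),
            "category": el.get("category"),
            "properties": properties,
        })
    return records


def update_repository(repo, records):
    """Insert or overwrite records, keyed by (source, element name)."""
    for record in records:
        repo[(record["source"], record["name"])] = record


repository = {}  # stand-in for the common repository
update_repository(repository, parse_message(MESSAGE))
print(repository[("etl_tool", "load_customer_dim")]["properties"])
```

Keying records by source and name means that a repeated message from the same tool updates the existing entry instead of creating a duplicate.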
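A distributed arrangement can be pictured as a set of local repositories that each expose only the entries marked as shareable. The sketch below is one possible reading of that idea; the `LocalRepository` class, the `federated_lookup` function and the sample entries are all hypothetical.

```python
class LocalRepository:
    """A local meta data store that exposes only selected entries."""

    def __init__(self, owner, entries, shared_keys):
        self.owner = owner
        self._entries = dict(entries)
        self._shared = set(shared_keys)

    def lookup(self, key, requester):
        """Return the entry if the requester is allowed to see it."""
        if key not in self._entries:
            return None
        if requester == self.owner or key in self._shared:
            return self._entries[key]
        return None


def federated_lookup(repositories, key, requester):
    """Ask each local repository in turn; return (owner, value) or None."""
    for repo in repositories:
        value = repo.lookup(key, requester)
        if value is not None:
            return repo.owner, value
    return None


# Illustrative repositories only.
sales = LocalRepository(
    "sales",
    {"Customer": "A party that buys goods", "sales_load_stats": "rows loaded"},
    shared_keys={"Customer"},
)
finance = LocalRepository("finance", {"Ledger": "Book of accounts"}, shared_keys=set())

print(federated_lookup([sales, finance], "Customer", requester="finance"))
print(federated_lookup([sales, finance], "sales_load_stats", requester="finance"))
```

Here the shared Customer definition is visible to the finance group, while the sales load statistics stay local to the sales repository.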
Implementing and Maintaining a Meta Data Solution
Once the architecture is finalized and meta-models are decided, it is time to implement the solution. Some of the important considerations are:
- Nature of the meta data repository (relational database vs. file system vs. object database or XML repository);
- Security considerations for the meta data repository (Who administers the repository? Who can read/write into the repository?);
- Mechanism to create, read and write the meta data components;
- Reporting infrastructure for meta data.
Once the plan is laid out and appropriate hardware is procured, the meta data solution can be implemented.
Implementing the meta data solution does not, by itself, solve the problem. It is important to ensure the longevity of the solution and to maintain it appropriately. As a primary requirement, roles and responsibilities must be assigned within the enterprise.
A sample responsibility list [3], recommended by Adrienne Tannenbaum in her book Meta Data Solutions, is shown in Figure 3.
Once the roles and responsibilities are defined, a process needs to be created to define the life cycle of the meta data. The life cycle determines who creates the meta data, who uses the meta data components and who is responsible for maintaining them. One of the main long-run success criteria for a meta data solution is its expandability: the architecture should be able to absorb new meta data requirements with ease. To ensure this, a process needs to be in place for adding new meta data information. Some important questions that need to be answered are:
- Does the new meta data need to be stored in the common repository, if one is available?
- What are the access methods for this meta data element (read vs. read/write)?
- Is this meta data unique or is it going to be shared across applications?
Based on the answers to these questions appropriate decisions can be taken to host the new meta data component.
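The three questions can be captured as a small decision routine. The rule used below (shared meta data goes to the common repository when one exists, everything else stays local) is an assumption chosen for illustration; each enterprise would substitute its own policy.

```python
from dataclasses import dataclass


@dataclass
class NewMetaData:
    name: str
    shared: bool      # is it shared across applications?
    read_write: bool  # do consumers need to update it?


def hosting_decision(item, common_repository_available):
    """Return (location, access) under the assumed policy."""
    if item.shared and common_repository_available:
        location = "common repository"
    else:
        location = "local store"
    access = "read/write" if item.read_write else "read-only"
    return location, access


# Hypothetical new components.
promo = NewMetaData("Promotion dimension definition", shared=True, read_write=False)
stats = NewMetaData("Mart refresh statistics", shared=False, read_write=True)

print(hosting_decision(promo, common_repository_available=True))
print(hosting_decision(stats, common_repository_available=True))
```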
A Sample Meta Data Solution
Consider the case of a retailer, Acme Inc., which has a collection of data warehouses to support various business reporting needs: one for supply chain reports, one for CRM, one for sales and one for financials. Acme is now embarking upon an initiative in which an enterprise-wide consolidation will create an enterprise data warehouse (EDW). The EDW will be the central storage for all enterprise-wide data, with specific business units creating data marts from the EDW. As a part of this initiative, it was realized that there needs to be a meta data consolidation strategy as well.
We can now apply the four steps recommended earlier to this aspect of the project and arrive at a solution for the meta data strategy. The first step is to identify the meta data requirements. This process consists of identifying the stakeholders and classifying the meta data. Since this is a warehouse consolidation effort, the types of meta data are fairly simple. There are enterprise dimensions and enterprise facts that need to be defined; both have shared business meta data. The next set of meta data is the list of tables and columns that implement the dimensions and facts. This falls under the category of technical meta data. Finally, to document the ETL process and the data mart creation process, we need information about the steps involved. This falls under process meta data.
Some of the stakeholders for this meta data are the data modelers, ETL developers, ETL tool, data mart developers, report developers and reporting tool. This high-level requirement needs to include all the meta data elements, their classification and assignment to the appropriate stakeholder.
The next step is to model the meta data solution. The decision was taken to develop a custom meta-model to capture the meta data. The meta-model addresses the requirements for the data model, the ETL process, the data mart and the reporting tool.
Once the meta-model is finalized, the next step is to define the high-level architecture. The architecture for Acme was to maintain a single repository for the meta data, and a process was defined to populate the repository from all systems. For example, once the dimensions and facts are defined, the meta data from the data modeling tool is exported and saved in the custom meta data repository. The ETL process information is created manually and saved in the repository. The reporting tool's repository is populated using predefined techniques. To address reporting requirements for the meta data, a Web-based reporting system was developed that queries the underlying repository to deliver information.
With this solution in place, the meta data consolidation can be considered to be almost done. The next set of challenges is to maintain this solution in the long run. For example, how do we handle a new entity or dimension being created in the data model? How do we log information about a new ETL process or a new report? The answer is the meta data maintenance process. For data models, there is a periodic synchronization process between the tool's repository and the meta data repository. There are similar processes for ETL and reporting as well.
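A periodic synchronization between a tool's repository and the meta data repository can be reduced to a comparison of the two sets of definitions. The sketch below assumes both sides can be exported as name-to-definition dictionaries and that the meta data repository slice being synchronized holds only entries owned by that tool; the function names and sample data are hypothetical.

```python
def plan_sync(tool_repo, meta_repo):
    """Work out what must change in the meta data repository."""
    to_add = sorted(k for k in tool_repo if k not in meta_repo)
    to_update = sorted(
        k for k in tool_repo
        if k in meta_repo and tool_repo[k] != meta_repo[k]
    )
    to_retire = sorted(k for k in meta_repo if k not in tool_repo)
    return to_add, to_update, to_retire


def apply_sync(tool_repo, meta_repo):
    """Apply the plan in place and return it for logging."""
    to_add, to_update, to_retire = plan_sync(tool_repo, meta_repo)
    for key in to_add + to_update:
        meta_repo[key] = tool_repo[key]
    for key in to_retire:
        del meta_repo[key]
    return to_add, to_update, to_retire


# Illustrative state: the modeling tool has a new dimension (Promotion),
# a revised definition (Customer) and no longer contains Region.
tool_repo = {"Customer": "definition v2", "Store": "definition v1", "Promotion": "definition v1"}
meta_repo = {"Customer": "definition v1", "Store": "definition v1", "Region": "definition v1"}

plan = apply_sync(tool_repo, meta_repo)
print(plan)
```

Returning the plan as well as applying it gives the maintenance process an audit trail of what each synchronization run changed.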
In conclusion, the importance of meta data in the enterprise is well known. A proper strategic initiative can go a long way toward untangling the "data web." It is also important to realize that meta data is not a silver bullet for data management. It is a powerful asset that can improve the quality of data analysis in the enterprise, thereby making the enterprise more productive. Rather than trying to find a perfect solution, it is important to create the appropriate solution.
- Tannenbaum, Adrienne. Meta Data Solutions. Addison Wesley, 2002.
- Kimball, Ralph and Caserta, Joe. The Data Warehouse ETL Toolkit. Wiley Press, 2004.
- Source: Meta Data Solution, Adrienne Tannenbaum.
Muralidhar Prabhakaran is a senior technical architect with the Retail, CPG and Distribution practice of Infosys. He has several years of experience in executing complex data management projects. He can be contacted at firstname.lastname@example.org.