-
Marketplace
-
Channel Resources
Articles from this Site
Teradata Launches Data Warehouse Packages
Navy Exchange Service Command Selects Netezza
Netezza Enters Location Intelligence Market
St.George Bank Upgrades Enterprise Data Warehousing with Teradata
Kalido Strengthens New Ministry Information Library
White Papers
Spend Data Warehouse on Steroids
An Architected Approach to Integrated Information
KALIDO Business Modeling
Data Warehouse Lifecycle Management
Data Warehousing Ensuring Data Integrity
Books
The Enterprise Data Warehouse: Planning, Building, and Implementation
Enterprise One to One: Tools for Competing in the Interactive Age
Data Warehousing Advice for Managers
The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses
90 Days to the Data Mart
Implementing Common Reference Data in a Data Warehouse
As businesses mature day by day, there is less margin for inefficiencies in the business operation. As the complexity in running a business grows, people are looking for ways to simplify and standardize process, implement corporate governance and extract more value out of information at hand. One of the common issues encountered across organizations is standardization of reference data.
Reference data keeps changing as the business evolves. Common reference data (CRD) could be referred to as a subset of the universe of master data management (MDM) strategy. Think about it this way. The computer understands only bits and bytes, irrespective of the fact that the program is running in C, C++, Java or Visual Basic. Think about how complicated computer systems would be designed if there were no concurrency each language would require a different set of computers.
Another example would be different languages spoken all over the world. Here we do not have a common point of reference. We are aware of the difficulties faced when you are visiting France and have no knowledge of French. Think how life would be easier if we had a common language.
Common Frame of Reference
![]() |
| Figure 1 |
The CRD central function is to standardize and centralize the governance, sourcing, distribution and maintenance of reference data that is used by more than one business units. This article provides a framework for implementing the reference data, and it also provides the challenges faced across enterprises. As the number of applications grows in the enterprise, people develop their own standards and set of reference data that makes it cumbersome to integrate while building up an enterprise data warehouse or while doing a cross-sell/up-sell report.
There could be many reasons behind implementing a CRD system; a handful of them are mentioned below. You may want:
- Analytic abilities to track the change in business.
- Manual load of a set of reference data into warehouse, leading to inefficiency in the process.
- An approval mechanism for a new set of reference data by the governance team.
- To publish a common set of reference data to downstream systems and processes.
- Security requirements on a common set of reference data - ease of maintainability; easy sharing of information across the organization; easy access by everyone.
- To manage and capture CRD versions and errors (errors due to incorrect requirements or errors in application).
- Easy accessibility and usability by users (IT and business).
- Standard communication protocol on CRD distribution.
The common reference data can be divided into four different types.
- Integrity-based, e.g., Country Code, Currency Code. This maintains the integrity of the data in the warehouse.
- Enrichment, e.g., the types of codes derived using multiple data items and business rules.
- Standardization, e.g., product codes.
- Application-specific, e.g., common set of reference values used by different downstream processes.
Framework
The above framework provides a simplified version of implementing a process for managing, accessing and sharing common reference data sets. The different steps involved in the process are:
- The authorized person approves or adds the required common set of reference values into the relational data store (RDS) using a front-end GUI application.
- Once the reference data is added/changed/approved, it is ready to be used by the ETL process while loading the data into the warehouse.
- RDS will keep the different version or history of the reference data.
- The same can be used in the reporting layer for audit and analysis purposes.
Some of the benefits of implementing CRD process are:
- Security - more security on accessibility of reference data.
- Accountability - defined process of review and approval.
- Auditability - CRD changes are more auditable.
- Reusability - other lines of business can reuse the application.
- Versioning different versions of CRD data can be managed.
- Easy CRD impact and analysis can be done by IT and business users through analytic tools.
- Complex manual effort of loading and maintaining CRD data can be avoided.
Communication of new values plays an important role, as in most places there is no integrated system to maintain the reference. As an example, when a new product is launched in the market, it does not effectively get communicated to all the stakeholders. In most enterprises, the developer manually maintains the reference data. The developer enters the new codes when he realizes that there is something wrong with the data or before the start of the process. This way, there is no definitive process of communication, which leads to a lot of time spent on analyzing data, rerunning the process.
![]() |
Shyam Mahapatra has seven-plus years of industry experience in building warehouses across banking, retail, insurance, pharmaceutical domains. He can be reached at shyam.mahapatra@yahoo.com.û
For more information on related topics, visit the following channels:




