Published in DM Direct, November 2003.


Data Certification for Critical Business Needs

by Deanne Larson

Summary: There are many benefits to data certification, including alignment of IT and business to produce measurable and meaningful results.


With the current economic climate cloudier than usual, stormy relationships between IT and business organizations remain in the forecast. As organizations compete for budget and executive sponsorship, how does the business side contribute and how does IT provide ongoing value? Data certification is not only a commonly overlooked process but also a way to clear the air and foster a truly meaningful partnership. What is data certification? It is a process that ensures the integrity and value of data throughout its life cycle, with IT and the business collaborating to produce measurable and meaningful results.

The quality of company data is extremely important to business stakeholders; however, the level of quality is relative to the importance of the data. How can a level of confidence be established in cases where the data is crucial? How can that level of quality be maintained? Nice-to-have data needs enough quality to be meaningful; have-to-have data must be accurate. Financial data for Wall Street and medical information for life-saving procedures are categories of data that require a certification process to ensure the highest level of accuracy. The phases for certification include:

  • Business ownership
  • Data source identification and alignment
  • Analysis and certification
  • Monitoring
  • Change control
  • Certification as a process

Business Ownership of Data

The process of data certification begins with business ownership. The business owner knows where the data fits in the firm’s strategy. Say you own a car and you want to get the biggest return on your investment. You put gas in it, take it to the car wash, get the oil changed and rotate the tires – you preserve it as an asset. As with your car, data requires care and feeding. Data ownership means more than providing business requirements or being a key stakeholder of the data. To be successful, data ownership needs to be part of a firm’s culture and development process. The responsibilities include (but are not limited to):

  • Identifying the business need and priority for data
  • Connecting the need and priority to the firm’s objectives
  • Defining the business rules
  • Understanding how business decisions impact the data and governing those decisions (change control)
  • Establishing metrics for data acceptance

Having a data strategy that is connected with the firm’s strategy increases the overall value of the data. Business ownership should not be confused with data stewardship. Ownership is strategic, whereas stewardship is tactical.

The business owner should be a senior level executive whose business goals rely heavily on a particular data domain. Several data domains may belong to the same or different owners. Data stewards should be located within the business group that has the data domain ownership. This is where the partnership between IT and the business begins.

Data Source Identification and Alignment

When choosing the system to be used as a data source, a few areas of consideration are:

  • The system should be where the data is created and maintained
  • The system needs to be able to meet data delivery requirements
  • The business owner should support the system as the system of record
  • The system owner or vendor will need to be willing to support interface requirements and service levels

In the real world, the source system chosen may not fit the ideal requirements, considering that an operational system’s primary use is not usually being a data source. However, to minimize data quality issues, the source system should at least meet the requirements of the business owner, and its owner should be willing to support interface requirements and service levels.

After the source system has been identified, an alignment needs to occur between the source system and the data warehouse. Data quality issues can stem from operational activities within the source systems. Interface rules (how data is created and delivered) and source system activities (downtime, data scrubs/mappings) should be outlined in a system interface agreement (SIA) to understand and manage data impacts. Additionally, service levels (when data is delivered, escalation procedures, how to resolve data delivery issues) should be documented in a service level agreement (SLA). Gaining commitment on system interface rules and service levels between the data warehouse and the source systems allows for easy impact identification and minimizes outages due to data quality issues.
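Service levels of this kind lend themselves to automated checks. The following is a minimal sketch of an SLA delivery check, assuming a hypothetical etl_load_log table that records each delivery the warehouse receives; the table, columns and 6:00 a.m. deadline are illustrative only.

    -- Illustrative SLA check: list today's deliveries that arrived after the
    -- agreed 6:00 a.m. deadline (all names and the deadline are hypothetical).
    SELECT source_system,
           business_date,
           file_name,
           received_at
    FROM   etl_load_log
    WHERE  business_date = CURRENT_DATE
      AND  received_at > CAST(CURRENT_DATE AS TIMESTAMP) + INTERVAL '6' HOUR
    ORDER BY received_at;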

Analysis and Certification

Once the source is identified and the strategy is clear – on to data analysis and acceptance. The analysis step is the most opportune time to discover data quality issues. Through analysis, initial data quality levels are determined. Based on the data acceptance metrics established by the business owners, a plan is put in place to raise quality levels. The plan would include cleansing and standardization as needed, as well as how the remaining data quality issues will be addressed.
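Initial quality levels are often established with simple profiling queries against the candidate source. A minimal sketch, using a hypothetical src_customer table (all names are illustrative):

    -- Illustrative profiling query: measure completeness of key attributes
    -- in a candidate source table before transformation rules are designed.
    SELECT COUNT(*)                                             AS total_rows,
           SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_customer_ids,
           SUM(CASE WHEN postal_code IS NULL THEN 1 ELSE 0 END) AS null_postal_codes,
           COUNT(DISTINCT customer_status)                      AS distinct_status_values
    FROM   src_customer;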

Data quality issues are addressed on two levels. The first level is based on criticality and is handled before transformation is applied. Quality issues associated with data attributes that are critical to meeting requirements are prioritized and addressed in order. Other quality issues deemed not critical may be postponed, or the data may not be included in integration. Data stewards, who represent the business owners and are part of the business community, will validate and determine whether data acceptance metrics have been reached. Once acceptance has occurred, transformation rules are applied to the data. Transformation may occur through the ETL process or through a process to populate an analytical layer within the data warehouse. The second level of data quality issues is then identified. At this phase, transformation rules are validated or modified as necessary. The data stewards complete the final validation and determination of acceptance metrics. See Figure 1.


Figure 1: Data Quality Elements Addressed on Two Levels
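One common way to validate transformation rules at this second level is to reconcile the transformed data against the staged source data. A minimal sketch, assuming hypothetical stg_transactions and fact_transactions tables (names and the amount measure are illustrative):

    -- Illustrative reconciliation: daily totals in the transformed fact table
    -- should match the staged source, apart from documented exclusions.
    SELECT s.business_date,
           s.source_amount,
           f.warehouse_amount,
           s.source_amount - f.warehouse_amount AS variance
    FROM   (SELECT business_date, SUM(amount) AS source_amount
            FROM   stg_transactions
            GROUP BY business_date) s
    JOIN   (SELECT business_date, SUM(amount) AS warehouse_amount
            FROM   fact_transactions
            GROUP BY business_date) f
      ON   s.business_date = f.business_date
    WHERE  s.source_amount <> f.warehouse_amount;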

Examples of how data acceptance metrics would be applied:

  • No nulls in key fields
  • No missing data
  • Receiving only expected values
  • Validation of data use cases (from source)
  • Receiving expected data distribution, trends, volume
  • Validation of transformation rules for all use cases
  • Volume of data to validate use cases is equivalent to production
  • Review of application of transformation rules to historical data (validate trends, scenarios, if applies)

Metrics would be created to establish what levels are acceptable. For example, receiving 90 percent of the expected data volume may be acceptable, a critical data element may be required to be populated 100 percent of the time with expected values, and data use cases may be allowed a two percent exception rate. Auditing routines are then designed from the acceptance metrics and run after integration, in the monitoring phase.
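Acceptance metrics such as these translate directly into pass/fail checks that the data stewards can review. A minimal sketch, again with illustrative table and column names; the expected volume of 1,000,000 rows is a placeholder:

    -- Illustrative acceptance checks: 90 percent of the expected volume must
    -- be present, and a critical element must always be populated.
    SELECT 'volume_at_least_90_pct' AS metric,
           CASE WHEN COUNT(*) >= 0.9 * 1000000 THEN 'PASS' ELSE 'FAIL' END AS result
    FROM   stg_transactions
    UNION ALL
    SELECT 'account_id_fully_populated' AS metric,
           CASE WHEN SUM(CASE WHEN account_id IS NULL THEN 1 ELSE 0 END) = 0
                THEN 'PASS' ELSE 'FAIL' END AS result
    FROM   stg_transactions;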

When it is time to integrate the data into production, the data stewards continue to be heavily involved. The same data validation routines and evaluation of acceptance metrics occur again after integration into production. After this last step, the data is certified for use.


Figure 2: Data Validation and Evaluation Metrics

Monitoring

Shouldn’t all the proactive work that has been done be enough? Not unless your business stops running! Unmanaged changes in business processes or outsourced systems are constant threats to the integrity of your data. The monitoring phase covers this. The data quality level achieved during the acceptance phase provides the standards that data audits check against. Data audit processes can be customized SQL scripts or tool based. Common checks include the following (example audit queries are sketched after the list):

  • High and low data volumes
  • Missing data
  • Expected data
  • Duplicate data
  • Trending on key dimensions and measures
  • Reconciliation back to the data source
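Checks like these are typically packaged as a recurring audit job. A sketch of two such audits, assuming a hypothetical fact_transactions table with a load_date column; the thresholds are illustrative:

    -- Illustrative audit 1: duplicate natural keys introduced by today's load.
    SELECT transaction_id, COUNT(*) AS copies
    FROM   fact_transactions
    WHERE  load_date = CURRENT_DATE
    GROUP BY transaction_id
    HAVING COUNT(*) > 1;

    -- Illustrative audit 2: flag abnormally low volume for today's load
    -- (here, less than half of the trailing 30-day daily average).
    SELECT COUNT(*) AS todays_rows
    FROM   fact_transactions
    WHERE  load_date = CURRENT_DATE
    HAVING COUNT(*) < (SELECT 0.5 * COUNT(*) / 30.0
                       FROM   fact_transactions
                       WHERE  load_date >= CURRENT_DATE - INTERVAL '30' DAY
                         AND  load_date <  CURRENT_DATE);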

Audits are done on a frequent basis in order to find issues before they become irreversible. There is nothing worse than finding you didn’t load a day’s worth of data after month-end and quarter-end close – and after numbers have been announced to Wall Street! Audits are extremely valuable when you have multiple data sources. In the case of financial data, reconciling back to your data source on a regular basis helps identify out-of-sync conditions, rounding errors and data type errors.

Change Control

Monitoring is a reactive step, so what else can be done to be proactive? Managing change and its impacts is another necessary part of the certification process. Any business process change can negatively impact data quality. Expected values can change. Required data may no longer be collected in the data source. A field in a source may no longer be edited. These impacts can be managed with change control and business ownership.

A part of business ownership is governing business process changes or source system changes that affect data. Key attributes, dimensions and measures should be identified as critical and categorized as high risk. An impact assessment is completed as part of the business process or source system change. Based on the impact analysis, a change control plan is developed to address and manage the impact. The change control plan could include the following (an example of such a verification is sketched after the list):

  • Reviewing test data
  • Simulating the change in a UAT environment to verify data impacts
  • Verifying impacts on historical data, trends, or data distribution
  • Revalidating data acceptance metrics
  • Data validation after implementation
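Verifying impacts on data distribution, for instance, can be as simple as comparing before-and-after counts in the UAT environment. A minimal sketch, assuming hypothetical uat_fact_sales_before and uat_fact_sales_after tables built with the current and the proposed logic:

    -- Illustrative impact check: segments whose row counts change (or that
    -- appear or disappear) once the proposed source change is applied.
    SELECT COALESCE(b.customer_segment, a.customer_segment) AS customer_segment,
           b.rows_before,
           a.rows_after
    FROM   (SELECT customer_segment, COUNT(*) AS rows_before
            FROM   uat_fact_sales_before
            GROUP BY customer_segment) b
    FULL OUTER JOIN
           (SELECT customer_segment, COUNT(*) AS rows_after
            FROM   uat_fact_sales_after
            GROUP BY customer_segment) a
      ON   b.customer_segment = a.customer_segment
    WHERE  b.rows_before IS NULL
       OR  a.rows_after IS NULL
       OR  b.rows_before <> a.rows_after;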

In some cases, the data impact could be great. A source system rewrite or replacement would require starting the certification process over at analysis and integration.

Meta data plays a key part in managing change to data. Data stewards usually have a role in the maintenance and upkeep of the meta data. Business rules and definitions are kept within the meta data and can be documented or defined in reference tables. In some cases, definitions for dimensions and measures that are critical to the business intelligence community are determined by reference tables. The change control process should be applied by the data stewards to these types of reference table changes in order to understand and manage the impact. Dimensions and measures in star schemas or cube structures can be impacted inadvertently if not managed properly.
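One way to keep such reference table changes traceable is to version them in a history table that records who made each change and under which change request. A minimal sketch; the table and columns are illustrative, not a prescribed design:

    -- Illustrative history table for a business-rule reference table, so that
    -- definition changes go through change control and remain traceable.
    CREATE TABLE ref_product_category_hist (
        category_code   VARCHAR(10)  NOT NULL,
        category_name   VARCHAR(100) NOT NULL,
        effective_date  DATE         NOT NULL,
        end_date        DATE,
        changed_by      VARCHAR(30)  NOT NULL,
        change_request  VARCHAR(20)  NOT NULL, -- link to the change control record
        PRIMARY KEY (category_code, effective_date)
    );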

Certification as a Process

The benefits from data certification are vast. Business ownership of data ensures that data is treated as an asset and that its use is aligned with the firm’s strategy, giving the data a distinct value. The process ensures that quality levels are approved by the business owners prior to initial use, so owners and stewards become intimate with data quality issues and their causes. Data acceptance standards are established based on business need and priority, measuring the quality of the data and ensuring that it is meaningful. Data quality levels are identified and a plan for cleansing and standardization is executed prior to data integration into production, ensuring quality and usability at the onset. Data acceptance standards are verified after integration and are used for auditing and monitoring purposes. Frequent monitoring catches issues after integration, ensuring adherence to initial quality levels. Change control manages data impacts and provides visibility into how changes upstream and within the business can affect quality. The data integration cycle becomes shorter, providing faster time to market and improved ROI. Lastly, the data certification process is a great example of IT and the business collaborating to produce measurable and meaningful results.


Deanne Larson is the director of Data Warehouse Program Management at AT&T Wireless Services. She has more than 12 years of data warehouse and business intelligence experience. AT&T Wireless Services most recently was awarded TDWI’s 2003 Best Practices Award in Data Stewardship and Data Quality. Larson has presented on data quality practices at several data warehouse conferences over the last two years. She has been part of the Data Warehouse Organization at AT&T Wireless for the past five years and has served on the Teradata Partners Conference Steering Committee for the last year.
