Classifying Structured Data When Data is Created Simplifies E-Discovery
With the recent amendments to the Federal Rules of Civil Procedure, many companies are now in reactive mode, investigating ways to comply. Several IT professionals have recently received mandates from their superiors to implement compliance plans immediately. These new federal rules, together with recent changes to The Sedona Guidelines on the Management of Electronic Information, have created many challenges for IT departments that manage electronic information.
For structured data, current database management practices make the problem even more challenging. For every production database application, many IT organizations create multiple copies of the database for production support. These copies are used for testing, quality assurance (QA), standby, training and new application development. In many cases, the copies are created in environments that lack the security controls of the production environment. If the production database contains sensitive information, so do all of its copies, and each less-protected copy increases the risk of insider theft of, or tampering with, that information.
During e-discovery, data is searched, classified and presented as evidence in a legal case. A large part of presenting electronic data as evidence is proving that the information is authentic and that the company has placed proper controls around the data to protect it from insider theft or alteration. Many application and database vendors provide features that let IT departments implement controls against fraudulent activity, but if those features are not also deployed properly in the copies of production, the risk of theft or tampering remains.
Examples of controls available in database applications include encryption, digital certificates, read-only mode and auditing. Many of these controls can degrade application performance if deployed improperly. They may also raise the cost of the application if the features carry additional license fees. To offset the performance impact, IT departments may add CPUs to their application servers, driving the total cost of ownership higher still. Yet an evaluation of the information stored in these database applications often shows that not all of the data needs these controls. Deploying data classification policies on database data addresses many of these issues, because controls can then be applied only to the data that requires them.
Data classification for structured data requires a deep understanding of both the database schema (the table structures and the relationships between tables) and the application logic (how business policies map to the way the data is stored and manipulated). Tools that assist in classifying structured data require three components: an object definition, criteria and a policy. Each component is described in more detail below.
The object definition identifies which tables in a database application together make up a single business object. For example, a general ledger (GL) transaction does not include every table in a financial application. Rather, it includes an organization identifier, balances, journals and a booking period (i.e., month-year), and the object definition includes all of the tables in the database that contain these four components. Similarly, the object definition for a patient record in an electronic patient record database includes the patient's personal information, symptoms, diagnosis, prescriptions and physician notes. When database data is classified, the object definition translates into the SELECT statement of a structured query language (SQL) query: the tables, and the joins between them, from which data is retrieved.
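To make the translation concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The table names (`journals`, `organizations`, `booking_periods`, `balances`), their columns, the sample rows, the dictionary layout of `GL_OBJECT` and the helper `object_select` are all hypothetical, chosen only to mirror the four GL components named above; a real tool would derive them from the application's actual schema.

```python
import sqlite3

# Hypothetical object definition for a GL transaction: a root table plus the
# tables (and join conditions) that complete the business object. All table
# and column names are invented; a real financial schema will differ.
GL_OBJECT = {
    "root": "journals",
    "joins": [
        ("organizations", "journals.org_id = organizations.org_id"),
        ("booking_periods", "journals.period_id = booking_periods.period_id"),
        ("balances", "balances.org_id = journals.org_id"
                     " AND balances.period_id = journals.period_id"),
    ],
}


def object_select(obj: dict) -> str:
    """Translate an object definition into the SELECT ... FROM ... JOIN part of a query."""
    sql = f"SELECT * FROM {obj['root']}"
    for table, condition in obj["joins"]:
        sql += f" JOIN {table} ON {condition}"
    return sql


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a financial application, with invented sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE organizations (org_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE booking_periods (period_id INTEGER PRIMARY KEY,
                                      month_year TEXT, status TEXT);
        CREATE TABLE journals (journal_id INTEGER PRIMARY KEY, org_id INTEGER,
                               period_id INTEGER, amount REAL);
        CREATE TABLE balances (org_id INTEGER, period_id INTEGER, balance REAL);
        INSERT INTO organizations VALUES (1, 'Example Corp');
        INSERT INTO booking_periods VALUES (1, '2006-12', 'CLOSED'), (2, '2007-01', 'OPEN');
        INSERT INTO journals VALUES (10, 1, 1, 250.0), (11, 1, 2, 75.0);
        INSERT INTO balances VALUES (1, 1, 250.0), (1, 2, 325.0);
    """)
    return con


con = build_demo_db()
print(object_select(GL_OBJECT))
rows = con.execute(object_select(GL_OBJECT)).fetchall()  # one row per journal entry
print(len(rows))
```

Keeping the join conditions next to the table list makes the object definition declarative, so the SQL can be regenerated whenever the definition changes.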
A criterion defines how the data is classified within the tables of an object definition. In the GL example, the criterion selects those transactions in the GL tables whose booking period is closed. When a business closes its books, all transactions are reconciled. When the closing process is complete, the transactions in the GL tables should be placed in read-only mode, with controls in place to prevent modification. Closed GL transactions are therefore identified as those belonging to a booking period whose status, stored as a value or a combination of values in a database table, is closed. The criterion translates into the WHERE clause of the SQL query: SELECT all data from the tables in the object definition WHERE the booking period is closed.
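A minimal Python sketch of a criterion becoming the WHERE clause, assuming an invented `journals`/`booking_periods` schema in which a period's state lives in a single `status` column. The names `GL_OBJECT_SQL`, `CLOSED_PERIOD_CRITERION` and `classify` are hypothetical. Where an application records closure as a combination of values, the criterion string would combine several conditions with AND.

```python
import sqlite3

# Invented two-table slice of a GL object; real schemas will differ.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE booking_periods (period_id INTEGER PRIMARY KEY,
                                  month_year TEXT, status TEXT);
    CREATE TABLE journals (journal_id INTEGER PRIMARY KEY,
                           period_id INTEGER, amount REAL);
    INSERT INTO booking_periods VALUES
        (1, '2006-11', 'CLOSED'), (2, '2006-12', 'CLOSED'), (3, '2007-01', 'OPEN');
    INSERT INTO journals VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 3, 75.0);
""")

# The object definition supplies the SELECT ... FROM ... JOIN portion.
GL_OBJECT_SQL = (
    "SELECT j.journal_id, p.month_year, j.amount "
    "FROM journals j JOIN booking_periods p ON j.period_id = p.period_id"
)

# The criterion supplies the WHERE clause. The status value is a bound
# parameter, so the same criterion text can be reused for other statuses.
CLOSED_PERIOD_CRITERION = "p.status = ?"


def classify(con, object_sql, criterion, *params):
    """Return the business-object rows that satisfy the criterion, sorted by key."""
    return sorted(con.execute(f"{object_sql} WHERE {criterion}", params).fetchall())


closed = classify(con, GL_OBJECT_SQL, CLOSED_PERIOD_CRITERION, "CLOSED")
print(closed)
```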
The policy maps the business context to the object definition and criteria. Examples include a data retention policy (how long must this data remain online or available?), a security or audit policy (who should have access to which data, and how is each access tracked and audited?) and an availability or disaster recovery policy (how available must this data be in the event of a disaster or equipment failure?). Policies, also called service level agreements (SLAs), vary across industries, corporations, departments and types of data. It is important to know which policies are required by government regulation and which are required by corporate best practice.
Continuing with the GL example, the Internal Revenue Service (IRS) data retention policy for GL transactions in the United States is seven years, so all GL data must be kept available for seven years. In Germany, the GDPdU requires financial data to be stored for ten years. For patient records, a policy might require that all patient record information be stored in an environment where access is controlled as HIPAA requires, and that the information be retained and available for the life of the patient. When database data is classified, the policy translates into parameter values in a SQL query that sort the data into buckets according to the business policy being enforced. Business policies may change over time as new laws and regulations are created or updated, so the data classification tool should make it easy to apply policy changes to previously classified data. For example, if the retention period for GL data in the U.S. increased from seven years to ten, the data retention controls at the storage tier would need to be updated to reflect the new retention period.
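As a minimal sketch of a policy entering a query only as parameter values, the Python example below sorts invented closed booking periods into retention buckets. It assumes a hypothetical `closed_periods` table and uses SQLite's `date(value, '+N years')` modifier; the helper `bucket` and the bucket labels are also hypothetical, and the seven- and ten-year figures are the article's U.S. and German examples, not legal guidance.

```python
import sqlite3

# Invented closed booking periods, one row per period.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE closed_periods (period_id INTEGER PRIMARY KEY,
                                 jurisdiction TEXT, closed_on TEXT);
    INSERT INTO closed_periods VALUES
        (1, 'US', '1999-12-31'), (2, 'DE', '1999-12-31'), (3, 'US', '2005-12-31');
""")

# Retention policy in years per jurisdiction (the article's examples).
RETENTION_YEARS = {"US": 7, "DE": 10}

# The policy enters the query only as parameter values: a date modifier such
# as '+7 years', the as-of date, and the jurisdiction.
BUCKET_SQL = """
    SELECT period_id,
           CASE WHEN date(closed_on, ?) > ? THEN 'retain'
                ELSE 'eligible_for_destruction' END
    FROM closed_periods
    WHERE jurisdiction = ?
"""


def bucket(con, policy, as_of):
    """Map each closed period to a retention bucket under the given policy."""
    buckets = {}
    for jurisdiction, years in policy.items():
        params = (f"+{years} years", as_of, jurisdiction)
        for period_id, label in con.execute(BUCKET_SQL, params):
            buckets[period_id] = label
    return buckets


# Re-running with new parameter values reclassifies previously classified data,
# e.g. if U.S. GL retention went from seven years to ten.
before = bucket(con, RETENTION_YEARS, "2007-06-30")
after = bucket(con, {**RETENTION_YEARS, "US": 10}, "2007-06-30")
print(before)
print(after)
```

Because the policy is data rather than code, changing a retention period means re-running the same query with new parameters, which is the kind of easy policy update the paragraph above asks a tool to support.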
How data classification tools store the object definition, criteria and policy varies by vendor. It is important to make sure the tool provides a simple way to update the object model, because the object definition may change as database application vendors release patches or new versions. For example, added tables or columns may affect the object definition, and altered application logic may invalidate a previously defined criterion.
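One way a tool might flag schema changes after a patch or upgrade is to compare the stored object definition with the live catalog. The following is a minimal Python/SQLite sketch under that assumption; `GL_OBJECT_COLUMNS` and `object_drift` are hypothetical names, and `PRAGMA table_info` is SQLite-specific (other databases expose similar information through their system catalogs). It detects added or removed tables and columns only; a change in application logic that invalidates a criterion would still need human review.

```python
import sqlite3

# Hypothetical object definition as a metadata repository might store it:
# the tables the GL object uses and the columns expected in each.
GL_OBJECT_COLUMNS = {
    "journals": {"journal_id", "period_id", "amount"},
    "booking_periods": {"period_id", "month_year", "status"},
}


def object_drift(con: sqlite3.Connection, object_columns: dict) -> dict:
    """Report tables or columns that differ between an object definition and the live schema."""
    report = {}
    for table, expected in object_columns.items():
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        # Table names come from the trusted object definition, not user input.
        actual = {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
        if not actual:
            report[table] = "table missing"
            continue
        added, removed = actual - expected, expected - actual
        if added or removed:
            report[table] = {"added": sorted(added), "removed": sorted(removed)}
    return report


# Invented schema that initially matches the object definition.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE journals (journal_id INTEGER, period_id INTEGER, amount REAL);
    CREATE TABLE booking_periods (period_id INTEGER, month_year TEXT, status TEXT);
""")
print(object_drift(con, GL_OBJECT_COLUMNS))

# Simulate a vendor patch that adds a column the closed-period criterion may need.
con.execute("ALTER TABLE booking_periods ADD COLUMN fiscal_year_closed INTEGER")
print(object_drift(con, GL_OBJECT_COLUMNS))
```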
When sensitive information is stored in a production database and all of its copies, combining archiving and data retention policies with data security policies can effectively reduce the risk of insider theft or tampering while maintaining compliance. In the case of financial data, when a business process closes a transaction, such as a GL booking period or a purchase order, the transaction is considered read-only. The information may still be reported on, but by the definition of the closed business process, the transaction itself should not be updatable. Archiving the read-only data from the production database to an online, active archive, which the application can still access, removes the data from the production database and keeps it separate. The data is also removed from all copies, consolidating the read-only information in a central, controllable archive repository. Controls enforcing the read-only status can then be applied to the separate archive database. The read-only data can stay in the active archive until it is no longer required. In the case of GL data, once the booking period and the financial year are closed, the data can be relocated to an online archive for up to seven years. After seven years, the GL data no longer needs to be retained, and because the older GL data resides only in a single central repository, the data destruction process is simplified.
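The archive-then-purge flow described above can be sketched in Python with SQLite, assuming an attached database named `archive` stands in for the central active archive and that a `BEFORE UPDATE` trigger is an adequate read-only control. Table names, sample rows and the helpers `build_production`, `archive_closed` and `purge_expired` are hypothetical; commercial archiving products also keep archived rows accessible to the application, which this sketch does not attempt.

```python
import sqlite3


def build_production() -> sqlite3.Connection:
    """Invented production GL tables plus an attached archive database."""
    con = sqlite3.connect(":memory:")
    # A real archive would be a separate, centrally controlled database;
    # an attached in-memory database stands in for it here.
    con.execute("ATTACH DATABASE ':memory:' AS archive")
    con.executescript("""
        CREATE TABLE main.booking_periods (period_id INTEGER PRIMARY KEY,
            month_year TEXT, status TEXT, closed_on TEXT);
        CREATE TABLE main.journals (journal_id INTEGER PRIMARY KEY,
            period_id INTEGER, amount REAL);
        INSERT INTO main.booking_periods VALUES
            (1, '1999-12', 'CLOSED', '2000-01-15'),
            (2, '2006-12', 'CLOSED', '2007-01-15'),
            (3, '2007-01', 'OPEN', NULL);
        INSERT INTO main.journals VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 3, 75.0);

        CREATE TABLE archive.journals (journal_id INTEGER PRIMARY KEY,
            period_id INTEGER, amount REAL, closed_on TEXT);
        -- Read-only control on the archive: any UPDATE is rejected.
        CREATE TRIGGER archive.journals_read_only BEFORE UPDATE ON journals
        BEGIN
            SELECT RAISE(ABORT, 'archived GL data is read-only');
        END;
    """)
    return con


def archive_closed(con: sqlite3.Connection) -> int:
    """Copy journals of closed periods to the archive, then delete them from production."""
    with con:  # commit the copy and the delete together; roll back both on error
        con.execute("""
            INSERT INTO archive.journals (journal_id, period_id, amount, closed_on)
            SELECT j.journal_id, j.period_id, j.amount, p.closed_on
            FROM main.journals j
            JOIN main.booking_periods p ON j.period_id = p.period_id
            WHERE p.status = 'CLOSED'
        """)
        cur = con.execute("""
            DELETE FROM main.journals
            WHERE period_id IN (SELECT period_id FROM main.booking_periods
                                WHERE status = 'CLOSED')
        """)
    return cur.rowcount


def purge_expired(con: sqlite3.Connection, retention_years: int, as_of: str) -> int:
    """Destroy archived rows past retention; they exist only in the archive."""
    with con:
        cur = con.execute(
            "DELETE FROM archive.journals WHERE date(closed_on, ?) <= ?",
            (f"+{retention_years} years", as_of),
        )
    return cur.rowcount


con = build_production()
moved = archive_closed(con)                          # journals from closed periods
purged = purge_expired(con, 7, as_of="2007-06-30")   # past the seven-year retention
print(moved, purged)
```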
Deploying this type of information lifecycle management policy has many additional benefits. Removing data that does not change from a production database and all associated copies reduces total storage and tape backup volumes, because the information is consolidated in a central repository. Because the production database is smaller, backup and recovery windows are shorter. Application performance can improve because the production database now contains only the data it requires; smaller tables mean faster query response times. Audit and security controls can be enforced on the entire archive database, lowering the risk of tampering.
For sensitive information that resides in copies of the production database, a separate data security policy can be deployed to protect that information. For example, if a person's Social Security Number (SSN) is stored in a test copy, a data masking or scrambling policy can be executed across all instances of the SSN in the copy, protecting the individual's personal information. Another use case for a data security policy involves human resources and payroll applications, where audit controls are common on tables that store employees' pay grades and commission rates. Data security policies such as these further reduce the risk of data theft and tampering.
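A masking policy for a test copy might look like the minimal Python sketch below. It assumes SSNs are stored as `###-##-####` text and uses keyed hashing (HMAC-SHA-256 from the standard library) as the masking technique, which is one option among many rather than a method the article prescribes. The key, table names and the helpers `mask_ssn` and `mask_copy` are hypothetical.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical masking key. Keep it outside the test environment: anyone who
# holds it could rebuild the mapping by hashing every possible SSN.
MASKING_KEY = b"example-key-do-not-use"


def mask_ssn(ssn: str) -> str:
    """Replace an SSN with a deterministic, format-preserving stand-in (###-##-####).

    The same input always yields the same output, so SSN values that link
    tables in the copy still match after masking. Distinct SSNs can
    occasionally map to the same masked value.
    """
    digest = hmac.new(MASKING_KEY, ssn.encode("utf-8"), hashlib.sha256).digest()
    digits = "".join(str(b % 10) for b in digest[:9])
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def mask_copy(con: sqlite3.Connection, targets) -> None:
    """Mask every (table, column) pair listed as holding SSNs in a database copy."""
    con.create_function("mask_ssn", 1, mask_ssn)
    with con:
        for table, column in targets:
            # Identifiers come from a trusted classification catalog, not user input.
            con.execute(
                f"UPDATE {table} SET {column} = mask_ssn({column}) "
                f"WHERE {column} IS NOT NULL"
            )


# Invented test copy in which the same SSN appears in two tables.
copy = sqlite3.connect(":memory:")
copy.executescript("""
    CREATE TABLE employees (emp_id INTEGER, ssn TEXT);
    CREATE TABLE payroll (emp_id INTEGER, ssn TEXT, pay_grade TEXT);
    INSERT INTO employees VALUES (1, '123-45-6789');
    INSERT INTO payroll VALUES (1, '123-45-6789', 'G7');
""")
mask_copy(copy, [("employees", "ssn"), ("payroll", "ssn")])
print(copy.execute("SELECT ssn FROM employees").fetchone()[0])
```

Deterministic masking keeps joins on SSN working across tables in the copy; a random substitution would hide the value equally well but would break those joins.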
When evaluating data classification solutions for e-discovery, consider the following features:
- Leverages a central metadata repository. Metadata is information about the data object that can be used to define criteria and data management policies. Look for a solution that leverages a single metadata repository for storing the object definition, the criteria and the policy. The repository can be centralized physically or virtually in a federated model. Avoid point solutions. The important feature is that the repository can scale across the enterprise.
- Supports all data types. The metadata repository should support all types of data, including files, documents, email, instant messaging and database data. It is important to be able to define a single, consistent model for how data is classified. For example, financial data can exist in spreadsheets, databases and emails, and the ability to define a single seven-year data retention policy for all financial data simplifies the data classification process significantly.
- Automates the discovery process and policy enforcement. Data classification tools add little value if the entire process requires manual intervention and data entry. Automation reduces the legal costs of e-discovery and improves the accuracy of discovery searches.
- Adaptable. Most importantly, the data classification solution must be easy to use, with a simple way to update the object model, the criteria and the policy definitions. Laws and regulations are constantly being introduced or altered as new ways of conducting business emerge. If the solution is difficult to change, the costs of maintaining or upgrading the technology can become prohibitive.
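The idea of one policy governing financial data wherever it lives can be sketched as a tiny in-memory metadata repository in Python. Everything here is hypothetical: the `Policy` and `MetadataRepository` classes, the tag-based rule and the sample items. A production repository would be persistent and, as the list notes, possibly federated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Policy:
    name: str
    retention_years: int


@dataclass
class MetadataRepository:
    """Hypothetical single repository holding classification rules and policies
    for every data type (files, email, database rows and so on)."""
    policies: Dict[str, Policy] = field(default_factory=dict)
    rules: List[Tuple[Callable[[dict], bool], str]] = field(default_factory=list)

    def add_policy(self, policy: Policy) -> None:
        self.policies[policy.name] = policy

    def add_rule(self, matches: Callable[[dict], bool], policy_name: str) -> None:
        self.rules.append((matches, policy_name))

    def classify(self, item: dict) -> Optional[Policy]:
        """Return the policy of the first matching rule, whatever the item's type."""
        for matches, policy_name in self.rules:
            if matches(item):
                return self.policies[policy_name]
        return None


repo = MetadataRepository()
repo.add_policy(Policy("financial-records", retention_years=7))
repo.add_rule(lambda item: "financial" in item.get("tags", ()), "financial-records")

# Invented items of different types; the same rule and policy cover all of them.
items = [
    {"type": "spreadsheet", "name": "q4_close.xls", "tags": {"financial"}},
    {"type": "database_row", "table": "journals", "tags": {"financial"}},
    {"type": "email", "subject": "Q4 accruals", "tags": {"financial"}},
    {"type": "email", "subject": "Team lunch", "tags": set()},
]
for item in items:
    policy = repo.classify(item)
    print(item["type"], policy.name if policy else "unclassified")
```

Because the policy is defined once, changing it (say, from seven years to ten) updates the retention applied to every data type at the same time.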
Corporations embarking on enterprise-wide initiatives to classify electronic data are continually challenged to find a process and technology solution that meets all their needs. These needs are driven by federal regulations and by the need to avoid e-discovery costs, penalties and fines. Proactively deploying data classification processes upstream, when data is created, reduces e-discovery costs downstream.
Julie Lockner is vice president of sales operations for Solix Technologies of Sunnyvale, CA.