Driving Business Value with ILM-Enabled Database Archives

The explosive growth in the size and quantity of databases is a well-known phenomenon in virtually every vertical industry, making integration of database archives into enterprise-wide information lifecycle management (ILM) strategies a daunting challenge for many administrators. The increasing use of enterprise software, the Internet and Web-based commerce, as well as the proliferation of digital media mean that database growth is a fact of life regardless of industry type or use case. The phenomenon applies to structured and unstructured data in both transaction processing and data warehouse environments, and impacts the executives, managers and business analysts for whom timely access to data at an acceptable cost is increasingly difficult to guarantee.

Because of this explosive growth of data, enterprises are facing high primary storage costs and an increasingly paradoxical dilemma: while business, regulatory and compliance requirements demand more complex and increasingly rapid analysis of this growing data hoard, the access problems and costs of storage and retrieval have made it necessary to offload more of the data burden to an archive, particularly in data warehousing environments. Analysis and reporting then become functions of how fast and how accurately archival data can be retrieved and subjected to analysis. Unfortunately, the state of the art in archival storage and retrieval mandates that both accuracy and speed be sacrificed as the use of archival database alternatives grows.

Many organizations are reviewing tiered storage strategies to migrate less frequently accessed data to the lowest-cost storage devices using policy-based automated storage migration, commonly referred to as ILM. Advances in database technologies have allowed for critical database information to remain on fast-access primary storage while less frequently accessed database information is migrated to an archive on near-line storage systems. However, what is often not thought through is the impact of searching these near-line stores when data is required, either in response to an unexpected question or as part of a less frequent but nevertheless critical business cycle. One of the challenges ILM presents is providing convenient access to information in the database archive after the information has been compressed and archived.
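As a rough illustration of policy-based migration, the sketch below assigns each database partition to a storage tier based on how long ago it was last accessed. The one-year threshold, the partition names and the function names are all hypothetical; a real ILM policy would likely weigh more than access age.

```python
from datetime import date

# Hypothetical policy: data untouched for longer than this moves to near-line storage.
NEARLINE_AFTER_DAYS = 365


def choose_tier(last_accessed, today, nearline_after_days=NEARLINE_AFTER_DAYS):
    """Return the storage tier the policy assigns, based only on access age."""
    age_days = (today - last_accessed).days
    return "near-line" if age_days > nearline_after_days else "primary"


def plan_migration(partitions, today):
    """Return the names of partitions the policy would move off primary storage.

    `partitions` maps a partition name to the date it was last accessed.
    """
    return [
        name
        for name, last_accessed in partitions.items()
        if choose_tier(last_accessed, today) == "near-line"
    ]
```

A policy like this only decides where data lives. It says nothing about how quickly the migrated partitions can be searched later, which is the access problem described above.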

IT staffs and BI users are just now beginning to recognize the challenges posed by this ILM database dilemma. The growth of data warehouses has reached a critical juncture: for many users, multi-terabyte data warehouses are creating a barrier to effective analysis and business intelligence, as throughput issues, data access, and hardware and administrative costs begin to challenge users and their IT managers.

Accessing the Archived Data Warehouse Across Storage Tiers

The data at the heart of these myriad business uses includes not only structured transaction data from ERP and back office systems, but also unstructured data from a host of sources that were largely nonexistent even a decade ago. Email and Internet transaction logs, voicemail databases, contracts, medical records, point-of-sale systems data and other data sources have been added to the ocean of data that companies must now swim through in the course of their day-to-day operations.

These transaction systems leave a data trail that is piling up at an astonishing rate: it's not uncommon for active transaction systems to contain many terabytes of data. The data warehouses that are fed by these voracious transaction systems are becoming larger than anyone had ever imagined.

Deriving Business Value from the Data Archive

Data archives have been a traditional solution for addressing usability and cost, particularly when it comes to off-loading historical or infrequently used data. Indeed, the archive's main contribution has been to improve the usability of the remaining online data. Even so, archiving has traditionally been a less-than-perfect solution to the problems of too much data and not enough throughput, because most archiving solutions rely on tape-based systems that are both costly and not user-friendly. The result is that while archiving solves the problem of throughput and cost for the online portion of the data, it fails to provide a solution for archived data that is cost-effective and supports relatively rapid data access.

Thus, from a business standpoint, archiving is a problematic solution for most users. Archives cannot support timely data analysis, despite the fact that for many business uses - particularly those relating to regulations, compliance and legal action - timeliness is a major criterion for action. The current state of the art in archiving is thus too cumbersome and costly to keep pace with the growth of transaction databases and data warehouses and the analytical needs incumbent upon them. For most companies and most use cases, archiving represents an imperfect solution.

A New Approach to Tiered Data Archiving

Clearly, a new archiving solution is required to enable databases to operate at maximum efficiency. One potential solution provides four key features:

  • Data compression,
  • Online query access,
  • Maintenance exposure, and
  • Enterprise scalability.

Use of column-based data compression technology allows for storage of relational data in what is essentially a pre-indexed format, eliminating the need to store indexes or rebuild them at restore time. This design significantly reduces the overall storage needed for the database. Column-based storage also significantly improves data compression: because each column consists of a single data type, it can be compressed much more efficiently than rows of data, which typically mix many different data types. This technology can also further reduce the data footprint by selecting the most suitable compression strategy for each data type.
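To make the per-column idea concrete, here is a minimal sketch that pivots rows into columns and then picks an encoding for each column by its data type. The two encodings (delta encoding for integers, run-length encoding for everything else) and all function names are illustrative assumptions, not a description of any particular vendor's format.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Run-length encode: collapse repeated neighbours into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(values):
    """Keep the first value, then store only the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


def encode_column(values):
    """Choose an encoding by data type (an assumed, deliberately simple rule)."""
    is_int_column = bool(values) and all(
        isinstance(v, int) and not isinstance(v, bool) for v in values
    )
    if is_int_column:
        return ("delta", delta_encode(values))
    return ("rle", rle_encode(values))


def decode_column(encoded):
    strategy, payload = encoded
    return delta_decode(payload) if strategy == "delta" else rle_decode(payload)
```

Because each column holds one data type, a sequential key shrinks to small deltas and a repetitive status code collapses into a handful of runs. Neither trick works well on a row that interleaves both kinds of value.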

Column-based storage also allows more rapid processing of archival queries: reporting tools can either directly query the repository using the subset of the ANSI SQL language currently supported, or the necessary data can be rapidly restored to an operational data store and queried using the full complement of SQL commands. This accessibility contrasts with the majority of archiving systems that limit access to summary data unless a full database restoration process has been undertaken.

This particular architecture also supports the efficient management of database schema changes, processing data on a column basis instead of through a complex set of scripts that normalize data to meet the needs of a new schema. Because queries are processed on a column basis and most data schema changes are at the column level, padding or reformatting can take place in real time as the query is processed.
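The following sketch shows one way padding and reformatting at query time might look. It assumes the archive is a list of segments, each recording its own row count and the columns that existed when it was written; the segment layout, the `read_column` function and its parameters are hypothetical.

```python
def read_column(segments, column, default=None, cast=None):
    """Yield one column across every archive segment, in segment order.

    Segments written before the column existed are padded with `default`,
    and `cast` (if given) reformats each non-null value as it is read.
    """
    for segment in segments:
        values = segment["columns"].get(column)
        if values is None:
            # The segment predates this column: pad one default per row.
            values = [default] * segment["row_count"]
        for value in values:
            if cast is not None and value is not None:
                yield cast(value)
            else:
                yield value


# Two segments archived under different schemas (illustrative data only).
segments = [
    {"row_count": 2, "columns": {"id": [1, 2], "amount": ["10.50", "7.25"]}},
    {
        "row_count": 2,
        "columns": {"id": [3, 4], "amount": [3.0, 4.5], "currency": ["USD", "EUR"]},
    },
]

currencies = list(read_column(segments, "currency", default="USD"))
amounts = list(read_column(segments, "amount", cast=float))
```

Here the older segment is never rewritten: the missing `currency` column is filled in and the string-typed `amount` values are converted only when a query touches them.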

The result of the compression and schema management capabilities is that the archive can be maintained and used more like an online database than an off-line archive, whether accessed from primary storage or secondary storage subsystems. This architecture allows very large database environments to be archived online instead of on tape, providing a significant savings in storage maintenance costs and a significant improvement in throughput.

From a functional standpoint, a database archive solution that scales across platforms operates more like an online data warehouse than an archival database. The ability to use the same query tools, and to do so using online or near-line disk storage instead of mounting tapes, means that the user experience - and therefore the quality of the overall analysis - can be vastly superior to the standard archiving systems in use today.

Database Archives that Advance Storage Management

Although data warehouses represent by far the fastest growth and single greatest opportunity for next-generation online archiving solutions, the growth in transaction database size to multi-terabyte status means that the same issues that are driving data warehouse customers toward a near-line archive-type solution will drive transaction database users in a similar direction. Many vendors have ambitious plans for meeting the needs of transaction database customers that will fit this growing requirement nicely.

There are vendor solutions ideal for next-generation data management, such as ILM, that also provide increased analytical capabilities for rapidly growing data, particularly when the twin requirements of high throughput and cost-effectiveness are brought into play. The emergence of new technologies foreshadows the end of the data archive as we know it. In its place is something that is functionally richer, runs analytics more quickly and does so at a significantly lower cost by leveraging new storage architectures. Perhaps most importantly, advanced online archiving solutions can deliver all of this functionality without requiring a revolution in data warehouse design, deployment or use.


Jerome Shattner is a founder of SAND Technology, an international provider of intelligent enterprise information management solutions. Shattner has educated C-level executives on the benefits of leveraging corporate databases to support the bottom line and has presented at business and industry events over the past twenty-five years. From 1987 to 1999, he served as president of SAND Technology's joint venture company with Hitachi Ltd., Hitachi Data Systems Inc. Under his direction, Hitachi Data Systems achieved more than $100 million in annual sales. Prior to SAND's inception, Shattner was employed by IBM in a variety of management positions. He may be reached at jerry.shattner@sand.com.
