Data Warehousing Lessons Learned
As the share of enterprises with data warehouses of at least one terabyte in size has grown from 21 percent in 2002 to 43 percent in 2004, many have asked Forrester: is it feasible to back up the database?¹ Complaints have included: 1) the backup won't fit in our batch window; 2) we don't have a batch window; and 3) the mean time between failures (MTBF) of modern storage technology is better than ever, so do we need a backup at all? The answer: unless the system is of no consequence to the business (in which case, why is it in operation?), a periodic backup must be made, on a schedule matched to the time horizon in which the business process functions.
Getting started with a data warehousing backup strategy means putting the following tactics to work:
Partitioning to increase operational flexibility. Partition the database in such a way that an individual partition can be backed up in a reasonable window of time, even if the entire data structure is too large to back up all at once. For example, 12 partitions of 250GB each can be backed up in rotation (daily, weekly or monthly, depending on the requirements); the schedule handles one partition at a time, not all three terabytes at once, as the rotation sketch below illustrates.
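As a minimal sketch of such a rotation, assuming a hypothetical `backup_util` command standing in for a vendor utility such as Oracle RMAN or DB2 BACKUP (the partition names are also illustrative):

```python
import datetime
import subprocess

# Hypothetical layout: 12 monthly partitions of roughly 250GB each.
PARTITIONS = [f"sales_2004_{m:02d}" for m in range(1, 13)]

def tonight_partition() -> str:
    """Cycle through the partitions, one per night."""
    return PARTITIONS[datetime.date.today().toordinal() % len(PARTITIONS)]

def backup_partition(partition: str) -> None:
    # Stand-in for a vendor utility invocation (e.g., Oracle RMAN or
    # DB2 BACKUP); "backup_util" is illustrative, not a real CLI.
    subprocess.run(["backup_util", "--partition", partition], check=True)

if __name__ == "__main__":
    backup_partition(tonight_partition())
```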
Incremental backup to reduce elapsed time to complete an operation. Use a database whose backup and recovery utility allows for incremental backup and restore; both Oracle and DB2 provide this. Because the incremental copy contains only the deltas, the recovery operation requires less time than restoring the entire data structure. However, a complete copy of the data, to which the incremental copies can then be applied, must be made first; the restore sequence is sketched below.
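The restore side can be sketched as follows. The file names and helper functions are hypothetical, but the ordering constraint the paragraph describes, a full copy first and then the deltas oldest-first, is the point:

```python
from pathlib import Path

def apply_full_copy(backup: Path) -> None:
    print(f"restoring full copy from {backup}")

def apply_incremental(delta: Path) -> None:
    print(f"applying incremental delta {delta}")

def restore(full_backup: Path, incrementals: list[Path]) -> None:
    """Recovery = one full copy plus the ordered chain of deltas."""
    # The full copy must exist before any delta can be applied.
    apply_full_copy(full_backup)
    # Deltas contain only what changed, so each applies quickly, but they
    # must be replayed oldest-first (sorting by name here is a stand-in
    # for ordering by backup timestamp).
    for delta in sorted(incrementals):
        apply_incremental(delta)

if __name__ == "__main__":
    restore(Path("full_sun.bak"),
            [Path("incr_mon.bak"), Path("incr_tue.bak")])
```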
Parallel processing to divide and conquer large data volumes. The operations environment can increase the bandwidth of a backup or restore by increasing the number of tape devices used and executing the operation in parallel. For example, if the operation calls for 24 data structures to be backed up, each requiring one hour, and 24 tape devices are available, then an exclusive backup window of only one hour is needed. In contrast, with a single tape drive the entire process requires 24 hours, as the sketch below illustrates.
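A minimal sketch of the arithmetic, using a Python thread pool to stand in for the pool of tape devices (structure names and timings are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import time

STRUCTURES = [f"tablespace_{n:02d}" for n in range(24)]
TAPE_DEVICES = 24  # set to 1 to see the serialized, 24-hour-style schedule

def backup_to_tape(structure: str) -> str:
    time.sleep(0.1)  # stand-in for the one-hour backup of one structure
    return structure

# One worker per tape device: 24 devices finish the batch in roughly the
# time of a single backup; one device serializes the same work 24-fold.
with ThreadPoolExecutor(max_workers=TAPE_DEVICES) as pool:
    for done in pool.map(backup_to_tape, STRUCTURES):
        print(f"{done} complete")
```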
In this example, the tradeoff is between investing up front in additional tape drive infrastructure and tolerating an inefficient process day after day. The same parallel architecture applies to massively parallel databases, such as Teradata or the IBM Scalable Parallel framework, where each node can be mapped to a tape drive.
Concurrent backup to extend availability. The major database vendors now allow concurrent backup: the table or table space can be backed up while the data structure is still online and in use. If there are concurrent updates, log changes must also be applied, though that is not the case for an inquiry-only data warehouse. Concurrent backup greatly relieves the problem of having to fit the backup into an exclusive batch window; the sketch below shows why the log replay matters.
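A toy sketch of the log replay, with simplified stand-ins rather than any vendor's actual mechanism: the page copy and the update stream overlap, and replaying the log written during the copy guarantees the backup reflects every update, whether or not the copier happened to see it.

```python
# Toy model of a "fuzzy" online backup; all structures are hypothetical.
pages = {1: "a", 2: "b", 3: "c"}        # live data, still being updated
log: list[tuple[int, str]] = []         # (page_id, new_value) records

def update(page_id: int, value: str) -> None:
    pages[page_id] = value
    log.append((page_id, value))        # every change is logged

# Phase 1: copy pages while updates keep arriving (no exclusive window).
backup = {}
for page_id in list(pages):
    backup[page_id] = pages[page_id]
    if page_id == 1:
        update(2, "b2")                 # a concurrent update lands mid-copy

# Phase 2: replay the log so the fuzzy copy reflects every update.
for page_id, value in log:
    backup[page_id] = value

assert backup == pages                  # the copy is now consistent
```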
Of course, in the case of a data warehouse that is inquiry-only (i.e., read-only), the problem is much less severe. Once an initial backup is taken, the database does not change. The data management function must still provide for a backup of additions to the tables, but these are usually far smaller than a full backup and can be handled by incremental backup of just the deltas (what has changed).
Redundant arrays of independent disks (RAID) to recover from media failure. If the installation is dealing with a data structure that is truly extreme in size, such as the hundreds of terabytes that reportedly occur at Wal-Mart, then use a storage technology mechanism such as RAID 1 or RAID 5. In case of disk failure, the disk's contents can be reconstructed from the redundant storage, as the parity sketch below shows. However, note that this addresses only availability and data loss due to media (i.e., disk drive) failure. Other provisions must be made to address damage or loss of data integrity due to a rogue application or an incorrect update; design and implement a custom application to rebuild the data structure in that case.
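The reconstruction property fits in a few lines: in a RAID 5 stripe, the parity block is the XOR of the data blocks, so any single failed disk's contents are the XOR of the survivors. A simplified sketch, not a storage driver:

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks: the RAID 5 parity rule."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Three data disks in one stripe, plus the parity computed from them.
disk1, disk2, disk3 = b"fact", b"dims", b"aggr"
parity = xor_blocks([disk1, disk2, disk3])

# Disk 2 fails: XOR the survivors (including parity) to rebuild it.
rebuilt = xor_blocks([disk1, disk3, parity])
assert rebuilt == disk2
```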
A combination of tactics to satisfy complex, real-world data center scenarios. In the real world, all five of these methods are combined as applicable: partitioned tables are backed up in rotation, incremental backup is used alongside concurrent processes, and RAID 1 or RAID 5 protects the structures too big to fit in the batch window. It is always necessary to have a backup. Where a table or data structure never changes, the installation can save the input file so that the table can be restored in case of damage. Such a restore is performed by an application rather than a formal database utility, but the function is the same: the input file serves as the backup (see the sketch below). The cost and complexity are greater than if a database utility could be used, but still less than going out of business!
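A sketch of the input-file-as-backup idea, with a hypothetical file name and a simple CSV loader standing in for the real load application:

```python
import csv
from pathlib import Path

# Hypothetical: a static dimension table whose original load file was kept.
LOAD_FILE = Path("region_dim.csv")

def reload_table(load_file: Path) -> list[dict]:
    """Replay the saved input file: the file serves as the backup, and
    this loader stands in for a formal database utility restore."""
    with load_file.open(newline="") as f:
        return list(csv.DictReader(f))

if __name__ == "__main__":
    rows = reload_table(LOAD_FILE)
    print(f"rebuilt table with {len(rows)} rows")
```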
Reference:
Lou Agosta, Ph.D., is a business intelligence strategist with IBM World Wide Business Intelligence Solutions, focusing on competitive dynamics. He is a former industry analyst with Giga Information Group and previously served many years in the trenches as a database administrator. His book The Essential Guide to Data Warehousing is published by Prentice Hall. This article is © IBM. Agosta would like to hear from you; please send comments and questions to him at LoAgosta@us.ibm.com.