Alternatives to Tapes on Trucks
Bank of America, Time Warner, Citigroup, Ameritrade, City National Bank and CitiFinancial: this year alone, these companies made headlines over the disappearance of tapes containing highly sensitive customer information. Lost tapes are only one small part of the problem; media failure and human error are others. Understanding all the risks of tape backup as well as today's new, improved alternatives may cause you to rethink your entire data protection strategy.
Trouble with Tapes
Shipping backup tapes to offsite facilities has always been an important part of backup and disaster recovery (DR). Financial tapes, often containing names, social security numbers and account numbers, are especially high risk when the data is unencrypted. In April 2005, offsite storage specialist Iron Mountain, after admitting four instances of lost tapes this year, recommended that customers begin encrypting backup tapes. However, in a recent Enterprise Storage Group (ESG) survey of 388 storage professionals, 60 percent said they are not encrypting their backups to tape.
Loss of tapes is not the only problem with tape backup. The scary truth is that you never know if a backup tape is good until you really need it. Diogenes Analytical Laboratories, an IT advisory company that performs independent product lab evaluations and advises IT buyers, estimated that, on average, between 5 and 20 percent of nightly tape-based backup/recovery jobs fail. According to ESG, roughly one-fourth of SMB users reported that 20 percent or more of their tape-based backup/recovery jobs fail. The number one reason cited: media failure (e.g., lost, damaged or corrupted tapes).
In addition to media failure, there is a high percentage of operational errors, including operator (human) errors such as storing the wrong tape, and procedural errors such as backing up the wrong files or empty files. Any time there is human intervention, the opportunity for error increases dramatically.
Analysts generally agree that IT management costs represent five to seven times the cost of capital expenditure. It is therefore essential to consider operation and administrative costs (including media management and tape swapping) in addition to acquisition costs when doing a financial analysis of tape versus alternatives. For large organizations, the tape media costs (which generally come out of the expense budget versus capital budget) mount up as well.
Another reason for losing sleep at night is that regulatory compliance (e.g., Sarbanes-Oxley (SOX), SEC 17a, HIPAA, Patriot Act, Freedom of Information Act, etc.) requires data to be readable long into the future (with differing time requirements for various regulations). Examining the track record of vendors on longevity of tape format support does not paint a pretty picture.
Compliance is also contributing exponentially to the growth of data as more records are generated, more regulations are created in more industries and we move further into retention periods measured in years or decades. Including compliance and other factors, analyst predictions for data growth range between 50 and 100 percent per year. In addition, analysts currently estimate that an average of seven copies of critical data are being maintained online.
Growth of data eventually forces users into a decision on whether to spend money to upgrade tape backup equipment (new tape formats, more or faster tape drives) or to spend money on other alternatives. For example, one user backing up three terabytes needed to upgrade their tape library to add more drives because their backups were running every night and all weekend. This caused them to do a financial analysis of tape versus disk backup. Another user with 5,000 tapes was considering going to AIT format (twice the density and speed). Again, the increase in spending prompted a hard look at their options.
With disaster recovery continuing to be one of the top IT priorities identified by users, recovery time objectives (RTO), or the amount of time it takes to recover from a disaster, may be the final driver in evaluating alternatives to tape. If your RTO is measured in minutes rather than hours, it is time to look beyond tape.
A New Paradigm for Local and Offsite Backup
Tape has been the mainstay of backup/recovery for the last twenty-five years for one main reason - cost (per megabyte to gigabyte). With the advent of inexpensive ATA and Serial ATA disk, the cost formula changed. While not providing the same level of reliability, SATA was good enough and cheap enough to start a disk-based backup revolution, starting a paradigm shift with dramatic improvements in speed and reliability of backup and restore.
For a simple cost comparison, W. Curtis Preston offers this scenario: a midrange tape library costs roughly $4 to $11 per gigabyte (GB) while disk prices are hovering around $3 to $11 per GB. This puts disk and tape about even (excluding reliability and recovery time factors).
Perhaps the most significant change in cost comparison comes from the introduction of a new technology which some have named capacity optimization (CO). Also called data reduction, disk deduplication, or commonality factoring, this approach is based on the ability to massively reduce data down to its smallest possible size, measured in the number of bytes stored or transferred over the network. CO goes far beyond traditional compression techniques, which generally yield compression ratios of 2:1. CO technologies today from some vendors have shown data reduction ratios of 20-25:1 and more in real customer environments.
CO works somewhat like traditional compression techniques, but it goes one big step further. Traditional compression techniques identify repeating characters or simple patterns and represent them using that character or pattern and the number of repetitions. With CO, a data object such as a file or a network transmission is analyzed and broken down into pieces, or chunks. Each unique chunk is identified and stored only once, along with instructions for where it belongs when the data is reassembled. Any repeated chunk (e.g., a graphic, a commonly used paragraph or a section of source code) is therefore stored only once, no matter how many times it appears. Thus, duplicate files require nothing but instructions. In addition, when data changes, only the new unique chunks are stored, along with any new instructions (as opposed to storing an entirely new file on each backup). Since a large majority of day-to-day operations involves only small changes relative to the size of the data, this means that over time the efficiency of CO increases, often dramatically.
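The chunk-and-instructions idea can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the fixed 8-byte chunk size, the SHA-256 fingerprint, the in-memory dictionary and the names store and restore are all assumptions chosen to keep the example short, and real products may chunk and index data quite differently.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; deliberately tiny so the example is easy to follow (assumption)


def store(data: bytes, chunk_store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the instructions (ordered chunk fingerprints) to rebuild it."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(key, chunk)  # stored only if not seen before
        recipe.append(key)
    return recipe


def restore(recipe: list, chunk_store: dict) -> bytes:
    """Reassemble the original data by following the instructions."""
    return b"".join(chunk_store[key] for key in recipe)


chunk_store = {}
monday = b"AAAAAAAABBBBBBBBCCCCCCCCAAAAAAAA"   # chunks A, B, C, A
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDDAAAAAAAA"  # same data with one chunk changed

monday_recipe = store(monday, chunk_store)
chunks_after_monday = len(chunk_store)    # A, B and C are stored once each
tuesday_recipe = store(tuesday, chunk_store)
chunks_after_tuesday = len(chunk_store)   # only the new chunk D is added

logical_bytes = len(monday) + len(tuesday)
stored_bytes = sum(len(c) for c in chunk_store.values())
print(f"logical {logical_bytes} bytes, stored {stored_bytes} bytes")
```

In this toy run the second backup adds a single chunk to the store even though it is a full copy of the data, which is the effect described above: the more backups share content, the better the overall ratio becomes.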
CO technology is available today both for disk-to-disk storage and data reduction over a network. The resulting data reduction factor of 20 to 25:1 (depending on the vendor) coupled with the lower cost of SATA disk creates an entirely new paradigm for disk-based backup and remote replication.
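To see how a reduction ratio changes the cost picture, the arithmetic can be sketched in a few lines of Python. The function name effective_cost_per_gb is hypothetical, and the calculation is a simplification that assumes the whole reduction ratio applies to stored backup data; the $3 to $11 per GB disk range and the 20:1 ratio are the figures quoted in this article.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per GB of protected data when each raw GB of disk holds
    reduction_ratio GB of backup data (simplifying assumption)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


# Disk price range quoted above, with a 20:1 reduction ratio applied.
disk_low = effective_cost_per_gb(3.0, 20.0)
disk_high = effective_cost_per_gb(11.0, 20.0)
print(f"effective disk cost: ${disk_low:.2f} to ${disk_high:.2f} per GB")
```

A real financial analysis would also include administrative and media costs, and would use the ratio users actually experience rather than the advertised one.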
Disk to Disk
Disk-to-disk (D2D) backup comes in a variety of flavors. Virtual tape libraries (VTL) are disk subsystems that emulate tape libraries to the backup software. Another category of D2D products that has become increasingly popular operates as "disk as disk," based on either JBOD or RAID arrays or a NAS filer. All of these products involve some type of disk subsystem, generally an ATA-based disk array. It is important to note that only some of these products include CO technology.
While many argue whether there is a need for tape to play any role in backups, current implementations still typically involve tape somewhere. Some D2D products are tightly integrated with tape drives and libraries, and are referred to as disk-to-disk-to-tape (D2D2T). Other products are D2D only, and leave it to the user to decide about backups from the secondary disk to tape. As part of the data classification process for information life cycle management (ILM), it is worth considering whether there is a class of data that you still want backed up/archived to tape (at least for now).
Another new D2D technology getting attention is continuous data protection (CDP), where backups are done continually as any change is made to the data. This granularity allows restores to be done back to any point in time.
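As a rough illustration of why capturing every change allows a restore to any point in time, here is a toy Python sketch of a change journal. The class ChangeJournal and its methods are hypothetical and do not describe how any CDP product works; the sketch also assumes changes are recorded in time order.

```python
class ChangeJournal:
    """Toy change journal: every write is recorded with a timestamp."""

    def __init__(self):
        self._entries = []  # (timestamp, key, value), appended in time order

    def record(self, timestamp: int, key: str, value: str) -> None:
        self._entries.append((timestamp, key, value))

    def state_at(self, timestamp: int) -> dict:
        """Replay every change up to and including timestamp."""
        state = {}
        for ts, key, value in self._entries:
            if ts > timestamp:
                break  # later changes are ignored for this restore point
            state[key] = value
        return state


journal = ChangeJournal()
journal.record(100, "report.doc", "draft 1")
journal.record(200, "report.doc", "draft 2")
journal.record(300, "budget.xls", "v1")

before_second_edit = journal.state_at(150)
latest = journal.state_at(300)
```

Because each change is kept rather than only a nightly copy, the restore point can fall between any two changes, not just at the time of the last scheduled backup.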
Replication Over the Network
D2D options, particularly with the 20:1 data reduction potential that comes from CO, can provide an excellent solution for local backup and recovery. However, DR and regulatory compliance also bring the need for offsite backup, often with very specific geographic distance requirements. CO offers a new paradigm for offsite backup/DR, as well. By applying the same 20:1 data reduction factor to replication over the wide area network (WAN), CO can greatly reduce the network bandwidth requirement for offsite replication. D2D offerings that implement CO replication between appliances across the WAN bring a new alternative for meeting your offsite DR requirements without shipping tapes.
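The bandwidth effect is simple arithmetic, sketched below in Python. The 100 GB nightly change and the 10 Mbps link are hypothetical figures chosen for illustration, the function name transfer_hours is an assumption, and the calculation ignores protocol overhead and link contention.

```python
def transfer_hours(data_gb: float, link_mbps: float, reduction_ratio: float = 1.0) -> float:
    """Hours needed to send data_gb (decimal gigabytes) over a link of
    link_mbps megabits per second, after reducing the data by reduction_ratio."""
    bits_to_send = data_gb * 8 * 1000**3 / reduction_ratio
    seconds = bits_to_send / (link_mbps * 1_000_000)
    return seconds / 3600


# Hypothetical site: 100 GB of backup data per night over a 10 Mbps WAN link.
raw_hours = transfer_hours(100, 10)
reduced_hours = transfer_hours(100, 10, reduction_ratio=20)
print(f"without CO: {raw_hours:.1f} h, with 20:1 CO: {reduced_hours:.1f} h")
```

Under these assumptions a transfer that would take most of a day without reduction completes in a little over an hour at 20:1, which is what makes replication over a modest WAN link a practical substitute for shipping tapes.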
Real World Trends and User Experiences
What does all this mean in the real world? How much are users spending, overall, on backup, archive and replication? Where is the industry today in terms of D2D storage and replication adoption and spending?
IDC predicts that the backup, archiving and replication software market will grow from $4.3 billion in 2003 to $6.58 billion by 2008, representing 54 percent of storage software expenditures.
According to ESG's March 2005 study, 18 percent of respondents have replaced their tape libraries with disk-based alternatives, and another 58 percent would consider it. Eighty percent of respondents stated they would begin to replace some of their existing tape infrastructure with disk within 24 months, with 40 percent saying they would do so within 12 months.
In a purchasing intentions survey done in March by Storage Magazine, 54 percent of the respondents were increasing their D2D spending. Also, for the first time, remote replication spending surpassed offsite tape spending as the primary focus for DR. Of those already deploying D2D, 48 percent were using D2D2T (backup to disk then tape), 29 percent were using D2D then archiving to tape, and 12 percent had implemented a VTL.
As to continuous data protection, 21 percent have implemented some form of CDP. Another 10 percent will be implementing it this year, and 41 percent are evaluating or planning to evaluate CDP.
On the anecdotal side, the stories from D2D users often sound too good to be true. The compression/data reduction ratios that many users are experiencing exceed vendors' claims (15 to 20 times compression, even up to 100 times in rare instances). Users talk about how great (and how rare) it is to implement something that meets or exceeds their expectations.
To understand the reality of this new paradigm, here is a summary from one extremely satisfied user of a leading D2D solution (from Data Domain) which implements CO for both disk storage and replication:
"Our implementation has been a success story all around. It's one of the few that met all our expectations. Restores are now so fast that my users think it must be really easy, and they've started taking us for granted. We are now keeping 2 to 3 months of data online. No more worrying about backup windows - we tape out during the day. We've greatly reduced our backup and restore times. Also, we do a fair amount of snapshots and have been able to reduce the amount because we can get the data back so quickly. With the reduced tape costs, Iron Mountain storage costs, and low cost of units, we've seen a payback in six months."
With all this in mind, here are some suggested considerations when evaluating D2D technologies:
- As part of your ILM and DR evaluations, consider a risk analysis of tape as a backup/DR medium. Consider the implications of tape versus disk to your RTOs and recovery point objectives (RPOs). What role, if any, do you want tape to play in your strategy?
- As your data growth requires additional expenditures for your existing tape backup, consider a financial analysis of tape backup versus disk backup spending.
- When doing the financial analysis of tape versus disk, include administrative costs as much as possible (tape and media management). Also include an estimate for CO data reduction ratios in the cost per gigabyte for D2D.
- In evaluating new D2D solutions, consider how the new solution will integrate with your existing backup software. Does it require replacement?
- Does the offering implement CO for local D2D backup?
- Does the product offer network replication over the WAN for DR? Does the replication function implement CO?
- What CO ratio is advertised by the vendor? What are users actually experiencing?
While the age-old question, "Is tape dead?" is still being debated, it is clear that changing cost factors of ATA disk coupled with 20:1 data reduction factors of CO have created a new paradigm for disk-based backup and recovery. At a minimum, it is entirely possible to create a reliable backup and DR strategy utilizing D2D technology, with tape moving to an archival-only role (some avoid even that). As with any IT paradigm shift, there will be those who will immediately jump to the new way and those who will continue with tape in varying roles for years to come. In any case, it is clear that evaluating D2D with CO should be part of your very near future.
Barb Goldworm is an independent analyst and consultant with over 25 years experience in the computer industry in systems and storage management, in various technical, marketing, industry analyst and senior management positions with IBM, StorageTek, Novell, Enterprise Management Associates and other successful startup ventures. She has been a frequent speaker at industry conferences worldwide for over 10 years and was the creator and track chair for the Networld+Interop track on Networked Storage. She has been a regular columnist and contributor for various trade publications including NetworkWorld, ComputerWorld and Storage Networking World Online, as well as being frequently quoted in the press. She also provides consulting to the institutional investment community, advising clients on business issues and trends occurring within the high tech industries. Goldworm can be reached at email@example.com.