The Elements of Business Continuity
The ability to maintain production and general system availability under even the most extreme circumstances - whether an electrical outage strikes, a natural disaster hits or routine system maintenance is required - is a key component of business continuity today. One needs only to recall the recent "Great Blackout" and its associated costs to begin to grasp the importance of this issue. Most businesses can afford little, if any, downtime. So when the plug was pulled last year on thousands of businesses across the Northeast and Midwest, why were so many organizations unable to maintain business as usual? The answer lies in the complexity of ensuring true business continuity, which requires matching each of the threats that organizations face today with an appropriate solution.
Perhaps the only value businesses can derive from the Great Blackout is the examination of how they fared, and the application of this experience to their business-continuity efforts. A recent survey by Mirifex, an Ohio-based business-technology consulting firm, Case Western Reserve University and CrainTech, an online publication based in Cleveland, provides some insight into the losses due to the blackout, and the state of business-continuity preparedness today. Results of the survey include:
- More than one-third of the respondents (34 percent) have no risk management or disaster recovery plans in place.
- Two-thirds of the respondents (66 percent) lost at least a full business day due to the blackout.
- Nearly a quarter of the respondents (24 percent) lost more than $50,000 per hour of downtime, and four percent lost more than $1 million for each hour of downtime.
- Nearly half of the respondents (46 percent) said that lost employee productivity was the largest contributor to losses suffered due to the blackout.
- Nearly half of the respondents (46 percent) will invest more in risk management, business continuity and/or disaster recovery in the future.
The first piece of equipment you should have to combat blackouts and power surges is an uninterruptible power supply (UPS) for all servers. Uninterruptible power supplies are not expensive and provide temporary power to either gracefully bring down the servers during a blackout or to transfer from utility power to an on-site electrical power generation system.
The most effective solution for a blackout is, for obvious reasons, an on-site electrical power generation system. If this is not practical for your organization, and if you have remote offices outside of the affected area that need to access servers at your location, off-site redundant systems are your best alternative.
Off-site redundant systems won't turn the power on at your location, but they will provide failover to a secondary system at a remote location unaffected by the blackout. This allows users at other locations to access the same applications and data they normally access at your location. You will still lose productivity at your location, but employees at other locations will remain productive. Redundant systems mitigate and localize productivity loss in the same way that blackouts localize power loss.
Off-site redundant systems can be implemented in many ways, depending on the needs of your organization. They may be located at one of your remote offices or at a collocation facility. They may synchronize data from your primary system in near real time, or the primary system may periodically detect and save recent updates and send the updates as batch files to the redundant system for synchronization.
The first step toward implementing a redundant system is to determine which applications and data are critical to running your business in the event of a blackout or other disruption. This is also the first step in developing a business continuity plan, which your organization should have as well. Next, determine how long a failover your organization can tolerate; this window may include the time required to synchronize batch files recently uploaded to your secondary system. The final steps are installing the critical applications and data on your secondary system, configuring an environment identical to that of your primary system and scheduling data synchronizations.
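The failover window can be checked with simple arithmetic before any equipment is purchased. The following is a minimal sketch in Python; the three timing components and every name in it are illustrative assumptions, not measurements or features of any particular product.

```python
from dataclasses import dataclass


@dataclass
class FailoverEstimate:
    """Hypothetical timing inputs, all in minutes."""
    detect_minutes: float    # time to notice that the primary system is down
    sync_minutes: float      # time to apply batch files already on the secondary
    redirect_minutes: float  # time to point users at the secondary system


def failover_minutes(est: FailoverEstimate) -> float:
    """Total estimated time to complete a failover."""
    return est.detect_minutes + est.sync_minutes + est.redirect_minutes


def fits_window(est: FailoverEstimate, tolerated_minutes: float) -> bool:
    """True if the estimated failover fits the window the business can tolerate."""
    return failover_minutes(est) <= tolerated_minutes
```

If the estimate does not fit, the usual levers are shorter synchronization intervals (so less data waits to be applied) or near-real-time synchronization.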
You also need to consider your tolerance for temporary data loss. This particularly applies to redundant systems that store updates in batch files on the primary system and periodically upload them to the secondary system. The blackout or other disruption may occur before the primary system backs up recent updates and uploads them to the secondary system. In this scenario, the secondary system cannot obtain and synchronize the most recent updates until the primary system is back online. To minimize this temporary data loss, you should schedule incremental backups, which store only data changed since the last incremental backup, at intervals as short as possible.
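As a rough illustration of an incremental backup, the sketch below copies only the files modified since the previous run. It assumes that file modification times are a reliable change indicator and that the destination lies outside the source tree; the function and parameter names are hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, last_backup_time: float) -> list[Path]:
    """Copy files under source that changed after last_backup_time into dest.

    last_backup_time is a POSIX timestamp recorded by the previous run.
    Returns the relative paths that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            rel = path.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied
```

With a scheme like this, the worst-case temporary data loss is roughly one backup interval, which is why shorter intervals reduce exposure.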
Following a failover operation, when the primary system is again available, redundant systems are able to fail back to the primary system by synchronizing the changes that occurred while users were accessing the secondary system. The secondary system detects changes, saves them, and uploads them to the primary system, where they are synchronized. Users are then transparently redirected to the primary system, simply by changing the IP address that their computers use to locate the system.
The key to maintaining a redundant system is to verify that synchronizations are occurring accurately and as scheduled, to periodically test failover and failback procedures and to train personnel in several locations to perform them.
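One way to verify that a synchronization was accurate is to compare checksums of the files on both systems. The sketch below is an assumption about how such a check might look for file-based data reachable from one machine; the directory layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(primary: Path, secondary: Path) -> list[str]:
    """Relative paths that are missing or different on the secondary copy."""
    mismatches = []
    for path in sorted(primary.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(primary)
        other = secondary / rel
        if not other.is_file() or file_digest(path) != file_digest(other):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty result means every primary file has an identical copy on the secondary; it does not detect extra files that exist only on the secondary.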
While redundant systems may help reduce the cost of a blackout for your organization, they are even more appropriate for other threats to business continuity, such as equipment failure, massive data loss or routine system maintenance. These events occur at every organization regardless of size, complexity and geographic distribution. The ability to fail over to a secondary system when the primary system or its data are temporarily unavailable maintains productivity throughout your organization.
Another form of redundancy can and should reside on your primary and secondary systems. Disk mirroring provides a mirror image on a secondary hard disk of the data on your primary hard disk. If the primary hard disk or other hardware, such as a controller that communicates with the primary hard disk, fails, the system detects the hardware failure and automatically fails over to the secondary hard disk. Without disk mirroring, you would either have to fail over to a secondary system or wait long hours for the installation of a new hard disk and restoration of its data from backup. Following the restoration, users would need to reenter changes that occurred since the last backup. This results in lost productivity and potential inaccuracies in reentering changes.
At a level below redundant systems, but equally important, you should, and probably do, have some form of backup. Backup is just what you need when your system is available, but several files were lost, corrupted or accidentally deleted. There is no need to fail over to a secondary server just to restore a few files.
Although backup represents the birth of business continuity, it has evolved along with the complexity of modern enterprise software. Traditional backup solutions were designed to solve the problem of disk crashes, for which maintaining copies of information seemed adequate. The assumption was that if you had several copies of the same information stored in different locations, the information contained in those files was safe. However, there was no ability to check data quality or data integrity. The system simply backed up and restored the same data, good or bad.
Such traditional backup solutions are less effective for today's enterprise systems, which are built on the integration of enterprise applications (software that makes information easier to work with), along with intellectual capital (the automation of processes), while leveraging relational database management systems (RDBMS) that enable relationships and dependencies to be created and maintained among documents (or objects).
Today's enterprise systems create a new requirement for backup solutions to understand, back up and restore not only data, but also data relationships and the interdependencies within a business process. A modern business-continuity system should not only understand the data relationships but also check for data integrity, so that relationships are confirmed intact, or repaired, prior to backup. It should also be able to incrementally back up and restore modified or selected objects, rather than requiring restoration of the entire system just to recover a few files. A final consideration is the impact on end users. If they must log off the system in order for a backup or restoration to occur, the result is lost productivity and frustration.
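To make the idea of checking relationships before backup concrete, the sketch below uses SQLite, whose `PRAGMA foreign_key_check` reports rows that reference a missing parent row. SQLite is used here only as a convenient stand-in; real enterprise repositories have their own data models, and the function names are hypothetical.

```python
import sqlite3


def broken_relationships(conn: sqlite3.Connection) -> list[tuple]:
    """Rows whose foreign key points at a parent row that does not exist.

    Each result row is (child_table, rowid, parent_table, fk_index).
    """
    return conn.execute("PRAGMA foreign_key_check").fetchall()


def safe_to_back_up(conn: sqlite3.Connection) -> bool:
    """True only if no dangling relationships were found."""
    return not broken_relationships(conn)
```

A backup job could call `safe_to_back_up` first and repair or report the offending rows instead of silently copying damaged relationships.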
While the Great Blackout grabbed the headlines and focused our attention on business continuity, among other issues, a more insidious, far-reaching and frequent kind of threat struck during the same week and received far less attention. The Blaster worm infected more than 1.4 million computers in four days, and it was followed by variants and by the worst worms in history, Sobig and MyDoom. Blackouts, physical attacks and natural disasters are destructive at many levels, but they are fewer and farther between than the greatest threats to business continuity and information assets: cyber attacks and information theft.
Consider these findings from a 2004 computer crime and security survey of 494 organizations conducted by the Computer Security Institute and the FBI:
- Total annual losses due to computer crime amounted to $141,496,560.
- Theft of proprietary information caused the second greatest financial loss, after denial of service.
- The sources of the attacks were equally divided between internal and external attacks.
The CERT Coordination Center, which has been monitoring computer-security incidents since 1988, reports a steady and dramatic increase of computer attacks, from six in 1988 to 137,529 in 2003. Unfortunately, the increase in reported attacks has been accompanied by an increase in reported vulnerabilities, from 171 in 1995 to 3,784 in 2003.
Furthermore, according to a June 2003 presentation to Silicon Valley executives by the Internet Security Alliance, the tools used to create attacks are increasingly sophisticated and widespread, while they require less technical knowledge to operate. The presentation also includes cost estimates for clean up and lost productivity due to the following viruses:
- Klez: $9 billion
- Code Red: $2.6 billion
- Love Bug: $8.8 billion
- Nimda: $1.2 billion
- Slammer: $1 billion
The defenses against viruses include installing anti-virus software and continually updating its virus-definition tables, installing software patches that fix vulnerabilities, and educating your users to identify suspicious e-mails and to exercise caution before opening e-mail attachments.
The defenses against network intrusions include firewalls, intrusion-detection software and strong passwords that are not words and contain at least one punctuation mark or other symbol, such as an asterisk.
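The password rule above (not a word, and at least one punctuation mark or other symbol) can be expressed as a simple check. In the sketch below, the minimum length and the word list are added assumptions, and the function name is hypothetical.

```python
import string


def is_strong_password(password: str, dictionary: set[str], min_length: int = 8) -> bool:
    """Apply the rule: not a dictionary word, and contains a symbol.

    dictionary is a set of lowercase words to reject; min_length is an
    extra assumption that the rule itself does not state.
    """
    has_symbol = any(ch in string.punctuation for ch in password)
    is_word = password.lower() in dictionary
    return len(password) >= min_length and has_symbol and not is_word
```

For example, `is_strong_password("password", {"password"})` is rejected, while a non-word containing an asterisk and meeting the length assumption passes.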
Today, business continuity is expanding beyond backup, redundant systems, virus protection and intrusion detection. It now also incorporates protecting an organization's confidential information from accidental or intentional misuse, which may result in breaches of compliance; loss of productivity, competitive advantage and shareholder wealth; and possible fines and litigation.
Some companies do not put confidential documents in their enterprise systems because there is no way to protect the documents from users and administrators. There is even concern about authorized end users who may need confidential information in order to perform their jobs. The main problem is that, while access to systems and content may be restricted via user authentication, there is no way to restrict what end users do with the information following access.
The problem also extends beyond the organization as companies work together to achieve business goals such as mergers and acquisitions, and outsourcing for strategic projects. The information shared with outside parties for these purposes is among the most strategic and valuable to an organization and, until now, there was no way to control their use of the information while maintaining the ability to collaborate.
Secure collaboration, the newest addition to the business continuity landscape, addresses the need to encrypt and protect content from internal and external users while enabling collaboration, the most fundamental requirement of business. Secure collaboration systems encrypt content, restrict content access via authentication, and provide granular post-access control over activities such as cutting and pasting, printing, saving and forwarding via e-mail.
The criteria for evaluating whether a secure-collaboration solution should be part of your business continuity solution are simple: You should consider secure collaboration if you have sensitive, confidential data that is a part of the business process, must be used for collaboration internally or externally, or that must be stored or accessed according to privacy laws such as the Gramm-Leach-Bliley Act and the Health Insurance Portability and Accountability Act (HIPAA).
A final boon to business continuity is the recent flurry of laws enacted to combat corporate malfeasance, terrorism, identity theft and misuse of personal information, and to ensure privacy and portability of healthcare information. Compliance officers are well aware of the high-profile regulations now facing them that have a major impact on information systems, such as HIPAA, SEC Rule 17a-4, NASD Rules 3010 and 3110, the Gramm-Leach-Bliley Act, the Sarbanes-Oxley Act, the USA PATRIOT Act, the California Security Breach Notice Law and the Basel II Accord. While each regulation is focused on a specific area, such as healthcare records, and each requires specific applications for data processing and compliance monitoring, they all have this in common: They dramatically increase storage, business continuity and information security requirements.
For example, Sarbanes-Oxley, by requiring all public companies to certify financial reporting and internal controls, places great importance on version control to document the evolution of financial reports and their supporting data and to ensure that final documents really are the final versions. These requirements are prompting banks and other businesses to implement enterprise content management (ECM) systems with built-in version control and audit trails indicating who accessed the content and when, and what changes were made. ECM systems may also include configurable workflows that automate content-review and approval procedures and help to ensure compliance with internal processes and controls.
While ECM systems aid in compliance with laws such as Sarbanes-Oxley, they require a considerable storage allocation in order to store all versions of the content as well as the audit trails, workflows and relationships to other content items in the system. The storage requirements don't end there, because all critical data that is stored must also be backed up and again stored in a separate location.
SEC Rule 17a-4, NASD Rules 3010 and 3110, and Basel II place an even greater requirement on storage and backup systems. SEC Rule 17a-4 and NASD Rules 3010 and 3110 require financial-services firms to supervise and record all electronic communications related to their business for a minimum of two years, while the Basel II Accord requires banks to archive two years of data to prove that they maintain minimum capital adequacy to cover their financial exposure.
The storage and data-retention requirements of these and other laws created a need for archiving, the final category of business continuity. Archiving involves moving data at the end of its life cycle from more expensive primary storage to less expensive archival storage media. It also usually involves protecting data from modification. The same requirements for modern backup systems also apply to archiving: The archiving solution should understand and verify data relationships, and the archived data should be readily available if needed. Archiving plays an important role in complying with governmental mandates for information storage. If regulators ask for content that must be stored over several years and an organization cannot produce it, it may seriously impact an organization's ability to do business.
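The two mechanics of archiving described above, moving aged data to cheaper storage and protecting it from modification, can be sketched as follows. This is only an illustration for plain files: it assumes that modification time marks the end of the life cycle and that read-only file permissions are sufficient protection, and all names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def archive_old_files(primary: Path, archive: Path, cutoff_time: float) -> list[str]:
    """Move files last modified before cutoff_time into archive, read-only.

    cutoff_time is a POSIX timestamp. Returns the relative paths moved.
    """
    moved = []
    for path in sorted(primary.rglob("*")):
        if not path.is_file() or path.stat().st_mtime >= cutoff_time:
            continue
        rel = path.relative_to(primary)
        target = archive / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        os.chmod(target, 0o444)  # remove write permission on the archived copy
        moved.append(rel.as_posix())
    return moved
```

Retention periods such as those cited above would determine how long the archived copies must remain retrievable before they can be discarded.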
Business continuity and information security will continue to gain prominence in the form of general awareness, high-level meeting agendas and media coverage, and they will maintain their new place among the top priorities for national, organizational and even personal security. Information security is now even a household concern, due to the widespread adoption of personal computers and Internet access, and the rampant incidents of viruses, fraud and identity theft. Now that the threat is real and it affects every one of us, business continuity and information security will only continue to improve. There is little we will ever be able to do to stop someone from attempting a cyber attack, or to stop all disasters from happening, but there is much that we can and will continue to do to dramatically reduce their disruption and damage to business, the economy and our lives.
Elaine S. Price is cofounder, president and CEO of CYA Technologies, a leading provider of business continuity and secure collaboration solutions. She has 20 years of experience in high technology with positions including sales, marketing, development and ownership. She is a frequent speaker at industry trade shows and is regularly quoted in articles on business continuity, secure collaboration and success in business.