"The online business magazine at the heart of international business management news..."
09 Mar 2010

Balancing the Need for Rapid Recovery Against the Cost to Get There

By Camberley Bates and Dave Bechtold

FalconStor Software | www.falconstor.com


Adequately protecting your organization's prime data and applications from some type of disruption is a lot like purchasing business or personal insurance.

You know you'll be glad you paid the premiums in the event you ever need to file a claim. Still, many organizations have found themselves struggling to balance their need for fast disaster recovery against the typically high costs that have been associated with technologies that transmit data across town or across the country.

Traditionally, when you copied (or replicated) data to a remote site, it meant purchasing exact replicas of both server and storage hardware at Site A and Site B. Replication software available within high-performance storage systems often required installing storage from the same top-dollar vendor at the remote recovery site as well. Having storage systems from more than one vendor just added to this expense—with more licensing fees associated with each vendor's unique replication software. Once you threw in the price of WAN transmission and professional services needed to manage all of this complexity, the bottom-line figure could make even the most stalwart CIO blanch.

In the face of such expenses, some organizations decided to trade off their need for fast recovery for what they perceived to be a more affordable option of shuttling backup tapes off-site. While cumbersome, risky, and a much slower restore proposition, tape backups were at least a more affordable alternative to protect and secure company data.

Exploring Recovery Services for Today's Service-Oriented Architectures
The evolution of a new IT services layer to meet changing business needs has now begun to offer more affordable choices for rapid recovery. Evolving local and remote replication services available from within the network "fabric" enable organizations to apply one common software platform and data replication process independent of the underlying, heterogeneous storage hardware or network connectivity in place to store or copy data.

No longer needing "like-to-like" storage hardware at Point A and Point B, the replication services-oriented approach to off-site disaster recovery becomes more appealing when combined with lower-cost "Tier 2" or "Tier 3" Serial ATA (SATA)-based or legacy storage systems. While not necessarily a fit for all applications, this lower-cost alternative to tape-based backups can greatly improve recovery speeds, availability, and access times for many organizations.

Tackling the High Cost of Bandwidth for Disaster Recovery
Organizations are also reducing the high cost of WAN bandwidth with replication technologies that significantly reduce the amount of data transmitted "across the wire" by first eliminating redundant data found since the last time replication occurred. In one example, a large financial institution anticipated it could reduce the amount of data copied by 80% prior to WAN transmission, using FalconStor's Replication tools with redundant data elimination technology. This could save the institution as much as $16 million in one year's WAN costs alone.
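
To make the idea of sending only what has changed more concrete, here is a minimal sketch in Python. It is not FalconStor's implementation: it assumes a simple fixed-size block comparison using SHA-256 digests, and the block size and function names (`block_digests`, `blocks_to_send`) are hypothetical choices made for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes (an assumption, not a vendor default)


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that are new or differ
    from the copy that was replicated last time."""
    old = block_digests(previous, block_size)
    changed = {}
    for index, digest in enumerate(block_digests(current, block_size)):
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            changed[index] = current[start:start + block_size]
    return changed


# Example: a 10-block volume in which a single byte in block 5 has changed.
previous = bytes(BLOCK_SIZE * 10)
modified = bytearray(previous)
modified[5 * BLOCK_SIZE] = 1
changed = blocks_to_send(previous, bytes(modified))
bytes_sent = sum(len(block) for block in changed.values())
print(f"blocks to send: {sorted(changed)}, bytes sent: {bytes_sent} of {len(modified)}")
```

In this toy case only one block out of ten crosses the wire. A real product would also have to handle shrinking volumes, data that shifts position, and duplicate blocks across servers, none of which this sketch attempts.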

Besides significantly reducing WAN transmission costs, this type of redundant data elimination process can also reduce the cost of disk storage at the remote site. One national law firm was able to save on both bandwidth costs and storage space by remotely replicating image copies of its servers using this type of FalconStor technology. The law firm went from using 20 TB of more costly Tier 1 storage at the remote site to using 2.5 TB of more affordable Tier 2 storage. This saved the firm 85% on its remote storage space and another 85% on its network bandwidth.
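
The savings figures above come down to simple arithmetic, sketched below in Python. The helper names are hypothetical, and the 100 GB daily change volume is an assumed figure for illustration only. Note that the capacity numbers quoted for the law firm (20 TB down to 2.5 TB) work out to a reduction of 87.5%, slightly better than the rounded 85% cited.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage saved when a quantity drops from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def volume_after_reduction(volume: float, reduction_percent: float) -> float:
    """Amount left to transmit or store after removing `reduction_percent` of it."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    return volume * (100.0 - reduction_percent) / 100.0


# Law firm example from the article: 20 TB of Tier 1 down to 2.5 TB of Tier 2.
storage_saving = percent_reduction(20.0, 2.5)

# Financial institution example: 80% less data before WAN transmission,
# applied to an assumed (hypothetical) 100 GB of daily changes.
daily_wan_gb = volume_after_reduction(100.0, 80.0)

print(f"remote storage saved: {storage_saving:.1f}%")
print(f"daily WAN volume: {daily_wan_gb:.1f} GB instead of 100.0 GB")
```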

To help in determining how to best integrate such recovery services into your existing infrastructure, there are several basic recovery management issues to keep in mind before, during, and after a disaster. Using this type of process-driven approach, many organizations will find it easier to identify the best fit for their top-down recovery needs.

Considerations Before a Disaster
While many organizations think about remote recovery, they may overlook other more localized disasters that could seriously hamper revenue or day-to-day employee productivity. What types of disruptive events should you plan for – from a rolling disaster or isolated data corruption event to a more widespread act of God? Should your DR plans also include ways to achieve a more granular, rapid recovery if someone inadvertently deletes the wrong file or e-mail? While not a widespread disaster, such an event could still have significant implications for an individual, and the company, if not corrected in a timely manner.

Taking an application-by-application look at recovery management, some areas to consider before a disaster include:

  • The impact of an application going down for a few minutes or several hours. By looking at the associated monetary costs (lost revenues, lost productivity, lost relationships with customers or partners, etc.), you can usually determine how fast your organization will need to restore the application and data once it goes down. This is known as a recovery time objective (RTO).
  • The cost of losing – and recreating – application data after a disaster. If your organization performs nightly backups to tape and the SAP or e-mail application goes down at 12:00 noon the next day, you may lose over 12 hours of transaction data. The restore itself could also involve lengthy log restore and rebuild processes just to catch up with the lost time. If this takes too long and causes too much negative impact on the company, you may decide upon a shorter recovery point objective (RPO). The RPO indicates the roll-back point on the timeline from which you will need to recover. Some critical applications may require a close-to-zero RPO, while others can withstand losing some of their more recent data in the short term while recovery efforts are underway.
  • Ways to lower costs associated with remote disaster recovery. As indicated earlier, companies should be exploring the use of cost-saving recovery technologies before a disaster strikes. Many companies that would never have considered the cost of a remote DR site before are now investigating the use of older, legacy storage systems at the remote site, SATA storage, and a single replication software layer that works on top of underlying, heterogeneous storage systems. They are also exploring the use of redundant data elimination technologies that can significantly reduce WAN transport costs and the remote storage footprint.
  • Performing yearly DR tests. Testing the effectiveness of a DR system in an annual "mock disaster" is an obvious and critical step that is often overlooked by organizations for fear of disrupting daily operations. Some recovery approaches offer a simpler, faster way to test that won't disrupt production time and may take just a few hours, instead of several days. Investigate your proposed DR approach to determine how easy or difficult it will be to test on an annual, biannual, or quarterly basis.
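
The RTO and RPO questions in the list above can be put into numbers with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the nightly backup is assumed to finish at 11:00 p.m., and the 4-hour RPO, 4-hour outage, and $10,000-per-hour downtime cost are made-up figures, not values from any real organization.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, failure_time: datetime) -> timedelta:
    """Work at risk: the time between the last usable copy and the failure."""
    if failure_time < last_recovery_point:
        raise ValueError("failure_time must not precede last_recovery_point")
    return failure_time - last_recovery_point


def meets_rpo(last_recovery_point: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data loss window stays within the recovery point objective."""
    return data_loss_window(last_recovery_point, failure_time) <= rpo


def downtime_cost(outage: timedelta, cost_per_hour: float) -> float:
    """Estimated cost of an outage, given an hourly cost of downtime."""
    return outage.total_seconds() / 3600 * cost_per_hour


# Assumed scenario: the nightly tape backup finishes at 11:00 p.m.,
# and the application fails at 12:00 noon the next day.
backup_done = datetime(2024, 1, 1, 23, 0)
failure = datetime(2024, 1, 2, 12, 0)

window = data_loss_window(backup_done, failure)
within_rpo = meets_rpo(backup_done, failure, rpo=timedelta(hours=4))
outage_cost = downtime_cost(timedelta(hours=4), cost_per_hour=10_000.0)

print(f"data at risk: {window}")               # 13:00:00
print(f"meets a 4-hour RPO: {within_rpo}")     # False
print(f"cost of a 4-hour outage: ${outage_cost:,.0f}")
```

Running the same calculation application by application is one way to see which systems justify replication and which can stay on a nightly backup.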

Considerations During a Disaster
Three main areas to keep in mind when things go "bump" are speed, simplicity, and data consistency.

  • Speed. Speed is essential when a disaster strikes. Everyone wants to know how fast you can get the application or data image back up. You can get back up faster if you use technology that allows you to "promote" your secondary site to primary site status, until the primary site becomes available. If your plan involves a potential application server rebuild in order to prevent data corruption, this may take an hour or several days. You may want to reset the RPO/RTO expectations assumed before the disaster or look at recovery technologies that ensure data consistency and safeguard against potential data corruption and rebuilds.
  • Simplicity. Simplifying the data recovery process cannot be emphasized enough. During the heat of stressful events, tasks become more complicated to perform. Where will the recovery process be documented for easy access from another location? In the event a more senior IT contact is not available, will recovery be easy enough for other staffers to follow in a few steps? In a more isolated incident involving deletion or minor corruption of end user data, look at how easy it is for IT staff or the end user to restore just the corrupted or deleted subset of data, without impacting other users.
  • Data consistency. Many modern-day, transactional applications have a lot going on at any one time – not all of it is neatly stored on a set of disk drives at the particular time a disaster strikes. Current transactions may be partially tracked in the server's cache memory or log file before they are fully "committed" to disk. This poses special data consistency issues when it comes to quickly recovering them. Depending on the type of replication process in use, IT staff could spend valuable extra minutes or hours rebuilding servers after a disaster in order to ensure data integrity.

Replication technologies come in different flavors. Investigate the level of "application-awareness" replication software brings to clean up these data integrity issues prior to replication, thereby preventing the prospect of costly server rebuilds. You should also explore whether or not you can identify and replicate a logical "consistency group" of application data that spans multiple storage devices – so that the replicated snapshots of your environment preserve data interrelationships.
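
The following toy model in Python illustrates what a consistency group is meant to guarantee. It is not how any particular replication product works: the `ConsistencyGroup` class and its methods are hypothetical, the "volumes" are in-memory dictionaries, and a single lock stands in for the write-ordering mechanisms a real storage platform would use.

```python
import copy
import threading


class ConsistencyGroup:
    """Toy model: named in-memory volumes written and snapshotted under one lock."""

    def __init__(self, volume_names):
        self._lock = threading.Lock()
        self._volumes = {name: {} for name in volume_names}

    def write_transaction(self, updates):
        """Apply one update that spans volumes: {volume_name: {key: value}}."""
        unknown = set(updates) - set(self._volumes)
        if unknown:
            raise KeyError(f"volumes not in group: {sorted(unknown)}")
        with self._lock:
            for volume, changes in updates.items():
                self._volumes[volume].update(changes)

    def snapshot(self):
        """Copy every member volume at the same point in time."""
        with self._lock:
            return copy.deepcopy(self._volumes)


# A database whose data files and log files live on different volumes.
group = ConsistencyGroup(["data", "log"])
group.write_transaction({"data": {"order-1": "paid"}, "log": {"order-1": "committed"}})

snap = group.snapshot()

# A later transaction does not leak into the snapshot already taken.
group.write_transaction({"data": {"order-2": "paid"}, "log": {"order-2": "committed"}})

print(snap)  # {'data': {'order-1': 'paid'}, 'log': {'order-1': 'committed'}}
```

Because writes and snapshots share one lock, a snapshot can never capture the data volume after a transaction and the log volume before it, which is the interrelationship a consistency group is meant to preserve.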

Considerations After a Disaster
After the disaster, you need to focus on how fast you can return to normal operations. Assuming the data center is still in place, you need to look at how quickly your staff can return the primary data center to service. If you promoted the remote site to primary status, what steps are now involved in updating the original data center with any data changes since it went down? How efficiently can you move the interim data changes back to the primary site? How much will it cost, and will the process take days or weeks to complete?
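
Whether failback takes hours or days depends largely on how much changed data has to travel back over the link. A rough estimate can be sketched in Python as follows. The function name, the 450 GB of interim changes, and the 100 Mbit/s link are all assumptions for illustration, and the 80% reduction simply reuses the figure from the earlier financial institution example rather than a measured result.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_gb` (decimal gigabytes) over a `link_mbps` link.

    `efficiency` is the usable fraction of the link (1.0 means the full rated speed).
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Assumed scenario: 450 GB of changes accumulated at the remote site during the outage.
full_resync = transfer_hours(450, link_mbps=100)

# The same changes after an assumed 80% redundant data reduction (90 GB remaining).
reduced_resync = transfer_hours(90, link_mbps=100)

print(f"without reduction: {full_resync:.1f} hours")
print(f"with 80% reduction: {reduced_resync:.1f} hours")
```

Real links rarely deliver their full rated speed, so lowering the `efficiency` argument gives a more cautious estimate.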

By balancing these types of recovery objectives around speed, simplicity, data consistency, and cost, you should be able to engineer a DR plan that is both affordable and robust enough to keep your business afloat during disaster—while simultaneously meeting your most urgent and flexible recovery needs.

About the authors: Camberley Bates is the Chief Marketing Officer and Dave Bechtold is a Senior Storage Architect at FalconStor Software. FalconStor is the leading provider of disk-based data protection solutions, including their market-leading VirtualTape Library (VTL), that transform traditional storage paradigms. They offer a comprehensive set of data protection solutions for a wide range of RTO and RPO requirements. For more information, go to www.falconstor.com.

