"The online business magazine at the heart of international business management news..."
05 Jul 2010

A Deeper Look at Deduplication

Sepaton | www.sepaton.com


Risk

Getting data to the safety of a backup environment quickly is your main objective and, in some cases, your regulatory obligation. However, some deduplication technologies significantly slow down the backup process. These technologies, which perform deduplication inline (as the data is being written), are efficient for the small data volumes backed up by SMBs or enterprise departments, but inline processing cannot scale performance to support enterprise-class backups. You have to either split your data among dozens of individual backup systems or back up only your most critical data to disk.

Slow inline backups put your data at risk in two ways. First, they keep data in transit, and vulnerable to loss or corruption, for longer. Second, they make it harder to stay within backup windows, which may force you to back up data less frequently or to skip some data altogether.

In addition, many deduplication technologies use a method called ‘hash-based comparisons’ to identify duplicate data. In rare instances, two different pieces of data produce the same hash, so the system misidentifies new data as a duplicate and fails to store it. This ‘hash collision’ problem goes unnoticed until you try to restore the affected data.
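To make the collision risk concrete, here is a minimal sketch of hash-only deduplication. It is an illustration, not any vendor's implementation: the function names are hypothetical, and the fingerprint is deliberately cut down to 8 bits so that a collision is easy to find. Real products use far longer hashes, which makes collisions rare rather than impossible.

```python
import hashlib


def weak_fingerprint(chunk: bytes) -> str:
    """Deliberately weak 8-bit fingerprint (assumption for the demo only)."""
    return hashlib.sha256(chunk).hexdigest()[:2]


def dedupe_by_hash(chunks):
    """Keep one chunk per fingerprint; trust the fingerprint alone."""
    store = {}    # fingerprint -> first chunk seen with that fingerprint
    recipe = []   # ordered fingerprints needed to rebuild the backup
    for chunk in chunks:
        fp = weak_fingerprint(chunk)
        store.setdefault(fp, chunk)  # a colliding chunk is silently dropped
        recipe.append(fp)
    return store, recipe


def restore(store, recipe):
    """Rebuild the backup from the stored chunks."""
    return [store[fp] for fp in recipe]


def find_collision():
    """Search for two different chunks that share a weak fingerprint."""
    seen = {}
    i = 0
    while True:
        chunk = f"block-{i}".encode()
        fp = weak_fingerprint(chunk)
        if fp in seen:
            return seen[fp], chunk
        seen[fp] = chunk
        i += 1


first, second = find_collision()
store, recipe = dedupe_by_hash([first, second])
restored = restore(store, recipe)
# The backup "succeeded", but the second chunk comes back as a copy of the first.
print(restored == [first, second])
```

Nothing fails at backup time in this sketch. The loss only shows up when the restored data is compared with the original, which mirrors the point made above.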

A less risky method is to back up data at wire speed and then deduplicate it afterwards using a more protective methodology. Before any duplicate data is eliminated, the system compares the deduplicated files to the full un-deduplicated data as an additional integrity check.

Whereas inline systems are bottlenecked by their deduplication, the fastest enterprise-class deduplication systems can back up at wire speed (more than 35 times faster).
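The idea of deduplicating after the data has landed, and checking the result against the full copy before discarding anything, can be sketched as follows. This is a simplified model under stated assumptions: the function names, the `hex_digits` parameter, and the in-memory lists are all hypothetical, and a real system would work on disk at far larger scale.

```python
import hashlib


def fingerprint(chunk: bytes, hex_digits: int = 64) -> str:
    """SHA-256 fingerprint, optionally truncated to force collisions in tests."""
    return hashlib.sha256(chunk).hexdigest()[:hex_digits]


def dedupe_with_verify(chunks, hex_digits: int = 64):
    """Treat a chunk as a duplicate only if its bytes match, not just its hash."""
    store = {}    # (fingerprint, slot) -> chunk bytes
    recipe = []   # ordered keys needed to rebuild the backup
    for chunk in chunks:
        fp = fingerprint(chunk, hex_digits)
        slot = 0
        # Same fingerprint but different bytes: move to the next slot.
        while (fp, slot) in store and store[(fp, slot)] != chunk:
            slot += 1
        store.setdefault((fp, slot), chunk)
        recipe.append((fp, slot))
    return store, recipe


def post_process(landed, hex_digits: int = 64):
    """Deduplicate data that is already safely on disk, then verify it."""
    store, recipe = dedupe_with_verify(landed, hex_digits)
    rebuilt = [store[key] for key in recipe]
    if rebuilt != landed:
        # Integrity check failed: the caller keeps the full copy.
        raise ValueError("deduplicated data does not match the original")
    return store, recipe
```

Because the full copy still exists while `post_process` runs, a failed check costs disk space rather than data.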

Reassembly

Many regulations stipulate that you must be able to produce stored data quickly upon request. Fast restores are also essential to keeping employee productivity high, and employees typically request restores of data that is less than 30 days old. Some technologies, such as hash-based systems, require significant time and processing resources to restore data because they have to reassemble numerous chunks of data stored throughout the disk system. Over time, restore performance gets significantly slower.

A much faster method, called forward differencing, stores the most current backup in its un-deduplicated form and uses it as a baseline for comparison. Older duplicate data is replaced with pointers forward to the baseline and new data is stored. As each new backup is performed, it replaces the previous backup as the baseline. As a result, this method can restore data as much as 10 times faster than hash-based systems and does not slow over time.
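A rough sketch of forward differencing, under simplifying assumptions, might look like this. Each backup is modelled as a dictionary of file name to contents, every older backup is described relative to the next newer one, and the class and method names are hypothetical rather than taken from any product.

```python
from typing import Dict, List, Optional, Set, Tuple

Backup = Dict[str, bytes]  # file name -> file contents


class ForwardDifferencingStore:
    def __init__(self) -> None:
        self.baseline: Optional[Backup] = None  # newest backup, kept in full
        # deltas[i] describes backup i relative to backup i + 1:
        # (names that point forward, literal data that differs)
        self.deltas: List[Tuple[Set[str], Backup]] = []

    def add_backup(self, new: Backup) -> None:
        old = self.baseline
        if old is not None:
            # Files unchanged in the new backup become forward pointers.
            pointers = {n for n, data in old.items() if new.get(n) == data}
            # Files that changed or disappeared are kept as literal data.
            literals = {n: d for n, d in old.items() if n not in pointers}
            self.deltas.append((pointers, literals))
        self.baseline = dict(new)  # the new backup becomes the baseline

    def restore(self, generation: int) -> Backup:
        """Generation 0 is the oldest backup; len(deltas) is the newest."""
        if self.baseline is None:
            raise LookupError("no backups stored")
        current = dict(self.baseline)  # newest needs no reassembly
        for pointers, literals in reversed(self.deltas[generation:]):
            older = {name: current[name] for name in pointers}
            older.update(literals)
            current = older
        return current


store = ForwardDifferencingStore()
store.add_backup({"a.txt": b"1", "b.txt": b"2"})
store.add_backup({"a.txt": b"1", "b.txt": b"3"})
latest = store.restore(len(store.deltas))  # read straight from the baseline
```

In this model the most recent backup, which is the one most often requested, is read directly from the baseline. Only older generations require any reconstruction.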

Human error

To reduce risk to your data, avoid technologies that require you to break up the backup environment into multiple discrete systems. Be wary of so-called ‘clustering’, as many vendors use this term merely to describe a user interface that manages separate systems. Running separate systems increases labor cost, administrative complexity and the likelihood of human error: administrators have to ensure that each system has sufficient capacity, the latest upgrades, the correct backup policies and so on. Because these systems function as individual units, deduplication can only be done within each system, which is an inherently less efficient methodology.

Scalable deduplication solutions let you start with the capacity and performance you need and add more disk or processing nodes as you grow. The system can be managed through a single, fully automated management console, and a single administrator can manage tens of petabytes of secondary storage.

Data deduplication is a powerful technology that is helping enterprises reduce risk, cut costs, and improve productivity. Understanding the strengths and weaknesses of the various deduplication methodologies helps ensure that you choose the technology that best meets your needs.
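The efficiency gap between per-system and global deduplication can be illustrated with a small counting sketch. The data and function names are made up for the example, and each chunk is reduced to a short label for readability.

```python
def stored_chunks_siloed(systems):
    """Each system deduplicates only against its own data."""
    return sum(len(set(chunks)) for chunks in systems)


def stored_chunks_global(systems):
    """One pool deduplicates across everything it receives."""
    return len(set().union(*systems))


# Three hypothetical backup systems that all hold the same OS image.
systems = [
    [b"os", b"mail"],
    [b"os", b"db"],
    [b"os", b"mail", b"web"],
]
print(stored_chunks_siloed(systems), stored_chunks_global(systems))
```

With this sample data the siloed layout stores seven chunks while the single pool stores four, because duplicates that sit on different systems are never compared with one another.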

