
Data deduplication is the hot topic for storage professionals right now. Many, however, feel overwhelmed by the volume of information, and deciding how to implement deduplication in their datacenters can be confusing. Janae Lee of Quantum Corporation discusses the factors users should consider when choosing the right methodology.
When people ask about deduplication methodologies, they are really asking: what backup problem needs to be solved? Is the goal shortening the backup window? Using the least possible disk? Replicating data as soon as possible? Reading data as quickly as possible? Different approaches to deduplication provide different results, so users need to think carefully about their problems before they select a deduplication solution.
Problem – Minimizing disk use: The approach that uses the least disk space deduplicates data during ingest. It can also replicate unique blocks during ingest, which speeds up replication. The downside? The deduplication overhead – there’s always some – also occurs during ingest. Conventional in-line methods throttle backup progress under heavy deduplication load, while more advanced, adaptive methods, such as Quantum’s, use buffering to keep backup windows short.
When to use it? At sites where capacity is at a premium, for jobs where immediate replication is required, and, for adaptive systems, for backup jobs with variable ingest, such as backing up several virtual machines at once.
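To make the ingest-time approach concrete, here is a minimal sketch of deduplicating during ingest. It is a generic illustration, not Quantum's implementation: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the `InlineDedupStore` class are all assumptions chosen for brevity.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products may chunk differently


class InlineDedupStore:
    """Hypothetical store that deduplicates each chunk as it arrives."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes (stored once)
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def ingest(self, name, data):
        """Store a backup; return fingerprints of chunks not seen before.

        The returned list is what could be replicated immediately,
        since only unique blocks need to cross the wire.
        """
        recipe, new_chunks = [], []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_chunks.append(digest)
            recipe.append(digest)
        self.recipes[name] = recipe
        return new_chunks

    def restore(self, name):
        """Rebuild a backup from its recipe of fingerprints."""
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        """Disk actually consumed by unique chunks."""
        return sum(len(c) for c in self.chunks.values())
```

The hashing and lookup inside `ingest` are the overhead the article refers to: they sit directly in the backup path, which is why a heavily loaded in-line system can slow the backup itself.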
Problem – Keeping backup windows short: Users who want to finish their backups as rapidly as possible should use fully deferred deduplication. All data is written to disk first, and when the backup is finished, the data is deduplicated in a secondary process. With no deduplication overhead in the backup window, jobs finish faster. The tradeoff? You need enough disk space to hold at least one backup job, although some vendors even require space for two, and replication can’t begin until the deduplication starts.
When to use it? Classic use cases are database record backups, on-line transaction servers, and large backup jobs that struggle to complete during normal windows.
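The disk-space tradeoff is simple arithmetic. The sketch below assumes a hypothetical helper, `peak_disk_tb`, and made-up sizes; it only illustrates that a deferred system needs landing space for the raw job on top of the deduplicated store.

```python
def peak_disk_tb(job_tb, dedup_store_tb, landing_copies=1):
    """Peak disk needed, in TB, under the stated assumptions.

    job_tb:          size of one raw (non-deduplicated) backup job
    dedup_store_tb:  size of the deduplicated store
    landing_copies:  raw jobs that must fit at once
                     (0 for ingest-time dedup, 1 or 2 for deferred)
    """
    return dedup_store_tb + landing_copies * job_tb


# Hypothetical site: a 10 TB nightly job and a 15 TB deduplicated store.
ingest_time = peak_disk_tb(10, 15, landing_copies=0)   # 15 TB
deferred = peak_disk_tb(10, 15, landing_copies=1)      # 25 TB
deferred_two = peak_disk_tb(10, 15, landing_copies=2)  # 35 TB
```

With these illustrative numbers, deferring deduplication buys a shorter backup window at the cost of 10 to 20 TB of extra landing space.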
Problem – Accelerated restores: For applications that need faster restores, it’s a good idea to keep one or more recent backups cached in native, non-deduplicated format, so that restores and tape creation run faster. Post-processing systems make this easy, and Quantum provides the same feature for its adaptive system on the DXi7500. As long as space is available, the system retains recent data in a cache.
When to use it? Caching makes sense for tape creation jobs that are performance sensitive, and for jobs where large-scale or frequent restores are likely, such as email systems or heavily accessed databases.
Problem – Data sets that don’t deduplicate: Data deduplication makes disk backup much more effective and affordable, but it doesn’t work for all data. Some pre-compressed image files simply don’t deduplicate; some backup jobs with very high change or growth rates see limited value; and data that is not retained doesn’t benefit much from it.
When to skip deduplication? Likely applications include image processing, database redo logs that are not retained, and critical online transaction processing applications that require the fastest possible backups.
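One way to judge whether a data set is worth deduplicating is to measure its ratio on a sample. The sketch below is an assumption-laden illustration: `dedup_ratio` is a hypothetical helper using fixed 4 KB chunks, and random bytes stand in for pre-compressed image data, which looks similarly patternless to a deduplication engine.

```python
import hashlib
import os


def dedup_ratio(data, chunk_size=4096):
    """Return total chunks divided by unique chunks (1.0 means no savings)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 1.0
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)


# 64 identical 4 KB blocks, as in repeated full backups of unchanged data.
repetitive = b"nightly full backup block".ljust(4096, b".") * 64

# Stand-in for pre-compressed images: no two chunks repeat.
random_like = os.urandom(4096 * 64)

print(dedup_ratio(repetitive))   # 64.0
print(dedup_ratio(random_like))  # 1.0
```

A ratio near 1.0 on a representative sample suggests the deduplication overhead will buy little for that share.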
Problem – Using more than one methodology: This is the norm for backup environments as soon as the amount of data to protect starts to reach 10 TB or so. Until recently, users had to compromise or buy multiple products, but newer disk backup systems give users a choice. Quantum’s new DXi7500, for example, offers policy-based deduplication, which lets users apply any of the above policies on a share-by-share basis to match the needs of the data. Capacities scale from nine to 180 TB, and the system even offers integrated creation of physical tape for long-term retention.
When to use it? This approach is perfect for larger midrange sites and data center settings, basically whenever flexibility, scalability, and performance are important.
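Conceptually, policy-based deduplication amounts to a per-share lookup. The sketch below is purely illustrative and does not reflect the DXi7500's actual configuration interface: the share names, the `DedupMode` enum, and the `policy_for` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DedupMode(Enum):
    INGEST = "deduplicate during ingest"
    DEFERRED = "deduplicate after the backup finishes"
    DISABLED = "store without deduplication"


@dataclass(frozen=True)
class SharePolicy:
    mode: DedupMode
    cache_recent: bool = False  # keep recent backups in native format


# Hypothetical shares mapped to the methodologies discussed above.
POLICIES = {
    "vm-backups": SharePolicy(DedupMode.INGEST),                # variable ingest
    "oltp-db": SharePolicy(DedupMode.DEFERRED),                 # tight window
    "email": SharePolicy(DedupMode.INGEST, cache_recent=True),  # frequent restores
    "image-archive": SharePolicy(DedupMode.DISABLED),           # pre-compressed
}

DEFAULT_POLICY = SharePolicy(DedupMode.INGEST)


def policy_for(share):
    """Return the policy for a share, falling back to the default."""
    return POLICIES.get(share, DEFAULT_POLICY)
```

Keeping the choice per share means one system can serve a tight-window database and a capacity-constrained file share without forcing a single compromise on both.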
Janae Lee is the VP of Corporate and Product Marketing at Quantum Corporation.