
BM. Data de-duplication is one of the hottest technologies in storage right now, enabling companies to radically reduce their spending on hardware by removing duplicate data. What are the key business benefits of this technology?
HB. There are a number of benefits. There’s the ability to use a lot less physical disk for your backup, so obviously there are huge savings there. And if you can back up more to disk, it means you can save on tape and tape management and other related costs. If you’re doing de-duplication between remote sites and you’re de-duping the data before it’s transmitted, then obviously you’re using less bandwidth to move that data and there are cost savings associated with that. There are also facilities cost savings that can be factored in, too. Because you’re using a lot less disk, it means you can ultimately use less floor space to back up the same amount. And that can also have implications from a power and cooling standpoint: if you need less disk, that means there’s less disk to power and cool.
BM. What advances can data de-dupe technologies offer in terms of disaster recovery?
HB. The advantage of backing up to disk rather than tape is that your data is more readily available in a DR situation. You can recover it much more quickly, which can significantly reduce the amount of time that you’re actually down – and we all know the dollar-cost of downtime. The advantage of adding de-duplication on top of that is that you can also extend the retention periods for the data that you are keeping on disk.
So not only can you put more data onto disk, you can extend it to a month or two months or more, which obviously has implications from a DR perspective, because it has an impact on how quickly and how reliably you can get to that data in a disaster or an e-discovery situation. There are potentially huge financial costs to not being able to get data when you need to, whether from downtime or from a regulatory standpoint in terms of fines and so forth. It’s just a matter of being able to protect yourself by being able to get better access to that data.
BM. Companies are naturally wary of losing vital data that’s falsely deemed duplicative. Do you think this is an issue, and how can companies implementing data de-duplication technology guard against this eventuality?
HB. There are data de-duplication technologies out there that use a hashing algorithm, and there is a risk (although it’s very, very minute) that there could be a hashing collision. That is a risk, and for some people that might not be acceptable. Again, it’s very tiny. There are other vendors that do it without a hashing algorithm where that’s not an issue. In conversations that we have with users, it is something that they’re aware of but at this point they’re generally rolling it out in applications where that wouldn’t be a problem.
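To make the collision concern concrete, here is a minimal Python sketch of hash-based de-duplication. It is not any vendor's implementation: the `DedupStore` class, the `chunk_fixed` helper, the 4 KB chunk size and the choice of SHA-256 are all assumptions made for illustration. Each chunk is stored once under its hash, and an optional byte-for-byte comparison guards against two different chunks that happen to share a hash.

```python
import hashlib


def sha256_hex(chunk: bytes) -> str:
    """Default fingerprint: SHA-256 of the chunk, as a hex string."""
    return hashlib.sha256(chunk).hexdigest()


class DedupStore:
    """Toy chunk store keyed by hash (hypothetical, for illustration only)."""

    def __init__(self, verify=True, hash_fn=sha256_hex):
        self.verify = verify    # compare bytes when a hash is already present
        self.hash_fn = hash_fn  # swappable so a weak hash can be simulated
        self.chunks = {}        # fingerprint -> chunk bytes
        self.bytes_in = 0       # logical bytes written by callers
        self.bytes_stored = 0   # physical bytes actually kept

    def put(self, chunk: bytes) -> str:
        """Store a chunk and return its fingerprint."""
        self.bytes_in += len(chunk)
        key = self.hash_fn(chunk)
        existing = self.chunks.get(key)
        if existing is None:
            # First time this fingerprint is seen: keep the data.
            self.chunks[key] = chunk
            self.bytes_stored += len(chunk)
        elif self.verify and existing != chunk:
            # Same fingerprint, different bytes: a collision.
            # Without this check the new chunk would be silently dropped.
            raise ValueError("hash collision: refusing to de-duplicate")
        return key

    def get(self, key: str) -> bytes:
        """Return the chunk stored under a fingerprint."""
        return self.chunks[key]


def chunk_fixed(data: bytes, size: int = 4096) -> list:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

As a usage example, writing `b"A" * 8192 + b"B" * 4096` through `chunk_fixed` into a `DedupStore` takes in 12,288 bytes but stores only 8,192, because the two identical "A" chunks are kept once. The trade-off in this sketch is that verification costs an extra comparison on every duplicate, whereas with `verify=False` a collision would cause the second chunk to be replaced by the first on restore, which is the risk described above. With a strong hash such as SHA-256, that probability is extremely small.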
BM. So it’s not something companies need to bear in mind when implementing a solution?
HB. We’re still very early on in the implementation of data de-duplication. I think everybody’s buzzing about it, but the reality is that it’s still very new, so it isn’t being rolled out very broadly yet. As with any new technology, when people roll it out they do it in a phased rollout, so at this point the risk is very minute. And I think you’ll see that over time, these issues will be addressed as comfort levels with de-dupe increase. This comes through familiarity with it, and that comes through implementation.
BM. These tools and solutions are very much in the early formative stages, I guess. But in what ways are de-duplication tools evolving from point products to become features of broader data protection offerings?
HB. I wouldn’t say that they’re point products at all. First of all, data de-duplication isn’t a product; it’s a feature, and more and more people are offering it. Data Domain obviously has significant traction. Diligent has some traction as well, so there’s enough product out there, and I think over the next six months you’re going to see it much more widely available from the bigger name vendors, too. With the larger name companies really pushing it, that will, of course, lend further credibility to it.
BM. Finally, what do you think will be the key developments over the next 12 months? Are there any challenges that need to be overcome for the market to really take off?
HB. Over the next 6-12 months, you’re going to see de-dupe from the larger players, and that’s what’s really been missing so far. It’s really going to grow the market. I also think that you’re going to start to see data de-dupe as a technology being applied to more than just data backups. Already, we’re starting to see it being applied to archive data as well. Some vendors are offering the capability to de-dupe at your remote site before data is replicated, and then also be able to de-dupe across that data at the data center. So, I think you’re just going to see the reach of data de-duplication technologies be extended and the breadth of its capability being applied across more data types, eventually even making its way into primary data.
We recently concluded some data protection research, and 15 percent of the people that we polled said that they were currently doing some form of data de-duplication. I think that’s a pretty high number given the fact that the technology is pretty young. But what’s interesting is that when we asked people to specify the actual type of data de-duplication, what technology they were using, there were lots of different things listed. In other words, it wasn’t one vendor that was being listed. I think it just points to the fact that users are starting to really become educated about it. Even though Data Domain has the traction and is probably most closely linked with de-duplication, it’s still a wide-open market.