Where our team of editors discuss what they think about the current BM issues.

Warwick Sharp discusses the elephant in the document review room.
“You could read 40% of the emails to cover 100% of the data. Bottom line: less information lets you make faster, better decisions.”
-Warwick Sharp, VP Marketing & Business Development at Equivio
For e-discovery vendors, de-duping is standard practice, so what’s new about managing ‘redundant’ data? The volume of duplicate documents in litigation discovery is dwarfed by the number of ‘near-duplicates’ – substantially similar versions that differ by a few words or paragraphs. Near-duplicates and email threads are a huge source of inefficiency in litigation processes, especially review. In many cases, document review now costs more than preparing the entire rest of the case. Reviewing redundant email messages and near-duplicate documents is a huge component of that cost.
Near-duplicates and email threads are the proverbial elephant in the document review room. First-time users tend to be very surprised by the volume of redundant data they discover. Redundant data in near-duplicates and email threads typically accounts for 30 to 50% of the materials to be reviewed, on top of the exact duplicates. The potential for cost savings is huge.
Our approach is to expose and highlight the unique data. For example, let’s assume we’ve discovered a group of near-duplicate documents. Equivio software suggests a ‘pivot’ document within each set of near-duplicates that you should read first. If the pivot document is clearly irrelevant to your review, you can skip the remainder of the near-duplicate set, since the other documents differ by just a few words.
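The pivot idea can be illustrated with a minimal sketch. This is not Equivio's algorithm, which the article does not describe; it assumes a simple word-level similarity score from Python's standard `difflib` module, a hypothetical 0.8 threshold, and a greedy rule in which the first document seen in each group serves as the pivot. The document names and texts are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def group_near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> dict[str, list[str]]:
    """Greedy grouping (an assumption, not a vendor method): each document
    joins the first pivot it resembles, otherwise it becomes a new pivot."""
    groups: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        for pivot_id in groups:
            if similarity(docs[pivot_id], text) >= threshold:
                groups[pivot_id].append(doc_id)
                break
        else:
            # No existing pivot is similar enough, so start a new group.
            groups[doc_id] = []
    return groups


# Hypothetical sample documents.
docs = {
    "contract_v1": "The supplier shall deliver the goods within thirty days of the order date",
    "contract_v2": "The supplier shall deliver the goods within sixty days of the order date",
    "lunch_memo": "Team lunch is moved to Friday at noon in the main cafeteria",
}
groups = group_near_duplicates(docs)
for pivot_id, members in groups.items():
    print(f"pivot={pivot_id} near_duplicates={members}")
```

In this sketch a reviewer would read `contract_v1` first and decide, on that basis, how to treat `contract_v2`. Pairwise comparison like this would not scale to litigation-sized collections; it only shows the grouping concept.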
But what about the danger of overlooking critical documents? A near-duplicate might contain a small word change that is crucial to the entire case. But by identifying the redundant data, we expose the unique data, and this actually reduces the risk of missing important documents. For example, you might review the pivot and decide that it’s an important document. So, you’ll need to review each of its near-duplicates. But you don’t need to read every version in full. Having read the pivot, you can simply use a redline tool to highlight the differences vis-à-vis the pivot. By zooming in on the unique information in each document, you have a much better chance of finding what you’re looking for.
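A redline comparison against the pivot can be sketched in a few lines. The example below is an illustration only, assuming a word-level comparison built on Python's standard `difflib.SequenceMatcher`; the function name `redline` and the sample sentences are hypothetical, and a commercial redline tool would do considerably more.

```python
from difflib import SequenceMatcher


def redline(pivot: str, other: str) -> list[tuple[str, str, str]]:
    """Return (operation, pivot_words, other_words) for each span that differs."""
    a, b = pivot.split(), other.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the unique information
    ]


# Hypothetical pivot and near-duplicate.
pivot = "The supplier shall deliver the goods within thirty days of the order date"
variant = "The supplier shall deliver the goods within sixty days of the order date"

changes = redline(pivot, variant)
for operation, before, after in changes:
    print(f"{operation}: '{before}' -> '{after}'")
```

Only the changed span is reported, so a single-word change such as a delivery deadline stands out instead of being buried in otherwise identical text.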
The challenge with emails is to reconstruct the email thread structures, with all the sub-conversations and side conversations that are typical in email chains. Once you have the thread structure in place, you can focus on the last message in each thread. From the software point of view, the trick is analyzing the content to verify that the last message does in fact contain all the previous messages in the thread. In the Enron data, for example, there are 517,000 emails. We found 205,000 emails that, between them, contained the content of all the other emails. In other words, you could read 40% of the emails to cover 100% of the data. Bottom line: less information lets you make faster, better decisions.
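The containment check and the Enron arithmetic can be sketched as follows. This is a simplified illustration, not the vendor's method: it assumes replies quote earlier messages verbatim with `>` markers, treats a message as redundant when its normalized text appears inside a longer message, and uses an invented four-message thread with one side conversation. The function names are hypothetical.

```python
def normalize(body: str) -> str:
    """Strip quote markers, collapse whitespace, and lowercase the text."""
    lines = (line.lstrip("> ").strip() for line in body.splitlines())
    return " ".join(" ".join(lines).split()).lower()


def inclusive_messages(thread: dict[str, str]) -> list[str]:
    """IDs of messages whose content is not fully contained in a longer message."""
    norm = {msg_id: normalize(body) for msg_id, body in thread.items()}
    return [
        msg_id
        for msg_id, text in norm.items()
        if not any(len(other) > len(text) and text in other for other in norm.values())
    ]


# Hypothetical thread: m4 is a side conversation branching off m1.
thread = {
    "m1": "Can we meet Tuesday?",
    "m2": "Tuesday works for me.\n> Can we meet Tuesday?",
    "m3": "Great, see you then.\n> Tuesday works for me.\n> > Can we meet Tuesday?",
    "m4": "Adding Dana for the budget question.\n> Can we meet Tuesday?",
}
to_read = inclusive_messages(thread)
print(to_read)

# The figures quoted above: 205,000 inclusive emails out of 517,000.
share = 205_000 / 517_000
print(f"{share:.1%} of the emails cover all of the content")
```

Here only `m3` and `m4` need to be read, since `m1` and `m2` are fully quoted inside them. The Enron figures work out to just under 40%, which matches the rounded number in the text.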
The key players that benefit are the corporate litigants. By using the near-duplicate and email thread groupings, corporations consistently see a reduction in litigation review costs of 30 to 50%. And because the groupings allow very similar documents to be handled in bulk, both litigants and their outside counsel can be more confident in privilege logs and in the representations they make in court about the documents.
Corporations are driving the industry to think outside the box of traditional document review. In Equivio’s experience, when corporations see the cost savings this technology can deliver, they tend to take a very proactive stance and make it a requirement for their outside counsel and e-discovery service providers.
Warwick Sharp is Vice President, Marketing and Business Development at Equivio. Warwick is one of the founders of Equivio, a provider of software for managing data redundancy, and a top 10 vendor in e-discovery. Warwick was previously Vice President of Marketing at Amdocs, the world leader in telecom billing systems.