Redundant Data

An Industry Insight by Equivio's Warwick Sharp

Equivio | www.equivio.com


Warwick Sharp discusses the elephant in the document review room.


“You could read 40% of the emails to cover 100% of the data. Bottom line: less information lets you make faster, better decisions.”
-Warwick Sharp, VP Marketing & Business Development at Equivio

For e-discovery vendors, de-duping is standard practice, so what’s new about managing ‘redundant’ data? The volume of duplicate documents in litigation discovery is dwarfed by the number of ‘near-duplicates’ – substantially similar versions that differ by a few words or paragraphs. Near-duplicates and email threads are a huge source of inefficiency in litigation processes, especially review. In many cases, document review now costs more than preparing the entire rest of the case. Reviewing redundant email messages and near-duplicate documents is a huge component of that cost.

Near-duplicates and email threads are the proverbial elephant in the document review room. First-time users tend to be very surprised by the volume of redundant data discovered. Redundant data in near-duplicates and email threads typically accounts for 30 to 50% of the materials to be reviewed, on top of the exact duplicates. The potential for cost savings is huge.

Suppressing data

Our approach is to expose and highlight the unique data. For example, let’s assume we’ve discovered a group of near-duplicate documents. Equivio software suggests a ‘pivot’ document within each set of near-duplicates that you should read first. If the pivot document is clearly irrelevant to your review, you can skip the remainder of the near-duplicate set, since the other documents differ by just a few words.
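To make the pivot idea concrete, here is a minimal sketch in Python. It is not Equivio's algorithm, which the article does not describe. It assumes a simple similarity measure (Jaccard overlap of three-word sequences), a hypothetical threshold, and it treats the first document in each group as the pivot. The function names are illustrative only.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.4):
    """Greedy grouping: a document joins the first group whose pivot it resembles.

    `docs` maps a document id to its text. The first id in each returned
    group is the pivot (an assumption of this sketch, not a product rule).
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups = []
    for doc_id in docs:
        for group in groups:
            if jaccard(sets[doc_id], sets[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing pivot is similar enough, so start a new group.
            groups.append([doc_id])
    return groups
```

A reviewer would read the first id in each group and, if that pivot is irrelevant, set the rest of the group aside. The threshold controls how many word changes still count as "near".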

But what about the danger of overlooking critical documents? A near-duplicate might contain a small word change that is crucial to the entire case. By identifying the redundant data, however, we expose the unique data, and that actually reduces the risk of missing important documents. For example, you might review the pivot and decide that it’s an important document. So, you’ll need to review each of its near-duplicates. But you don’t need to read every version in full. Having read the pivot, you can simply use a redline tool to highlight the differences vis-à-vis the pivot. By zooming in on the unique information in each document, you have a much better chance of finding what you’re looking for.
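The redline step can be illustrated with Python's standard difflib module. This is a sketch of a generic word-level comparison, not the redline tool the article refers to. The function name `redline` is hypothetical.

```python
import difflib


def redline(pivot, other):
    """List the word-level changes in `other` relative to `pivot`.

    Each entry is (kind, pivot_words, other_words), where kind is
    'replace', 'delete' or 'insert'. Unchanged runs are skipped.
    """
    a, b = pivot.split(), other.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For a pivot reading "payment is due within 30 days" and a near-duplicate reading "payment is due within 90 days", the only entry returned is the replacement of "30" with "90", which is exactly the kind of small change a reviewer needs to see.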

The challenge with emails is to reconstruct the email thread structures, with all the sub-conversations and side conversations that are typical in email chains. Once you have the thread structure in place, the reviewer can just focus on the last message in each thread. From the software point of view, the trick is analyzing the content to verify that the last message does in fact contain all the previous messages in the thread. In the Enron data, for example, there are 517,000 emails. We found 205,000 emails that, between them, contained the content of all the others. In other words, you could read 40% of the emails to cover 100% of the data. Bottom line: less information lets you make faster, better decisions.
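The containment check described above can be sketched as follows. This is an illustration under simplifying assumptions, not the product's method: quoted text is assumed to be marked with ">" prefixes, containment is tested as a plain substring match after normalization, and exact duplicates are assumed to have been removed already. The names `normalize` and `inclusive_messages` are hypothetical.

```python
def normalize(body):
    """Strip '>' quote markers, collapse whitespace and lowercase the text."""
    lines = [line.lstrip("> ").strip() for line in body.splitlines()]
    return " ".join(" ".join(lines).split()).lower()


def inclusive_messages(messages):
    """Return ids of messages whose content is not fully contained in another.

    `messages` maps a message id to its body. Reading only the returned
    messages covers the content of the whole set.
    """
    norm = {mid: normalize(body) for mid, body in messages.items()}
    keep = []
    for mid, text in norm.items():
        contained = any(
            other != mid and text != other_text and text in other_text
            for other, other_text in norm.items()
        )
        if not contained:
            keep.append(mid)
    return keep


# The Enron figures quoted above: 205,000 of 517,000 emails.
share_to_read = 205_000 / 517_000  # about 0.397, which rounds to 40%
```

In a thread with a side conversation, both the final reply of the main chain and the final reply of the side branch are kept, because neither contains the other.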

The key players that benefit are the corporate litigants. By using the near-duplicate and email thread groupings, corporations consistently see a reduction in litigation review costs of 30 to 50%. By allowing bulk handling of very similar documents, both litigants and their outside counsel can be more confident in privilege logs and in representations they make in court about the documents.

Driving adoption

Corporations are driving the industry to think outside the box of traditional document review. In Equivio’s experience, when corporations see the cost savings this technology makes possible, they tend to take a very proactive stance and make it a requirement for their outside counsel and e-discovery service providers.

Warwick Sharp is Vice President, Marketing and Business Development at Equivio. Warwick is one of the founders of Equivio, a provider of software for managing data redundancy, and a top 10 vendor in e-discovery. Warwick was previously Vice President of Marketing at Amdocs, the world leader in telecom billing systems.

