Using Analytics for Pre-Review Data Reduction

According to RAND, document review typically accounts for about 73 percent of all eDiscovery production costs.

Technology has changed the way we work and live in virtually every other respect. So how can technology help reduce discovery production costs?

Quantum’s initial array of technologies works alongside Office 365 and other mail-archiving environments, bringing fast indexing, complex iterative search and reporting to bear on the reduction effort. We can also pass reduced copies of the original data along to the review stage of the eDiscovery process very inexpensively.

But even after traditional metadata filters and key search terms have been applied iteratively (which, in our experience, reduces the data by an average of 93 to 94 percent), a substantial number of non-relevant documents always seems to remain.

The warning here is that eDiscovery costs escalate immediately once documents have been put into a review platform. Here are some of the fees that kick in right away:

 – Review vendor hosting fees

 – Attorney review fees

 – Premium data storage fees

Purveyors of rigid, assembly-line approaches to eDiscovery (those that do not apply the technical expertise needed to defensibly reduce the data further before it goes into a review platform) will eventually find themselves at a competitive disadvantage with corporate clients. Corporations typically operate on fixed budgets and are more likely to form relationships with vendors who can help them avoid the costly problem of sending tens of thousands of non-relevant documents out for attorney review. Defensible, objective culling of non-relevant documents before they move to a review platform also helps smooth out the spikes in discovery costs for budget-driven corporations.

The technical challenge is to apply objective, defensible methods that significantly reduce the remaining non-relevant documents before they are put into a review platform.

While there is no “silver bullet” technology that works in every case, we select from an array of technologies that can quickly perform the following objective tasks before the documents are put into a review platform (in conjunction with guidance from Counsel, of course):

 – Identify non-relevant clusters of documents

 – Identify non-relevant date ranges and time periods

 – Identify non-relevant senders

 – Identify non-relevant domains

 – Identify e-mail with visually-similar attachments

 – Identify e-mail sent to and/or from specific custodians
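Several of the tasks above (senders, domains, date periods, custodians) amount to objective metadata filters. The following is a minimal Python sketch of that idea, assuming a hypothetical `Email` record and a hypothetical `CullRules` object whose values are supplied by Counsel; it is not Quantum's tooling or any product's API, and it does not attempt clustering or visual attachment similarity, which require analytics beyond a simple filter. It records why each message was excluded, since an audit trail is part of what makes culling defensible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Email:
    # Hypothetical minimal record; real exports carry many more fields.
    doc_id: str
    sender: str
    recipients: list[str]
    sent: datetime


@dataclass
class CullRules:
    # Hypothetical rule set; values come from guidance from Counsel.
    # Addresses and domains are assumed to be stored in lowercase.
    excluded_senders: set[str] = field(default_factory=set)
    excluded_domains: set[str] = field(default_factory=set)
    excluded_periods: list[tuple[datetime, datetime]] = field(default_factory=list)
    custodians: set[str] = field(default_factory=set)  # empty = no custodian filter


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


def cull_reason(msg: Email, rules: CullRules) -> str | None:
    """Return the first rule that excludes msg, or None if msg is kept."""
    sender = msg.sender.lower()
    if sender in rules.excluded_senders:
        return "sender"
    if domain_of(sender) in rules.excluded_domains:
        return "domain"
    # Periods are half-open: start inclusive, end exclusive.
    if any(start <= msg.sent < end for start, end in rules.excluded_periods):
        return "period"
    if rules.custodians:
        # Keep only mail sent to or from at least one custodian.
        parties = {sender, *(r.lower() for r in msg.recipients)}
        if not parties & rules.custodians:
            return "custodian"
    return None


def cull(messages: list[Email], rules: CullRules) -> tuple[list[Email], list[tuple[str, str]]]:
    """Split messages into kept items and an audit log of (doc_id, reason)."""
    kept: list[Email] = []
    log: list[tuple[str, str]] = []
    for msg in messages:
        reason = cull_reason(msg, rules)
        if reason is None:
            kept.append(msg)
        else:
            log.append((msg.doc_id, reason))
    return kept, log
```

Checking the rules in a fixed order keeps each exclusion attributable to a single, explainable criterion, which makes the resulting log easier to present if the culling is ever challenged.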

Office 365 eDiscovery E-Mail Search Limitations

Office 365 is enjoying continued success, driven partly by the popularity of Microsoft Outlook. According to the Radicati Group, Outlook is the most popular business email client on the market, with a 60% market share in 2015. For years to come, companies will use Office 365 to conduct the electronic discovery of e-mail.

Given the significance of Office 365, what are Office 365’s e-mail search limitations from a discovery perspective?  What do users need to be aware of when conducting litigation, internal investigations and regulatory compliance?


The eDiscovery cases page in the Office 365 Compliance Center is where cases are accessed and managed. Here, users can identify relevant e-mail and place litigation holds on sources such as SharePoint and Exchange (Exchange is the Office 365 mail store). Holds can be placed based on keyword queries, dates, authors, senders, and e-mail domains.

When searching for keywords, standard Boolean operators (AND, OR, NOT) and the proximity operator NEAR(n) can be used, as well as wildcards (*) that expand a keyword to include terms that contain part of it or terms with alternative spellings.
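To make these operator semantics concrete, here is a minimal Python sketch of wildcard and NEAR(n) matching against a single document, with the Boolean logic expressed through Python's own `and`, `or` and `not`. It rests on simplifying assumptions: a naive word breaker, trailing wildcards only, and "within n words" taken to mean at most n intervening words. It does not reproduce how Office 365 actually tokenizes or evaluates queries.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplified word breaker: lowercase runs of letters, digits and underscores.
    return re.findall(r"\w+", text.lower())


def term_matches(token: str, term: str) -> bool:
    # A trailing "*" acts as a prefix wildcard, e.g. "infring*".
    term = term.lower()
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term


def positions(tokens: list[str], term: str) -> list[int]:
    return [i for i, tok in enumerate(tokens) if term_matches(tok, term)]


def contains(text: str, term: str) -> bool:
    return bool(positions(tokenize(text), term))


def near(text: str, a: str, b: str, n: int) -> bool:
    """True if a and b occur, in either order, with at most n words between them."""
    tokens = tokenize(text)
    return any(
        i != j and abs(i - j) - 1 <= n
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


doc = "The licensee infringed the patent claims in the 2014 agreement."
# Roughly analogous to: patent NEAR(2) infring* AND NOT newsletter
hit = near(doc, "patent", "infring*", 2) and not contains(doc, "newsletter")
print(hit)  # True
```

Even this toy version shows why iterative testing of search terms matters: a small change to n or to a wildcard stem can noticeably change which documents are returned.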

From what we can tell, Office 365 is still using the FAST search engine and Continuous Crawl, which ensures that new items are searchable almost instantly.


There are clear e-mail search limitations in Office 365 to be aware of when conducting eDiscovery.

  • OCR.  Because Office 365 does not perform OCR, image-based files will not be searched. (If OCR is added at some point, which is unlikely, check with Microsoft Support on the OCR engine. Not all OCR engines support the recognition of Chinese, Japanese and other foreign languages.)

  • Password-Protected Attachments.  Password-protected e-mail attachments are not searched.

  • Indexing of Special Characters.  Special characters (e.g., those often found in patent cases, such as “, . / – ‘ _ &”) are not indexed, which can result in a higher number of documents in the attorney review stage of the eDiscovery process.
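The effect of the special-character limitation can be illustrated with a small Python sketch. It assumes a hypothetical indexer that treats special characters purely as separators; Office 365's actual word breaker may differ in detail. Under that assumption, a phrase search for the part number R-12/4 also retrieves the distinct identifier R_12.4, the kind of over-inclusion that sends extra documents to attorney review.

```python
import re


def index_tokens(text: str) -> list[str]:
    # Hypothetical indexer: special characters act only as separators
    # and are never indexed themselves.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def phrase_hit(doc: str, query: str) -> bool:
    # Phrase search over the reduced token stream.
    d, q = index_tokens(doc), index_tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = {
    "D1": "Part R-12/4 failed inspection.",
    "D2": "Part R_12.4 passed inspection.",
    "D3": "Part R12-4 shipped on time.",
}

hits = [doc_id for doc_id, text in docs.items() if phrase_hit(text, "R-12/4")]
print(hits)  # ['D1', 'D2']
```

Because "R-12/4" and "R_12.4" reduce to the same token sequence, the index cannot tell them apart, so any narrowing has to happen later, in review.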


Given these limitations, should Office 365 be used for eDiscovery search? If so, to what extent? As a testifying eDiscovery expert, I have firsthand experience with cases in which companies performed electronic discovery using only basic features such as those offered by Office 365. When evidence is missed, the consequences can be very severe.

The key is properly weighing these proportionality factors from the soon-to-be-updated Federal Rules of Civil Procedure (Rule 26(b)(2)(C)(iii)):

  • the needs of the case

  • the amount in controversy

  • the parties’ resources

  • the importance of the issues at stake in the action, and

  • the importance of the discovery in resolving the issues.

Consider leveraging the Proportionality Triangle as you design your eDiscovery process for a given case.


Office 365 offers a significant amount of utility with regard to litigation holds and preservation. However, its e-mail search capabilities have distinct limitations. Consider these limitations as you determine the shape of your eDiscovery process.

Managing IP Magazine: Discover Business Databases

Managing IP magazine has published “Discover Business Databases,” an article Jeff Johnson co-authored with Brent Babcock of Knobbe Martens Olson & Bear, in its July/August 2011 issue. In the article, Babcock and Johnson summarize some of the benefits of pursuing discovery of business databases and offer advice on requesting database discovery.

The article also includes a sidebar that discusses an interesting case in which Quantum found crucial evidence in a 1990s-era accounting system.

Quantum offers significant discovery experience across a wide range of practice areas, involving FLSA, wage & hour litigation, intellectual property disputes, insurance claims, data breaches and Sarbanes-Oxley compliance.

Our technologies and methods enable us to execute finely crafted inquiries into the data. Drawing on over 20 years of experience with many relational database management systems and programming languages (including C, C++, C#, Java, .NET, VBA and many more), our development team couples database connectivity protocols with ETL (Extract, Transform and Load) tools and processes to seamlessly extract and transform structured ESI.
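As a rough illustration of the extract, transform and load steps, here is a minimal Python sketch using the standard library's `sqlite3` and `csv` modules. The `ledger` table, its columns and its legacy encodings (MMDDYY date strings, amounts stored as integer cents, padded or NULL memos) are hypothetical stand-ins for a real business database; production work would use the source system's own connectivity drivers.

```python
import csv
import io
import sqlite3
from datetime import datetime

FIELDS = ["entry_id", "posted", "amount", "memo"]


def extract(conn: sqlite3.Connection) -> list[tuple]:
    # Extract: read raw rows from a hypothetical legacy ledger table.
    return conn.execute(
        "SELECT entry_id, posted, amount_cents, memo FROM ledger ORDER BY entry_id"
    ).fetchall()


def transform(rows):
    # Transform: decode legacy conventions (MMDDYY dates, integer cents,
    # padded or NULL memos) into plain, reviewable values.
    for entry_id, posted, amount_cents, memo in rows:
        yield {
            "entry_id": entry_id,
            "posted": datetime.strptime(posted, "%m%d%y").date().isoformat(),
            "amount": f"{amount_cents / 100:.2f}",
            "memo": (memo or "").strip(),
        }


def load(records, out) -> None:
    # Load: write CSV, a format most review and analysis tools can import.
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Stand-in source system: an in-memory SQLite database with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (entry_id INTEGER, posted TEXT, amount_cents INTEGER, memo TEXT)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?)",
    [(1, "031598", 125050, "  Wire to vendor  "), (2, "120199", -4999, None)],
)

out = io.StringIO(newline="")
load(transform(extract(conn)), out)
print(out.getvalue())
```

Keeping the three stages as separate functions makes each transformation rule easy to inspect and explain, which matters when the meaning of a legacy field may itself be disputed.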
