Five Low-Cost, Attorney-Friendly Ways to Cull Email in E-Discovery

E-discovery is expensive, and email and its attachments are typically the most prevalent data types. Here are five low-cost, low-tech, lawyer-friendly tools that can be used to cull emails before they go to a final review platform. Final review platforms, while powerful, are expensive and, compared to these five tools, time-consuming to load and administer. In addition to achieving the immediate goal of culling unresponsive content, this set of tools also familiarizes lawyers with the collection and makes substantial progress on finalizing the key term list that will be used for final production and shared with opposing counsel.

Five Tools to Cull Email in E-Discovery

Here’s how to use these five tools in the e-Discovery process. Once potential custodians have been identified, collect their data and identify an initial key terms list. Before sending content off for final review, search for the key terms in the collected content and provide electronic reports of the results for attorneys to review. Each of the five reports takes a different look at the result set, and each look gives the reviewing attorneys a different perspective. In our approach, attorneys can sort the reports several different ways (e.g., date order, by sender, or by topic) and flag emails that can be safely excluded.

The process is highly iterative: as the attorneys gain more understanding of the documents and the terms, the searching and report viewing are easily repeated to refine results.

Here are the reports which are run after the emails are deduped:

  1. Low Reply Rate Emails. These are emails that received very few replies or none at all. Examples of emails that fit in this category are:
     • Internal e-mails from IT
     • Emails from automated senders
     • Spam
     • Mass marketing
  2. Large Distribution Emails. Emails that are sent to large numbers of recipients tend to go to distribution lists for standardized reports or other recurring content. Identifying which of those can be eliminated can remove substantial volumes from consideration.
  3. Visually-Similar Email Payloads. When people repeatedly send attachments containing the same types of information to other people, those attachments tend to look alike, even if the key terms in them differ from attachment to attachment. Grouping visually-similar attachments and then tying the groupings back to the emails that attached them can reveal subsets of documents that are either clearly nonresponsive or clearly responsive.
  4. Textually-Similar Emails. Grouping emails based on their textual similarities is another way to pull together items where decisions can often be made in bulk to include or exclude items from production.
  5. Key Terms List. Attorneys are provided with a word frequency list of the words occurring in the search results as a way of familiarizing them with terms they may not have considered using for searches. The word frequency list can also suggest whether terms would be useful in identifying subsets of the documents. For example, a word that occurs in all the documents won’t help in selecting a subset of the documents.
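
The last point in the list can be made concrete with a small sketch. It is not the tooling behind the reports, only an illustration under simple assumptions: emails are plain strings, words are split on anything that is not a letter, digit or apostrophe, and the names `term_document_frequency`, `discriminating_terms` and `max_share` are hypothetical. It counts how often each word occurs and in how many emails, then drops words that appear in nearly every email because they cannot help select a subset.

```python
import re
from collections import Counter

def term_document_frequency(docs):
    """Return (total occurrences, number of documents containing the term) per term."""
    total = Counter()
    doc_freq = Counter()
    for text in docs:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        total.update(tokens)          # every occurrence counts
        doc_freq.update(set(tokens))  # each document counts once per term
    return total, doc_freq

def discriminating_terms(docs, max_share=0.9):
    """Terms found in no more than max_share of the documents (assumed cutoff)."""
    _, doc_freq = term_document_frequency(docs)
    limit = max_share * len(docs)
    return sorted(t for t, n in doc_freq.items() if n <= limit)

# Hypothetical sample emails, for illustration only
emails = [
    "Quarterly pricing report attached",
    "Pricing question on the Acme contract",
    "Acme contract pricing is final",
]
total, doc_freq = term_document_frequency(emails)
useful = discriminating_terms(emails)
```

In this sample, "pricing" occurs in all three emails and is left out of `useful`, while "acme" occurs in two of the three and is kept as a candidate for narrowing the set.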

The advantages of utilizing these 5 Tools in the e-Discovery process:

  1. Low Cost. The searching and reporting can be conducted without incurring per gigabyte or per user fees. Any items excluded will not incur the large initial ingestion and monthly hosting fees for content placed in the final review platform.
  2. Low Tech. Attorneys are accustomed to reviewing reports and no special training is required to browse the reports. There are no passwords or licenses to set up or administer.
  3. Quick. The tools that generate these reports can process large volumes of data in a short time. Lawyers can be reviewing results on a TB of data within two days.
  4. Highly Iterative. As lawyers gain insight into the content and how key terms are distributed across documents and custodians, they can refine the key term list and search logic to exclude plainly irrelevant content and identify responsive content.
  5. Complementary to Other Tools. The report toolset can be used to identify where further insight would be obtained by using other tools on the collection or subsets of the collection, e.g., to perform a concept clustering analysis or find linguistically-similar content.

Using tools like those described, the volume of emails can often be reduced by well in excess of 90% before going to the final review platform.

Founder & Principal Consultant
Quantum E-Discovery

Jeff believes in saving time and money in e-discovery by applying a variety of analytics tools early in a case, well before moving content to expensive final review platforms. Over the past 20+ years he has accumulated a variety of tools that can be applied as needed in specific situations. Read his full bio here.

Why Are In-House Early Case Analytics Important?

In-house early case analytics are important to the corporation because they have the potential to significantly impact the total cost of litigation. According to RAND, attorney review typically accounts for about 73 percent of all eDiscovery production costs. The simple rule of thumb is this: the fewer documents you send to outside counsel, the more you will save on litigation costs.

 

I’ve built an online spreadsheet in Office 365 that calculates the savings that can be achieved via in-house early case analytics. It compares Quantum’s In-House Early Case Analytics Model to the Typical eDiscovery Model. Feel free to modify it to suit your needs. Here is a snapshot of the spreadsheet:

Early Case Analytics Model

 

Every company will have slight differences in their discovery workflow, so I am glad to spend some time with you and the spreadsheet to see how your company would benefit from in-house early case analytics.
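
For readers who want to rough out the comparison before opening the spreadsheet, here is a minimal sketch of the arithmetic. It does not reproduce the spreadsheet's formulas. The cost model (a per-document review fee plus monthly per-gigabyte hosting), the function names and every number are assumptions chosen for illustration, and the cost of the in-house analytics work itself is left out.

```python
def review_cost(documents, cost_per_document, gigabytes, hosting_per_gb_month, months):
    """Attorney review plus hosting cost for a document set (assumed cost model)."""
    return documents * cost_per_document + gigabytes * hosting_per_gb_month * months

def savings_from_culling(documents, gigabytes, cull_rate, cost_per_document,
                         hosting_per_gb_month, months):
    """Cost of reviewing everything minus cost of reviewing what remains after culling."""
    keep = 1.0 - cull_rate
    typical = review_cost(documents, cost_per_document, gigabytes,
                          hosting_per_gb_month, months)
    culled = review_cost(documents * keep, cost_per_document, gigabytes * keep,
                         hosting_per_gb_month, months)
    return typical - culled

# Hypothetical inputs: 100,000 documents, 50 GB, half culled in-house,
# 1.00 per document reviewed, 10.00 per GB per month, hosted for 6 months.
saved = savings_from_culling(100_000, 50, 0.5, 1.0, 10.0, 6)
```

With these made-up inputs the typical model costs 103,000 and the culled set costs 51,500, so `saved` is 51,500. Replace the rates with your own vendor and counsel figures to see how the result moves.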

For a closer look at In-House Early Case Analytics, see our explanation article here.

What Are In-House Early Case Analytics?

In-house early analytics are discovery intelligence gathering and reporting mechanisms that help in-house counsel and outside counsel understand a corpus of potentially-relevant documents and e-mail.

In-house early case analytics give counsel the ability to make well-informed decisions about which documents and e-mail are clearly non-relevant so that these files can be removed prior to transfer to outside counsel for traditional review.

More specifically, the purpose of in-house early analytics is to educate and inform counsel as to the nature, scope and potential size of the document request. In many legal cases, outside counsel is unaware of the size of the burden a discovery request places upon a company. In-house early analytics bring transparency to outside counsel so that they can refine the request. Meanwhile, in-house early case analytics inform managing counsel as to the actual costs of discovery – prior to ESI being sent out the door.

In-house early analytics come to counsel in the form of informational reports and visualizations, three of which I list here:

  • Key term hit reports by custodian (see example below)

    Custodian Analysis by Term

  • Visual charts and graphs (concept maps, conversation clusters, clusters of similar documents, etc.)

    • Concept cluster maps (a visualization that clusters similar documents together)

      Concept Cluster Maps

    • Conversation cluster maps (a visualization that shows the e-mail communications)

      Conversation Cluster Maps

  • Interactive screen share sessions where outside counsel is able to view a file share firsthand.
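
To show the idea behind the first item, a key term hit report by custodian, here is a minimal sketch. It assumes each email is available as a hypothetical `(custodian, text)` pair and that a hit means the term appears as a whole word, ignoring case. The function name and the sample data are invented for illustration and do not describe how the actual reports are produced.

```python
import re
from collections import defaultdict

def hits_by_custodian(emails, terms):
    """Count, per custodian and term, how many emails contain the term as a whole word."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    report = defaultdict(lambda: dict.fromkeys(terms, 0))
    for custodian, text in emails:
        row = report[custodian]
        for term, pattern in patterns.items():
            if pattern.search(text):
                row[term] += 1  # one hit per email, however often the term repeats
    return dict(report)

# Hypothetical sample data
emails = [
    ("alice", "Rebate schedule attached"),
    ("alice", "No rebate this quarter, see invoice"),
    ("bob", "Invoice 1042 is overdue"),
]
report = hits_by_custodian(emails, ["rebate", "invoice"])
```

Here `report` shows two "rebate" emails and one "invoice" email for alice, and one "invoice" email for bob. A table like this lets counsel see which terms are concentrated in which custodians before anything leaves the building.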

 

In my personal experience, when outside counsel is educated and informed via early in-house analytics, they often have the information they need to refine and further perfect their key terms list. This refinement will often have a significant impact on the total number of documents that end up in traditional attorney review.

Using Analytics for Pre-Review Data Reduction

According to RAND, review typically accounts for about 73 percent of all eDiscovery production costs.

Technology has changed the way we work and live in virtually every other aspect of our lives.  So how can technology help reduce discovery production costs?

Quantum’s initial array of technologies work alongside Office 365 and other mail archiving environments, bringing fast indexing, complex iterative search capability and reporting to bear upon the reduction effort.   We are also able to (very inexpensively) pass reduced copies of original data along to the review stage of the eDiscovery process (see blog post: How Early Analytics Enable You to Count the Cost).

But even after applying traditional metadata filters and key search terms in an iterative fashion (which in our experience reduces the data by an average of 93-94%), a substantial number of non-relevant documents always seems to remain.

Pre-Review Data Reduction

The warning here is that once documents have been put into a review platform, eDiscovery costs immediately escalate. Here are some of the fees that kick in right away:

 – Review vendor hosting fees

 – Attorney review fees

 – Premium data storage fees

Purveyors of rigid, assembly line-style approaches to eDiscovery (approaches that do not apply the technical expertise needed to defensibly reduce the data further before putting the documents into a review platform) will eventually find themselves at a competitive disadvantage at the corporate level. Corporations typically operate on fixed budgets and are more likely to form relationships with vendors who can help them solve the costly problem of sending tens of thousands of non-relevant documents out for attorney review. Defensible and objective culling of non-relevant documents before moving them to a review platform further smooths out the spikes in discovery costs for budget-driven corporations.

The technical challenge is applying objective, defensible methods so that the remaining non-relevant documents are significantly reduced before the documents are put into a review platform.

While there is no “silver bullet” technology that can achieve this in each and every case, we select from an array of technologies that can perform the following objective tasks quickly – before  the documents are put into a review platform (in conjunction with guidance from Counsel, of course):

 – Identify non-relevant clusters of documents

 – Identify non-relevant dates and time periods

 – Identify non-relevant senders

 – Identify non-relevant domains

 – Identify e-mail with visually-similar attachments

 – Identify e-mail sent to and/or from specific custodians
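
Several of the tasks above (non-relevant senders, domains and time periods) come down to simple, repeatable rules. The sketch below is one way such rules might be expressed; it is not the technology referred to in this post. The `Email` record, the `cull` function and the sample addresses are hypothetical. Excluded items are returned alongside kept items so that every exclusion can be listed and reviewed by Counsel, which is part of what makes a cull defensible.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Email:
    sender: str
    sent: date
    subject: str

def cull(emails, excluded_senders=(), excluded_domains=(), start=None, end=None):
    """Split emails into (kept, excluded) using sender, domain and date-range rules."""
    senders = {s.lower() for s in excluded_senders}
    domains = {d.lower() for d in excluded_domains}
    kept, excluded = [], []
    for e in emails:
        addr = e.sender.lower()
        domain = addr.rsplit("@", 1)[-1]
        out_of_range = (start is not None and e.sent < start) or \
                       (end is not None and e.sent > end)
        if addr in senders or domain in domains or out_of_range:
            excluded.append(e)
        else:
            kept.append(e)
    return kept, excluded

# Hypothetical sample data
emails = [
    Email("noreply@newsletter.example", date(2015, 3, 1), "Weekly deals"),
    Email("it-helpdesk@corp.example", date(2015, 3, 2), "Password expiry"),
    Email("jane@corp.example", date(2015, 3, 3), "Contract draft"),
    Email("jane@corp.example", date(2012, 1, 5), "Old thread"),
]
kept, excluded = cull(
    emails,
    excluded_senders=["it-helpdesk@corp.example"],
    excluded_domains=["newsletter.example"],
    start=date(2014, 1, 1),
)
```

Only "Contract draft" survives in this sample; the other three land in `excluded`, each for a reason that can be stated in one line.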

Benefits of Conducting Business Database Discovery

Managing IP magazine has published an article Jeff Johnson co-authored with Brent Babcock of Knobbe Martens Olson & Bear, “Discover Business Databases.” In the article, Babcock and Johnson summarize some of the benefits of pursuing discovery of business databases and offer advice on requesting database discovery.

The article also includes a sidebar that discusses an interesting case where Quantum found crucial evidence in a 1990s-era accounting system:

The traditional focus of e-discovery has been to manage unstructured data, such as electronic documents, e-mails, presentations, and spreadsheets. Today, courts and attorneys increasingly recognize that business databases – the informational heart of many organizations – represent a valuable but often untapped discovery resource. In fact, business databases may be the only place where certain information may exist. Obtaining the most meaningful data from databases requires specialized knowledge and tools. However, early assessment of the data potentially contained in such databases can help focus discovery efforts and thereby reduce litigation costs.  IP owners should be aware of the advantages of obtaining discovery of business databases compared to conducting traditional e-discovery of unstructured records.

Quantum offers significant discovery experience across a wide range of practice areas, involving FLSA, wage & hour litigation, intellectual property disputes, insurance claims, data breaches and Sarbanes-Oxley compliance.

Our technologies and methods enable us to execute finely-crafted inquiries into the data. With over 20 years of experience across many relational database management systems and programming languages (including C, C++, C#, Java, .NET, VBA and many more), our development team couples database connectivity protocols with ETL (Extract, Transform & Load) tools and processes in order to seamlessly extract and transform structured ESI.

Read the article here:

Office 365 eDiscovery E-Mail Search Limitations

Office 365 eDiscovery Email

Office 365 is enjoying continued success, driven partly by the popularity of Microsoft Outlook. Outlook, according to the Radicati Group, is the most popular business email client on the market, with a 60% market share just a couple of years ago. For years to come, Office 365 will be used by companies to conduct the electronic discovery of e-mail.

Given the significance of Office 365, what are Office 365’s e-mail search limitations from a discovery perspective?  What do users need to be aware of when conducting litigation, internal investigations and regulatory compliance?

Overview
The eDiscovery cases page in the Office 365 Compliance Center is where cases are accessed and managed. Here, users can identify relevant e-mail and place litigation holds on sources such as SharePoint and Exchange (Exchange is the Office 365 mail store). Holds can be placed based on keyword queries, dates, authors, senders, and e-mail domains.

When searching for keywords, standard Boolean (AND, OR, NOT) and proximity operators (NEAR(n)) can be used, as well as wildcards that expand keywords to include terms that contain part of a keyword or terms that have alternative spellings.
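
For readers less familiar with proximity operators, the sketch below illustrates the general idea of a NEAR(n)-style test: two terms match only when they occur within n word positions of each other. It is a generic illustration, not Office 365's implementation; the function name `near`, the way distance is counted and the sample sentence are all assumptions, and the platform's own counting rules may differ.

```python
import re

def near(text, first, second, n):
    """True if `first` and `second` occur within n word positions of each other, in either order."""
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n for i in first_pos for j in second_pos)

# Hypothetical sample sentence
sentence = "The license agreement was signed after the royalty audit"
close_enough = near(sentence, "license", "royalty", 6)
too_far = near(sentence, "license", "royalty", 3)
```

In the sample, "license" and "royalty" are six word positions apart, so the test passes with n = 6 and fails with n = 3. Tightening n is a quick way to cut hits where two terms merely share a long document.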

From what we can tell, Office 365 is still using the FAST search engine and Continuous Crawl, which ensures that new items are almost instantly searchable.

 

Limitations
There are clear e-mail search limitations in Office 365 to be aware of when conducting eDiscovery.

  • OCR.  Because OCR is not being performed by Office 365, image-based files will not be searched (if OCR is added at some point – which is unlikely – check with Microsoft Support on the OCR engine.  Not all OCR engines support the recognition of Chinese, Japanese and other foreign languages).

  • Password-Protected Attachments.  Password-protected e-mail attachments are not searched.

  • Indexing of Special Characters.  Special characters (e.g., those often found in patent cases, such as “, . / – ‘ _ &”) are not indexed, which can result in a higher number of documents in the attorney review stage of the eDiscovery process.
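
The effect of the last limitation on review volume can be illustrated with a toy model. The sketch assumes an index that keeps only letters and digits and treats everything else as a separator; that is an assumption made for illustration, not a description of how Office 365 actually tokenizes text. All function names and sample documents are hypothetical.

```python
import re

def index_tokens(text):
    """Tokenize the way a punctuation-dropping index might: keep only letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def index_match(query, text):
    """Match if every token of the query is present once special characters are dropped."""
    return index_tokens(query) <= index_tokens(text)

def literal_match(query, text):
    """Match only if the query appears verbatim, special characters included."""
    return query.lower() in text.lower()

# Hypothetical sample documents
docs = [
    "Claim 4 covers the R&D budget",
    "See section D, row R of the table",
]
query = "R&D"
index_hits = [d for d in docs if index_match(query, d)]
literal_hits = [d for d in docs if literal_match(query, d)]
```

Under this model the query "R&D" breaks into the tokens "r" and "d", so both sample documents come back as hits even though only the first actually contains "R&D". The second document is the kind of false positive that ends up in attorney review.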

 

Proportionality
Understanding its limitations, should Office 365 be used for eDiscovery search?  If so, to what extent?  As a testifying eDiscovery expert, I have firsthand experience in cases in which companies performed electronic discovery using the basic features such as those offered by Office 365.  When evidence is missed, the consequences can be very severe.

The key is properly weighing these proportionality factors from the soon-to-be-updated Federal Rules of Civil Procedure (Rule 26(b)(2)(C)(iii)):

  • the needs of the case

  • the amount in controversy

  • the parties’ resources

  • the importance of the issues at stake in the action, and

  • the importance of the discovery in resolving the issues.

Consider leveraging the Proportionality Triangle (read more about the Proportionality Triangle here) as you consider your eDiscovery process for a given case.

 

Conclusion
Office 365 offers a significant amount of utility with regard to litigation holds and preservation. However, its e-mail search capabilities have distinct limitations. Consider these limitations as you determine the shape of your eDiscovery process.
