
E-Discovery Issue Coding: Good Idea But…

The idea behind issue coding is a good one: when preparing for depositions, trial, or summary judgment, attorneys would be able to pull up the documents that support, refute, or simply provide good context for specific issues. It’s a way to select key bits of evidence from large amounts of data, much of which is ultimately irrelevant. However, litigation teams ought to assess how consistently the issue codes are being applied before over-relying on them. Assessing consistency may well affect when the issue coding is done, by whom, and at what level of granularity.

Measuring Coding Consistency

Issue coding is not free. If done as part of a responsiveness review, it will reduce the number of documents that can be reviewed per hour. A complex issue code schema may cut throughput in half, meaning the issue coding costs as much as the responsiveness review itself. Companies can use the difference in throughput rates to estimate the cost of issue coding. It makes sense to invest a little time determining the value of the coding.

Control Sets. The best way to measure the coding consistency of the people you are considering for the coding is to have each of them issue code the same set of documents. There are several ways to analyze the coding results, including using Excel, Access, or possibly a review platform. The most flexible way to analyze results may be to create a simple relational database with separate tables for issue codes, coders, and documents reviewed, something along the following lines:
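As a minimal sketch (using Python's built-in sqlite3 module; the table and column names are illustrative assumptions, not a prescribed layout):

```python
import sqlite3

# Minimal sketch of a coding-consistency database (hypothetical table/column names).
conn = sqlite3.connect("issue_coding.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS coders (
    coder_id    INTEGER PRIMARY KEY,
    coder_name  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS issue_codes (
    code_id     INTEGER PRIMARY KEY,
    code_level1 TEXT NOT NULL,   -- high-level issue
    code_level2 TEXT,            -- optional finer-grained issue
    code_level3 TEXT
);
CREATE TABLE IF NOT EXISTS documents (
    doc_id      INTEGER PRIMARY KEY,
    bates_no    TEXT
);
-- One row per code assigned to a document by a coder.
CREATE TABLE IF NOT EXISTS coding_results (
    coder_id    INTEGER REFERENCES coders(coder_id),
    doc_id      INTEGER REFERENCES documents(doc_id),
    code_id     INTEGER REFERENCES issue_codes(code_id),
    PRIMARY KEY (coder_id, doc_id, code_id)
);
""")
conn.commit()
```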

The Coding Results Table has one row for each code assigned to each document. This will be much easier to use for analysis than a table that has a multiple-value field with all the issue codes assigned to the document in one aggregated field. If you have a multiple-issue code field to work with, you may find it easier to parse it into the suggested layout.
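For example, a short script along these lines could explode a delimited multi-value field into one row per code (the column names and semicolon delimiter are assumptions about the export format):

```python
import csv

# Assumed input: one row per document, with issue codes aggregated in a
# semicolon-delimited field, e.g. DocID,Coder,IssueCodes -> "D001,Smith,Fraud;Damages"
with open("aggregated_coding.csv", newline="") as src, \
     open("coding_results.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["doc_id", "coder", "issue_code"])
    for row in reader:
        for code in row["IssueCodes"].split(";"):
            code = code.strip()
            if code:                       # skip empty entries
                writer.writerow([row["DocID"], row["Coder"], code])
```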

Here’s how to analyze the results:

Coding Intensity – General Sense of Relevancy. At a gross level, how many issues does each coder assign? Some coders will see an issue lurking in every paragraph; others won’t see any unless the document would be entirely dispositive of an issue. There’s no one right answer; the best approach is to have the lead attorney or the principal litigators code the same document set and then use coders who have the same general sense of relevance.

The output of this type of analysis would be a table with columns for coders and total codes assigned:
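Continuing with the hypothetical sqlite3 layout sketched above, such a table could be produced with a simple aggregate query:

```python
import sqlite3

conn = sqlite3.connect("issue_coding.db")
# Total number of issue codes each coder assigned across the control set.
rows = conn.execute("""
    SELECT c.coder_name, COUNT(*) AS total_codes
    FROM coding_results r
    JOIN coders c ON c.coder_id = r.coder_id
    GROUP BY c.coder_name
    ORDER BY total_codes DESC
""")
for coder_name, total_codes in rows:
    print(f"{coder_name}: {total_codes} codes assigned")
```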

Coding Congruence – Thinking Alike. Identify which pairs of coders assign the same issue codes to the same document to see how congruent their results are – the extent to which their coding overlaps.

The end result of the congruence analysis would be a table with coders listed on both the column and row headers and the congruence measure for the pair placed in the intersecting cell:

This is how the calculations would be performed:

Congruence of A and B: BOTH/(BOTH + A_NOT_B + B_NOT_A) = 80/(80 + 5 + 15) = 80%
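A rough Python sketch of that pairwise calculation, treating each coder's results as a set of (document, code) pairs (the sample data is purely illustrative):

```python
from itertools import combinations

# Each coder's results as a set of (doc_id, issue_code) pairs (illustrative data).
coding = {
    "A": {(1, "Fraud"), (2, "Damages"), (3, "Notice")},
    "B": {(1, "Fraud"), (2, "Damages"), (3, "Causation")},
}

def congruence(x, y):
    """BOTH / (BOTH + X_NOT_Y + Y_NOT_X), i.e. overlap divided by union."""
    both = len(x & y)
    return both / len(x | y) if (x or y) else 1.0

for a, b in combinations(sorted(coding), 2):
    print(f"{a} vs {b}: {congruence(coding[a], coding[b]):.0%}")
```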

Note that the suggested database structure permits congruence comparisons at a higher level, e.g., people can be regarded as agreeing on high level codes even if the second- or third-level codes are different.

Who Does the Issue Coding

To state the obvious, the coders who think like the lead litigators ought to do the issue coding, or they ought to be used to review the issue codes before the codes are finalized in the review platform.

Complementary Tools, e.g., Finding Conceptually-Similar Documents

If issue coding were the only tool available to attorneys for depo or trial prep, completeness of coding would be a major concern. Fortunately, there are many complementary tools and ways to expand on the documents initially tagged with issue codes. For example, most systems can identify conceptually-similar documents; attorneys can identify the people associated with a tagged document and examine other documents from, to, or about the same person in the same general time frame; or they can search for the key terms used to discuss the issues in the tagged document. All of which is to say that an issue coding system that tags documents highly relevant to specific issues will have value even if some relevant documents are not initially tagged.

When Coding is Done

The significance and interpretation of issues change as the case progresses. Highly detailed issue coding is better done closer to trial than when documents are first reviewed. To the extent issue coding is done earlier, it may be better to keep the coding more general.

Granularity

A multi-level issue code schema will slow the assignment of codes and can be difficult to keep up to date as the lawyers’ understanding of the case develops over time. It may be more effective to use higher-level issue codes combined with a “hotness” rating, something like:

Conclusion

Issue coding can serve as a useful way to organize the documents that are the most relevant to specific issues. Its usefulness can be maximized by evaluating those who will be doing the coding to ensure consistency of coding.

Founder & Principal Consultant
Quantum E-Discovery

Jeff believes in saving time and money in e-discovery by applying a variety of analytics tools early in a case, well before moving content to expensive final review platforms. Over the past 20+ years he has accumulated a variety of tools that can be applied as needed in specific situations. Learn about Jeff’s e-discovery philosophy here.


E-Discovery Process Improvement: The After-Action Audit

Process improvement involves an ongoing effort to identify what’s working well and what could be improved. In e-discovery, it’s sometimes painfully obvious when things didn’t work well, e.g., a production deadline is missed or sensitive data is produced. However, it’s not always obvious what could be improved – it’s hard to identify potential improvements if results are about what people thought was achievable.

After-action audits can be eye-opening in identifying ways to improve the e-discovery process. However, the term “improve” is rather broad; more specific goals will provide better guidance for the audit. Jeff Carr, long-time legal cost expert, recommends SMART goals – those that are Specific, Measurable, Achievable, Realistic, and Timely, e.g.:

  • Lower outside counsel document review fees by 20%
  • Lower hosting fees for e-discovery review platform vendors by 30%.
  • Identify cost-effective ways to get early looks at potential discovery from the very start of potential litigation.

One process I find useful is to select a case that is representative, in complexity and scope, of those ordinarily encountered by the client, and reprocess the same documents using alternative tool sets. Using actual case data has several advantages:

  • Proving scalability of alternative tool sets. Some tools look nice on small, hand-picked demo data sets (does “Enron” sound familiar?) but don’t scale well for large collections.
  • Identifying “gotchas” in alternative tools. There can be idiosyncrasies in data sets that cause problems in some tools. Nothing identifies these problems like running actual client data.
  • Validating original technology. Search and analytics tools that perform similar functions may not produce the same results, e.g., some full text search software may have problems indexing specific document types. The audit provides a way to potentially identify weaknesses.

In an ideal world, there would be production notes detailing the tools used to achieve the original volume reduction and the decisions that were made, along with bills from attorneys, review providers, hosting providers, and software providers. All that detail provides a baseline for comparison.

Audit Deliverables

The audit report should cover:

  • Alternative Tools. What tools could have provided the same functionality in terms of eliminating irrelevant content and identifying relevant content, but at a lower cost? For example, social network analysis, key term logic testing, concept clustering, visual similarity, and other functions are available in a variety of software packages that can be provided without per-gigabyte processing fees.
  • Dollar Impact of Alternative Tools. Culling irrelevant content early in the process saves considerable money downstream, e.g., reduced hosting fees, and reduced attorney review time. The report can estimate those savings.
  • Recommended Training. What training should be provided to either make better use of existing tools or to use new tools?

Cost

The direct costs of the after-action audit needn’t be very large when the tools used for auditing are provided on a flat- or no-fee basis, i.e., not charged on a per-GB, per-user, or per-search basis. Most audits can be performed using low-cost cloud storage or existing consultant infrastructure.

Time

Audits needn’t take a long time to complete. Large savings are usually quickly obvious, and useful, actionable data can be available within about a month.




Eating at the E-Discovery Diner: Buffet or à la Carte?

There is a difference of opinion about the best way for corporations to buy e-discovery services. One view could be characterized as the “single-throat-to-choke” approach, which focuses on accountability – the corporation wants to have a single party take complete responsibility for everything from collection through production so there’s no question who’s at fault if anything goes wrong. The other view is a more à la carte approach where the corporation buys services as needed from different providers.

My view is that the single-throat approach results in overpaying for e-discovery services. Corporations can obtain more cost-effective results by having consultants who specialize in the most appropriate tools handle collection and initial culling, and contracting separately for final review of the reduced data set. The corporation can retain full accountability through clearly delineated responsibilities and hand-offs between the two providers.

The collection-initial culling vendor is responsible for gathering initial content and applying early analytics and other tools to the content while keeping detailed logs of what was done, what tools were used, and what culling decisions were made.

In the shared responsibility model, the client specifies the format of files that will be handed over to the final review vendor as well as the method and date of delivery. The final review vendor is then tasked with documenting the steps taken to further cull the collection as well as the delivery date and method of production, including generating privilege logs.

The single vendor approach is like eating every meal and taking every coffee break at an all-you-can-eat buffet. You overpay and consume too much of the wrong things. As a matter of fact, there are many early analytics tools that give early insight into potentially-responsive documents while avoiding the per GB cost model used by most integrated approach vendors. Every GB that is screened out before going to final review can save the corporate client hundreds of dollars per year.

Final review vendors typically have large staffs for help desk, technical support, consulting, and sales personnel, and have major investments in licenses, processing infrastructure, advertising, trade shows, and office space. All those expenses have to be covered to stay in business. By contrast, discovery boutiques specializing in collection and initial culling can be nimbler and offer different pricing models for delivering a comparable or an extended range of early discovery analytics tools.

Note that time and data security are major considerations when deciding how to contract for e-discovery services. Many early analytics tools can be deployed and yield results in the time it can take to set up a final review platform, administer passwords, conduct training, and begin to load the initial data. Furthermore, from a data security standpoint, it is much better to screen out as much content as possible before putting content on a final review platform where dozens of people will have access to some or all of it.


Five Low-Cost, Attorney-Friendly Ways to Cull Email in E-Discovery

E-discovery is expensive, with email and its attachments typically being the most prevalent data types. Here are five low-cost, low-tech, lawyer-friendly tools that can be used to cull emails prior to going to a final review platform. Final review platforms, while powerful, are expensive and, compared to these five low-cost tools, are time-consuming to load and administer. In addition to achieving the immediate goal of culling unresponsive content, this set of tools also familiarizes lawyers with the collection and makes substantial progress on finalizing the key term list that will be used for final production and shared with opposing counsel.


Five Tools to Cull Email in E-Discovery

Here’s how to use these five tools in the e-Discovery process: once potential custodians have been identified, collect their data and develop an initial key term list. Before sending content off for final review, search for the key terms in the collected content and provide electronic reports of the results for attorneys to review. Each of the five reports takes a different look at the result set, giving the reviewing attorneys a different perspective on the collection. In our approach, attorneys can sort the reports several different ways (e.g., by date, sender, or topic) and flag emails that can be safely excluded.

The process is highly iterative: as the attorneys gain more understanding of the documents and the terms, the searching and report viewing can easily be repeated to refine results.

Here are the reports which are run after the emails are deduped:

  1. Low Reply Rate Emails. These are emails that received very few or no replies. Examples of emails that fit in this category are:
  • Internal e-mails from IT
  • Emails from automated senders
  • Spam
  • Mass marketing
  2. Large Distribution Emails. Emails that are sent to large numbers of recipients tend to be distribution lists for standardized reports or other recurring content. Identifying which of those can be eliminated can remove substantial volumes from consideration. (A rough sketch of how these first two reports might be generated follows this list.)
  3. Visually-Similar Email Payloads. When people repeatedly send attachments containing the same types of information to other people, those attachments tend to look alike, even if the key terms in them differ from attachment to attachment. Grouping visually-similar attachments and then tying the groupings back to the emails that attached them can reveal subsets of documents that are either clearly nonresponsive or clearly responsive.
  4. Textually-Similar Emails. Grouping emails based on their textual similarity is another way to pull together items where decisions to include or exclude them from production can often be made in bulk.
  5. Key Terms List. Attorneys are provided with a frequency list of the words occurring in the search results as a way of familiarizing them with terms they may not have considered using for searches. The word frequency list can also suggest whether terms would be useful in identifying subsets of the documents. For example, a word that occurs in all the documents won’t help in selecting a subset.
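As referenced above, here is a rough sketch of how the first two reports might be produced from deduplicated email metadata using pandas; the column names and thresholds are assumptions about the load file, not a required format:

```python
import pandas as pd

# Assumed load-file columns: MessageID, InReplyTo, From, To, Subject, Date
emails = pd.read_csv("deduped_email_metadata.csv")

# Report 1 - Low Reply Rate Emails: count how many replies point back at each message.
reply_counts = emails["InReplyTo"].value_counts()
emails["ReplyCount"] = emails["MessageID"].map(reply_counts).fillna(0).astype(int)
low_reply = emails[emails["ReplyCount"] == 0]          # no replies (or use a small threshold)

# Report 2 - Large Distribution Emails: count recipients in the To field.
emails["RecipientCount"] = emails["To"].fillna("").str.split(";").str.len()
large_distribution = emails[emails["RecipientCount"] >= 50].sort_values(
    "RecipientCount", ascending=False)

low_reply.to_csv("report_low_reply_rate.csv", index=False)
large_distribution.to_csv("report_large_distribution.csv", index=False)
```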

The advantages of utilizing these five tools in the e-Discovery process:

  1. Low Cost. The searching and reporting can be conducted without incurring per gigabyte or per user fees. Any items excluded will not incur the large initial ingestion and monthly hosting fees for content placed in the final review platform.
  2. Low Tech. Attorneys are accustomed to reviewing reports and no special training is required to browse the reports. There are no passwords or licenses to setup or administer.
  3. Quick. The tools that generate these reports can process large volumes of data in a short time. Lawyers can be reviewing results on a TB of data within two days.
  4. Highly Iterative. As lawyers gain insight into the content and how key terms are distributed across documents and custodians, they can refine the key term list and search logic to exclude plainly irrelevant content and identify responsive content.
  5. Complementary to Other Tools. The report toolset can be used to identify where further insight would be obtained by using other tools on the collection or subsets of the collection, e.g., to perform a concept clustering analysis or find linguistically-similar content.

Using tools like those described here, the volume of emails sent to the final review platform can often be reduced by well in excess of 90%.


Why Are In-House Early Case Analytics Important?

In-house early case analytics are important to the corporation because they have the potential to significantly impact the total cost of litigation. According to RAND, attorney review typically accounts for about 73 percent of all eDiscovery production costs. The simple rule of thumb is this: the fewer documents you send to outside counsel, the more you will save on litigation costs.

 

I’ve built an online spreadsheet in Office 365 that calculates the savings that can be achieved via in-house early case analytics. It compares Quantum’s In-House Early Case Analytics Model to the Typical eDiscovery Model. Feel free to modify it to suit your needs. Here is a snapshot of the spreadsheet:

Early Case Analytics Model
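For readers who prefer code to spreadsheets, here is a back-of-the-envelope sketch of the kind of comparison the spreadsheet performs; the volumes, rates, and culling percentages below are placeholder assumptions, not Quantum's actual figures:

```python
# Placeholder assumptions; substitute your own volumes and rates.
collected_gb          = 500          # data collected
docs_per_gb           = 5000         # rough conversion factor
review_rate_per_doc   = 1.00         # blended attorney review cost, $/document
hosting_per_gb_month  = 15.00        # review platform hosting, $/GB/month
hosting_months        = 12

def total_cost(cull_fraction):
    """Estimated review plus hosting cost after culling a fraction of the data."""
    remaining_gb = collected_gb * (1 - cull_fraction)
    review = remaining_gb * docs_per_gb * review_rate_per_doc
    hosting = remaining_gb * hosting_per_gb_month * hosting_months
    return review + hosting

typical  = total_cost(cull_fraction=0.50)   # typical keyword-only culling
analytic = total_cost(cull_fraction=0.90)   # in-house early case analytics
print(f"Estimated savings: ${typical - analytic:,.0f}")
```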

 

Every company will have slight differences in their discovery workflow, so I am glad to spend some time with you and the spreadsheet to see how your company would benefit from in-house early case analytics.

For a closer look at In-House Early Case Analytics, see our explanation article here.



What Are In-House Early Case Analytics?

In-house early analytics are discovery intelligence gathering and reporting mechanisms that help in-house counsel and outside counsel understand a corpus of potentially-relevant documents and e-mail.

In-house early case analytics gives counsel the ability to make well-informed decisions about what documents and e-mail are clearly non-relevant so that these files can be removed prior to transfer to outside counsel for traditional review.

More specifically, the purpose of in-house early analytics is to educate and inform counsel as to the nature, scope, and potential size of the document request. In many legal cases, outside counsel is oblivious to the size of the burden a discovery request places upon a company. In-house early analytics brings transparency to outside counsel so that they can refine the request. Meanwhile, in-house early case analytics informs managing counsel as to the actual costs of discovery – prior to ESI being sent out the door.

In-house early analytics come to counsel in the form of informational reports and visualizations, three of which I list here:

  • Key term hit reports by custodian (see the example below and the sketch after this list)

    Custodian Analysis by Term

  • Visual charts and graphs (concept maps, conversation clusters, clusters of similar documents, etc.)

    • Concept cluster maps (a visualization that clusters similar documents together)
    • Conversation cluster maps (a visualization that shows the e-mail communications)
    • Interactive screen share sessions where outside counsel is able to view a file share firsthand.
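As referenced above, a key term hit report by custodian could be assembled with a few lines of pandas; the input columns are assumptions about the search export, not any particular product's format:

```python
import pandas as pd

# Assumed export: one row per (custodian, search term) hit.
hits = pd.read_csv("term_hits.csv")          # columns: Custodian, Term, DocID

report = (hits.groupby(["Custodian", "Term"])["DocID"]
               .nunique()                     # unique documents hit per custodian/term
               .unstack(fill_value=0))        # pivot: custodians as rows, terms as columns
report["Total"] = report.sum(axis=1)
report.to_csv("custodian_analysis_by_term.csv")
```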

 

In my experience, when outside counsel is educated and informed via in-house early analytics, they often have the information necessary to refine and further perfect their key term list. This refinement will often significantly reduce the total number of documents that end up in traditional attorney review.


Using Analytics for Pre-Review Data Reduction

According to RAND, review typically accounts for about 73 percent of all eDiscovery production costs.

Technology has changed the way we work and live in virtually every other aspect of our lives.  So how can technology help reduce discovery production costs?

Quantum’s initial array of technologies work alongside Office 365 and other mail archiving environments, bringing fast indexing, complex iterative search capability and reporting to bear upon the reduction effort.   We are also able to (very inexpensively) pass reduced copies of original data along to the review stage of the eDiscovery process (see blog post: How Early Analytics Enable You to Count the Cost).

But even after applying traditional metadata filters and key search terms in an iterative fashion (which in our experience reduces the data by an average of 93-94%), a substantial number of non-relevant documents always seem to remain.

Pre-Review Data Reduction

The warning here is that once documents have been put into a review platform, eDiscovery costs immediately escalate.  Here are some of  the fees that kick in right away:

 – Review vendor hosting fees

 – Attorney review fees

 – Premium data storage fees

Purveyors of rigid, assembly-line approaches to eDiscovery (those that do not apply the technical expertise needed to defensibly reduce the data further before putting the documents into a review platform) will eventually find themselves at a competitive disadvantage with corporate clients. Corporations typically operate on fixed budgets and are more likely to form relationships with vendors who can help them solve the costly problem of sending tens of thousands of non-relevant documents out for attorney review. Defensible, objective culling of non-relevant documents before moving them to a review platform also smooths out the spikes in discovery costs for budget-driven corporations.

The technical challenge is applying objective, defensible methods so that the remaining non-relevant documents are significantly reduced before the documents are put into a review platform.

While there is no “silver bullet” technology that can achieve this in each and every case, we select from an array of technologies that can perform the following objective tasks quickly – before the documents are put into a review platform and in conjunction with guidance from counsel, of course (a simple sketch of a few of these filters follows the list):

 – Identify non-relevant clusters of documents

 – Identify non-relevant dates and time periods

 – Identify non-relevant senders

 – Identify non-relevant domains

 – Identify e-mail with visually-similar attachments

 – Identify e-mail that are To and/or From specific custodians
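As referenced above, here is a simple illustration of how a few of these objective filters might be applied to email metadata with pandas; the column names, domains, and date range are placeholder assumptions:

```python
import pandas as pd

# Assumed columns: MessageID, From, To, Date, Domain
emails = pd.read_csv("email_metadata.csv", parse_dates=["Date"])

# Objective, documented exclusions agreed with counsel (illustrative values only).
nonrelevant_domains = {"newsletters.example.com", "noreply.example.com"}
nonrelevant_senders = {"it-alerts@example.com"}
relevant_window = (emails["Date"] >= "2014-01-01") & (emails["Date"] <= "2016-12-31")

keep = emails[
    relevant_window
    & ~emails["Domain"].isin(nonrelevant_domains)
    & ~emails["From"].isin(nonrelevant_senders)
]
excluded = len(emails) - len(keep)
print(f"Excluded {excluded:,} of {len(emails):,} messages before review")
```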



Benefits of Conducting Business Database Discovery

Managing IP magazine has published an article Jeff Johnson co-authored with Brent Babcock of Knobbe Martens Olson & Bear, “Discover Business Databases.” In the article, Babcock and Johnson summarize some of the benefits of pursuing discovery of business databases and offer advice on requesting database discovery.

The article also includes a sidebar that discusses an interesting case in which Quantum found crucial evidence in a 1990s-era accounting system:

The traditional focus of e-discovery has been to manage unstructured data, such as electronic documents, e-mails, presentations, and spreadsheets. Today, courts and attorneys increasingly recognize that business databases – the informational heart of many organizations – represent a valuable but often untapped discovery resource. In fact, business databases may be the only place where certain information may exist. Obtaining the most meaningful data from databases requires specialized knowledge and tools. However, early assessment of the data potentially contained in such databases can help focus discovery efforts and thereby reduce litigation costs.  IP owners should be aware of the advantages of obtaining discovery of business databases compared to conducting traditional e-discovery of unstructured records.

Quantum offers significant discovery experience across a wide range of practice areas, including FLSA wage & hour litigation, intellectual property disputes, insurance claims, data breaches, and Sarbanes-Oxley compliance.

Our technologies and methods enable us to execute finely-crafted inquiries into the data. With over 20 years of experience across many relational database management systems and programming languages (including C, C++, C#, Java, .NET, VBA, and many more), our development team couples database connectivity protocols with ETL (Extract, Transform & Load) tools and processes in order to seamlessly extract and transform structured ESI.
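As a simplified sketch of what one such extraction might look like (the DSN, table, and query are hypothetical, and pyodbc with pandas is just one of several connectivity options):

```python
import pyodbc
import pandas as pd

# Hypothetical ODBC connection to a legacy accounting database.
conn = pyodbc.connect("DSN=LegacyAccounting;UID=readonly;PWD=********")

# Extract: pull the transactions of interest with a targeted query.
query = """
    SELECT invoice_no, customer_id, amount, posted_date
    FROM ar_transactions
    WHERE posted_date BETWEEN '1995-01-01' AND '1999-12-31'
"""
transactions = pd.read_sql(query, conn)

# Transform: normalize dates, then Load the result to a reviewable format.
transactions["posted_date"] = pd.to_datetime(transactions["posted_date"])
transactions.to_csv("ar_transactions_1995_1999.csv", index=False)
```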

Read the article here:


Office 365 eDiscovery E-Mail Search Limitations

Office 365 is enjoying continued success – driven partly by the popularity of Microsoft Outlook.  Outlook, according to the Radicati Group, is the most popular business email client on the market, with a 60% market share just a couple of years ago. For years to come, Office 365 will be used by companies to conduct the electronic discovery of e-mail.

Given the significance of Office 365, what are Office 365’s e-mail search limitations from a discovery perspective?  What do users need to be aware of when conducting litigation, internal investigations and regulatory compliance?

Overview
The eDiscovery cases page in the Office 365 Compliance Center is where cases are accessed and managed. Here, users can identify relevant e-mail and place litigation holds on sources such as SharePoint and Exchange (Exchange is the Office 365 mail store). Holds can be placed based on keyword queries, dates, authors, senders, and e-mail domains.

When searching for keywords, standard Boolean (AND, OR, NOT) and proximity operators (NEAR(n)) can be used, as well as wildcards that expand keywords to include terms that contain part of a keyword or terms that have alternative spellings.
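For example (a hypothetical query, for illustration only), a search such as “(contract* OR agreement) AND (breach NEAR(5) warranty) NOT draft” combines Boolean logic, a proximity operator, and a trailing wildcard that would also match terms like “contracts” and “contractual.”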

From what we can tell, Office 365 is still using the FAST search engine and Continuous Crawl, which ensures that new items are almost instantly searchable.

 

Limitations
There are clear e-mail search limitations in Office 365 to be aware of when conducting eDiscovery.

  • OCR.  Because Office 365 does not perform OCR, image-based files will not be searched. (If OCR is added at some point – which is unlikely – check with Microsoft Support on the OCR engine; not all OCR engines support the recognition of Chinese, Japanese, and other foreign languages.)

  • Password-Protected Attachments.  Password-protected e-mail attachments are not searched.

  • Indexing of Special Characters.  Special characters (e.g., those often found in patent cases, such as “, . / – ‘ _ &”) are not indexed, which can result in a higher number of documents in the attorney review stage of the eDiscovery process.

 

Proportionality
Given its limitations, should Office 365 be used for eDiscovery search? If so, to what extent? As a testifying eDiscovery expert, I have firsthand experience with cases in which companies performed electronic discovery using only basic features such as those offered by Office 365. When evidence is missed, the consequences can be very severe.

The key is properly weighing these proportionality factors from the soon-to-be-updated Federal Rules of Civil Procedure (Rule 26(b)(2)(C)(iii)):

  • the needs of the case

  • the amount in controversy

  • the parties’ resources

  • the importance of the issues at stake in the action, and

  • the importance of the discovery in resolving the issues.

Consider leveraging the Proportionality Triangle (read more about the Proportionality Triangle here) as you consider your eDiscovery process for a given case.

 

Conclusion
Office 365 offers a significant amount of utility with regard to litigation holds and preservation. However, its e-mail search capabilities have distinct limitations. Consider these limitations as you determine the shape of your eDiscovery process.
