
The Three-Legged E-Discovery Model

In-house counsel typically meet their e-discovery needs by partnering with a hosting vendor that provides processing and hosting for major cases and generally bills by the gigabyte or by the number of files processed. However, the best way to manage costs is to add a third role – a consultant who can suggest alternative technologies and processes to limit overall costs.

The tripartite model recognizes the inherent conflict of interest of hosting vendors, who want to maximize their revenue by maximizing the volume of data they process and host. Their revenues are their corporate clients’ costs – they’re literally two sides of the same coin. When bonuses and commissions for hosting vendor personnel are based on the volume hosted, the natural inclination is to recommend a “collect-everything-and-sort-it-out-in-the-review-platform” approach.

The bias towards maximizing the volume of data held by the review platform extends beyond just how much data is put into the system and can impact how that data is stored as well. For example, electronic documents like Word, PowerPoint, and Excel can have embedded graphics. Those embedded graphics can be extracted and stored as separate files despite the fact that in many cases they just clutter the review database and needlessly inflate the storage space being billed by the hosting vendor.

Hosting vendors understandably focus their attention on ways to use their existing system and may not take the time to learn about new technologies or new pricing models that could result in lower costs to clients but lower revenue for them. Vendors have finite resources to devote to market research, especially if they’ve already made multi-year commitments to pay hefty licensing fees for their current offerings. Furthermore, some software licenses restrict the ability of vendors to share benchmarking data comparing the effectiveness of the primary technology with that of technology obtained from other sources.


Case-Sensitive Searching. We recently conducted a post-action audit of e-discovery for a client and found that the initial selection of files for review in the final hosting platform relied on a key term that was an acronym written in all caps, e.g., ACT. The hosting vendor dutifully collected every matching file without either knowing or disclosing that the key term search could have been made case sensitive. A case-sensitive search would have excluded the many files that contained only the lower-case (“act”) or mixed-case (“Act”) version of the word and that were never responsive.
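To illustrate the difference, here is a minimal Python sketch comparing a case-sensitive and a case-insensitive search for the acronym. The file names and text are hypothetical samples, not data from the audit, and a real review platform would expose case sensitivity through its own search syntax rather than through regular expressions.

```python
import re

# Hypothetical sample texts; only the first uses the all-caps acronym.
documents = {
    "doc1.txt": "The ACT filing deadline is next week.",
    "doc2.txt": "We need to act quickly on this.",
    "doc3.txt": "The Act was amended in 2010.",
}

term = "ACT"
# \b limits matches to whole words, so "FACT" or "ACTION" is not picked up.
case_sensitive = re.compile(rf"\b{re.escape(term)}\b")
case_insensitive = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)

sensitive_hits = sorted(
    name for name, text in documents.items() if case_sensitive.search(text)
)
insensitive_hits = sorted(
    name for name, text in documents.items() if case_insensitive.search(text)
)

print(sensitive_hits)    # ['doc1.txt']
print(insensitive_hits)  # ['doc1.txt', 'doc2.txt', 'doc3.txt']
```

In this toy example the case-insensitive search pulls in three times as many files as the case-sensitive one, and every extra file would be billed for processing and hosting.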

Analytics/Predictive Coding. Hosting vendors often tout the efficiency of their analytics and predictive coding technology in eliminating clutter. And they’re right: analytics and predictive coding/Technology Assisted Review are very effective. However, that type of technology can be licensed on terms other than per-gigabyte fees. Concept clustering, social network analysis, and other tools can be used iteratively to cull the non-relevant “noise” documents before the data goes to hosting vendors and without paying volume-based fees.

Having a knowledgeable consultant who is not compensated primarily on the basis of volume processed or hosted can result in the selection of more cost-effective e-discovery solutions.


E-Discovery Issue Coding: Good Idea But…

The idea behind issue coding is a good one: When preparing for depositions, trial, or summary judgment, attorneys would be able to pull up the documents that support or refute, or maybe just provide good context for, specific issues. It’s a way to select key bits of evidence from large amounts of data, much of which is ultimately irrelevant. However, litigation teams ought to assess how consistently the issue codes are being applied before over-relying on them. Assessing consistency may well impact when the issue coding is done, by whom, and at what level of granularity.

Measuring Coding Consistency

Issue coding is not free. If done as part of a responsiveness review, it will reduce the number of documents that can be reviewed per hour. A complex issue code schema may cut throughput in half, meaning the issue coding costs as much as the responsiveness review itself. Companies can use estimates of the different throughput rates to estimate the cost of issue coding. It makes sense to invest a little time determining the value of the coding.
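The throughput arithmetic can be sketched in a few lines of Python. The document count, review rates, and hourly rate below are purely hypothetical figures chosen to show the halved-throughput case; substitute your own estimates.

```python
def review_cost(num_docs, docs_per_hour, hourly_rate):
    """Cost of reviewing num_docs at a given throughput and hourly rate."""
    return num_docs / docs_per_hour * hourly_rate

# Hypothetical inputs, not benchmarks.
num_docs = 100_000
hourly_rate = 60.0
responsiveness_only_rate = 50   # documents per hour without issue coding
with_issue_coding_rate = 25     # documents per hour when throughput is halved

baseline = review_cost(num_docs, responsiveness_only_rate, hourly_rate)
with_issue = review_cost(num_docs, with_issue_coding_rate, hourly_rate)
issue_coding_cost = with_issue - baseline

print(f"Responsiveness review only: ${baseline:,.0f}")
print(f"Review with issue coding:   ${with_issue:,.0f}")
print(f"Incremental cost of coding: ${issue_coding_cost:,.0f}")
```

With throughput cut in half, the incremental cost of issue coding equals the cost of the responsiveness review on its own, which is the point made above.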

Control Sets. The best way to measure coding consistency of the people you are considering using for coding is to have each of them issue code the same set of documents. There are several ways to analyze the coding results, including using Excel, Access, or possibly a review platform. The most flexible way to analyze results may be to create a simple relational database with separate tables for issue codes, coders, and documents reviewed, something along the following lines:

The Coding Results Table has one row for each code assigned to each document. This will be much easier to use for analysis than a table that has a multiple-value field with all the issue codes assigned to the document in one aggregated field. If you have a multiple-issue code field to work with, you may find it easier to parse it into the suggested layout.
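One way such a layout might look is sketched below using Python's built-in sqlite3 module. The table and column names are our own assumptions for illustration, as is the semicolon-delimited format of the multiple-value field; adapt both to whatever your review platform actually exports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coders      (coder_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE documents   (doc_id   INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE issue_codes (code_id  INTEGER PRIMARY KEY, label TEXT NOT NULL);
-- One row per code assigned to a document by a coder.
CREATE TABLE coding_results (
    coder_id INTEGER NOT NULL REFERENCES coders(coder_id),
    doc_id   INTEGER NOT NULL REFERENCES documents(doc_id),
    code_id  INTEGER NOT NULL REFERENCES issue_codes(code_id),
    PRIMARY KEY (coder_id, doc_id, code_id)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO coders VALUES (1, 'Coder A')")
conn.execute("INSERT INTO documents VALUES (100, 'Sample memo')")
conn.executemany(
    "INSERT INTO issue_codes VALUES (?, ?)", [(1, "Contract"), (3, "Damages")]
)

def parse_multi_value(coder_id, doc_id, field, sep=";"):
    """Split an aggregated field such as '1; 3' into one row per issue code."""
    return [
        (coder_id, doc_id, int(part)) for part in field.split(sep) if part.strip()
    ]

rows = parse_multi_value(1, 100, "1; 3")
conn.executemany("INSERT INTO coding_results VALUES (?, ?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM coding_results").fetchone()[0]
print(rows, row_count)
```

The composite primary key on the results table prevents the same code from being recorded twice for the same coder and document, which keeps later counts honest.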

Here’s how to analyze the results:

Coding Intensity – General Sense of Relevancy. At a gross level, how many issues does each coder assign? Some coders will see an issue lurking in every paragraph; others won’t see any unless the document would be entirely dispositive of an issue. There’s no one right answer. The best approach is to have the lead attorney or the principal litigators code the same document set and then use coders who have the same general sense of relevance.

The output of this type of analysis would be a table with columns for coders and total codes assigned:
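A table of this kind could be produced with a short Python sketch like the one below. The coder names, documents, and issue labels are made up for illustration, and the input is assumed to be one (coder, document, code) tuple per assigned code, matching the one-row-per-code layout described earlier.

```python
from collections import Counter

# Hypothetical control-set results: one tuple per code assigned.
results = [
    ("Lead", "D1", "Contract"),
    ("Lead", "D2", "Damages"),
    ("A", "D1", "Contract"),
    ("A", "D2", "Damages"),
    ("A", "D2", "Notice"),
    ("B", "D1", "Contract"),
    ("B", "D1", "Damages"),
    ("B", "D1", "Notice"),
    ("B", "D2", "Contract"),
    ("B", "D2", "Damages"),
    ("B", "D2", "Notice"),
]

# Total codes assigned by each coder.
totals = Counter(coder for coder, _doc, _code in results)
# Number of distinct documents in the control set.
num_docs = len({doc for _coder, doc, _code in results})

print(f"{'Coder':<8}{'Codes':>6}{'Per doc':>10}")
for coder, total in sorted(totals.items()):
    print(f"{coder:<8}{total:>6}{total / num_docs:>10.1f}")
```

In this made-up data, coder A's intensity is close to the lead attorney's while coder B assigns three times as many codes. Note that a coder who assigned no codes at all would not appear in the tally and would need to be added separately.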

Coding Congruence – Thinking Alike. Identify which pairs of coders assign the same issue codes to the same document to see how congruent their results are – the extent to which their coding overlaps.

The end result of the congruence analysis would be a table with coders listed on both the column and row headers and the congruence measure for the pair placed in the intersecting cell:

This is how the calculations would be performed:

Congruence of A and B: BOTH/(BOTH + A_NOT_B + B_NOT_A) = 80/(80 + 5 + 15) = 80%
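The same calculation can be expressed as a small Python function. This is a sketch under the assumption that each coder's work is represented as a set of (document, code) pairs; the synthetic data simply reproduces the counts in the worked example, and the function and variable names are our own.

```python
from itertools import combinations

def congruence(a, b):
    """BOTH / (BOTH + A_NOT_B + B_NOT_A) for two sets of (document, code) pairs."""
    both = len(a & b)
    a_not_b = len(a - b)
    b_not_a = len(b - a)
    total = both + a_not_b + b_not_a
    # With no codes from either coder the ratio is undefined;
    # this sketch reports 0.0 in that case.
    return both / total if total else 0.0

# Synthetic data matching the worked example:
# 80 shared assignments, 5 made only by A, 15 made only by B.
coder_a = {("doc", n) for n in range(85)}
coder_b = {("doc", n) for n in range(80)} | {("doc", n) for n in range(100, 115)}
example = congruence(coder_a, coder_b)
print(f"{example:.0%}")  # 80%

def congruence_matrix(assignments):
    """Map each unordered pair of coder names to its congruence score."""
    return {
        (x, y): congruence(assignments[x], assignments[y])
        for x, y in combinations(sorted(assignments), 2)
    }

# "Lead" is given the same assignments as A purely for illustration.
matrix = congruence_matrix({"A": coder_a, "B": coder_b, "Lead": coder_a})
for pair, score in matrix.items():
    print(pair, f"{score:.0%}")
```

Because the measure is symmetric, only one score per pair of coders needs to be computed; the second half of the coder-by-coder table mirrors the first.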

Note that the suggested database structure permits congruence comparisons at a higher level, e.g., people can be regarded as agreeing on high level codes even if the second- or third-level codes are different.
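A rough Python sketch of comparing at a higher level follows. It assumes, purely for illustration, that hierarchical issue codes are written as dotted paths such as "2.3.1", so that rolling up to a higher level means truncating the path; your own code scheme may represent levels differently.

```python
def roll_up(assignments, level):
    """Truncate each dotted issue code to its first `level` components."""
    return {(doc, ".".join(code.split(".")[:level])) for doc, code in assignments}

def congruence(a, b):
    """Shared assignments divided by all assignments made by either coder."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical (document, code) assignments for two coders.
coder_a = {("D1", "1.1"), ("D1", "2.3.1"), ("D2", "1.2")}
coder_b = {("D1", "1.2"), ("D1", "2.3.2"), ("D2", "1.2")}

full_detail = congruence(coder_a, coder_b)
second_level = congruence(roll_up(coder_a, 2), roll_up(coder_b, 2))
top_level = congruence(roll_up(coder_a, 1), roll_up(coder_b, 1))

print(f"Full detail:  {full_detail:.0%}")   # 20%
print(f"Second level: {second_level:.0%}")  # 50%
print(f"Top level:    {top_level:.0%}")     # 100%
```

In this toy data the two coders rarely agree on the most detailed codes but agree completely on the top-level issues, which is the kind of pattern that would argue for coarser issue codes.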

Who Does the Issue Coding

To state the obvious, the coders who think like the lead litigators ought to do the issue coding, or they ought to be used to review the issue codes before those codes are finalized in the review platform.

Complementary Tools, e.g., Finding Conceptually-Similar Documents

If issue coding were the only tool available to attorneys for deposition or trial prep, completeness of coding would be a major concern. Fortunately, there are many complementary tools and ways to expand on the documents initially tagged with issue codes. For example, most systems will be able to identify conceptually-similar documents, or attorneys can identify the people associated with the tagged document and examine other documents from, to, or about the same person for the same general time frame, or search for the key terms used to discuss the issues in the tagged document. All of which is to say that an issue coding system that tags documents that are highly relevant to specific issues will have value even if some relevant documents are not initially tagged.

When Coding is Done

The significance and interpretation of issues change as the case progresses. Highly detailed issue coding is better done closer to trial than when documents are first reviewed. To the extent issue coding is done earlier, it can be better to use more general issue codes.


Multi-level issue codes slow down the assigning of codes and can be difficult to keep updated as the lawyers’ understanding of the case develops over time. It may be more effective to have higher level issue codes combined with a “hotness” rating, something like:


Issue coding can serve as a useful way to organize the documents that are the most relevant to specific issues. Its usefulness can be maximized by evaluating those who will be doing the coding to ensure that the codes are applied consistently.