E-Discovery Issue Coding: Good Idea, But…

The idea behind issue coding is a good one: when preparing for depositions, trial, or summary judgment, attorneys can pull up the documents that support, refute, or simply provide useful context for specific issues. It’s a way to select key pieces of evidence from large volumes of data, much of which is ultimately irrelevant. However, litigation teams ought to assess how consistently the issue codes are being applied before relying heavily on them. That assessment may well affect when the issue coding is done, by whom, and at what level of granularity.

Measuring Coding Consistency

Issue coding is not free. If it is done as part of a responsiveness review, it will reduce the number of documents that can be reviewed per hour. A complex issue code schema may cut throughput in half, meaning the issue coding costs as much as the responsiveness review itself. Companies can compare estimated throughput rates with and without issue coding to estimate its cost, so it makes sense to invest a little time determining what the coding is worth.
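The cost estimate described above is simple arithmetic on throughput rates. The sketch below assumes a hypothetical function name, `issue_coding_cost`, and illustrative inputs (document count, documents per hour with and without issue coding, and an hourly reviewer rate); none of these figures come from a real matter.

```python
def issue_coding_cost(num_docs: int,
                      docs_per_hour_without: float,
                      docs_per_hour_with: float,
                      hourly_rate: float) -> float:
    """Estimate the incremental cost of adding issue coding to a review.

    docs_per_hour_without: throughput for a responsiveness-only review.
    docs_per_hour_with: throughput when reviewers also assign issue codes.
    All inputs are planning estimates, not measured values.
    """
    hours_without = num_docs / docs_per_hour_without
    hours_with = num_docs / docs_per_hour_with
    return (hours_with - hours_without) * hourly_rate


if __name__ == "__main__":
    # Hypothetical inputs: 100,000 documents, 50 docs/hour without issue
    # coding, 25 docs/hour with a complex schema, $60/hour reviewer cost.
    extra = issue_coding_cost(100_000, 50, 25, hourly_rate=60.0)
    print(f"Incremental issue coding cost: ${extra:,.0f}")  # $120,000
```

Because throughput is halved in this example, the 2,000 extra review hours equal the 2,000 hours of the responsiveness-only review, so the issue coding costs as much as the responsiveness review.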

Control Sets. The best way to measure the coding consistency of the people you are considering for issue coding is to have each of them code the same set of documents. The results can be analyzed in several ways, including in Excel, Access, or possibly a review platform. The most flexible approach may be a simple relational database with separate tables for issue codes, coders, and documents reviewed, plus a Coding Results Table that links them.

The Coding Results Table has one row for each code assigned to each document by each coder. This layout is much easier to analyze than one that aggregates all of a document’s issue codes into a single multiple-value field. If your data has such a multiple-value issue code field, you may find it easier to parse it into the suggested layout first.
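A minimal sketch of such a database, using Python’s built-in `sqlite3` module. The table and column names are assumptions for illustration, and so is the `parent` column (added so sub-codes can roll up to higher-level codes). The sketch also assumes the review platform exports a multiple-value field separated by semicolons; `explode_multivalue` and `load_export` are hypothetical helpers that split it into one row per code.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE coders    (coder  TEXT PRIMARY KEY);
CREATE TABLE documents (doc_id TEXT PRIMARY KEY);
-- parent is NULL for top-level codes; it lets sub-codes roll up.
CREATE TABLE issue_codes (code TEXT PRIMARY KEY,
                          parent TEXT REFERENCES issue_codes(code));
-- Coding Results: one row per code a coder assigned to a document.
CREATE TABLE coding_results (
    coder  TEXT NOT NULL REFERENCES coders(coder),
    doc_id TEXT NOT NULL REFERENCES documents(doc_id),
    code   TEXT NOT NULL REFERENCES issue_codes(code),
    PRIMARY KEY (coder, doc_id, code)
);
"""


def explode_multivalue(rows, sep=";"):
    """Split (coder, doc_id, 'code1; code2') rows into one row per code."""
    for coder, doc_id, codes in rows:
        for code in codes.split(sep):
            code = code.strip()
            if code:
                yield coder, doc_id, code


def load_export(conn, rows, sep=";"):
    """Load exported multiple-value rows into the one-row-per-code layout."""
    for coder, doc_id, code in explode_multivalue(rows, sep):
        conn.execute("INSERT OR IGNORE INTO coders VALUES (?)", (coder,))
        conn.execute("INSERT OR IGNORE INTO documents VALUES (?)", (doc_id,))
        conn.execute("INSERT OR IGNORE INTO issue_codes (code) VALUES (?)",
                     (code,))
        conn.execute("INSERT OR IGNORE INTO coding_results VALUES (?, ?, ?)",
                     (coder, doc_id, code))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    export = [("Coder A", "DOC-001", "Damages; Causation"),
              ("Coder B", "DOC-001", "Damages")]
    load_export(conn, export)
    print(conn.execute("SELECT COUNT(*) FROM coding_results").fetchone()[0])  # 3
```

Parent links for hierarchical codes would be filled in separately, for example with an `UPDATE issue_codes SET parent = ? WHERE code = ?` per sub-code.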

Here’s how to analyze the results:

Coding Intensity – General Sense of Relevancy. At a gross level, how many issue codes does each coder assign? Some coders will see an issue lurking in every paragraph; others won’t see any unless the document is entirely dispositive of an issue. There’s no one right answer. The best approach is to have the lead attorney or the principal litigators code the same document set and then use coders who share their general sense of relevance.

The output of this analysis is a table listing each coder and the total number of codes that coder assigned.
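A minimal sketch of the intensity count, assuming the coding results are available as `(coder, doc_id, code)` tuples in the one-row-per-code layout of the Coding Results Table. The helper `intensity_vs_reference` is a hypothetical addition that expresses each coder’s total as a ratio to the lead attorney’s total on the same control set; how close to 1.0 counts as “the same general sense of relevance” is a judgment call for the team.

```python
from collections import Counter


def coding_intensity(results):
    """Total issue codes assigned by each coder.

    `results` holds (coder, doc_id, code) rows, one per code assigned to a
    document; duplicate rows are ignored.
    """
    return Counter(coder for coder, _doc_id, _code in set(results))


def intensity_vs_reference(results, reference):
    """Each coder's total divided by a reference coder's total (e.g. the lead attorney)."""
    totals = coding_intensity(results)
    base = totals[reference]
    if base == 0:
        raise ValueError(f"reference coder {reference!r} assigned no codes")
    return {coder: total / base for coder, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical control-set results.
    results = [
        ("Lead", "DOC-001", "Damages"), ("Lead", "DOC-002", "Causation"),
        ("Coder A", "DOC-001", "Damages"), ("Coder A", "DOC-001", "Causation"),
        ("Coder A", "DOC-002", "Causation"), ("Coder A", "DOC-003", "Damages"),
    ]
    print(coding_intensity(results))
    print(intensity_vs_reference(results, "Lead"))
```

Here Coder A assigns twice as many codes as the lead attorney, which suggests a broader sense of relevance than the lead’s.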

Coding Congruence – Thinking Alike. For each pair of coders, identify where they assigned the same issue codes to the same documents to see how congruent their results are, that is, the extent to which their coding overlaps.

The end result of the congruence analysis is a table with the coders listed as both the column and row headers and each pair’s congruence measure in the intersecting cell.

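The calculation counts coding decisions, where each decision is one issue code assigned to one document. For coders A and B, BOTH is the number of decisions they share, A_NOT_B is the number made by A but not B, and B_NOT_A is the number made by B but not A. Congruence is the number of shared decisions divided by all decisions either coder made (the Jaccard index):

Congruence of A and B: BOTH/(BOTH + A_NOT_B + B_NOT_A) = 80/(80 + 5 + 15) = 80%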
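The BOTH/(BOTH + A_NOT_B + B_NOT_A) formula maps directly onto set operations. The sketch below assumes, as above, that results are `(coder, doc_id, code)` tuples; `congruence` and `congruence_matrix` are hypothetical names. The matrix is returned as a dictionary keyed by coder pair, which can be laid out as the coder-by-coder table described above.

```python
from itertools import combinations


def congruence(a, b):
    """BOTH / (BOTH + A_NOT_B + B_NOT_A) for two sets of (doc_id, code) pairs."""
    both = len(a & b)
    a_not_b = len(a - b)
    b_not_a = len(b - a)
    total = both + a_not_b + b_not_a
    # Assumption: two coders who assigned nothing are treated as agreeing.
    return both / total if total else 1.0


def congruence_matrix(results):
    """Congruence for every pair of coders, from (coder, doc_id, code) rows."""
    by_coder = {}
    for coder, doc_id, code in results:
        by_coder.setdefault(coder, set()).add((doc_id, code))
    matrix = {}
    for x, y in combinations(sorted(by_coder), 2):
        # The measure is symmetric, so fill both cells of the table.
        matrix[(x, y)] = matrix[(y, x)] = congruence(by_coder[x], by_coder[y])
    return matrix


if __name__ == "__main__":
    # Rebuild the worked example: 80 shared decisions, 5 only A, 15 only B.
    shared = {(f"DOC-{i:03d}", "Damages") for i in range(80)}
    a = shared | {(f"DOC-{i:03d}", "Damages") for i in range(80, 85)}
    b = shared | {(f"DOC-{i:03d}", "Causation") for i in range(15)}
    print(f"{congruence(a, b):.0%}")  # 80%
```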

Note that the suggested database structure permits congruence comparisons at a higher level of the code hierarchy; for example, coders can be regarded as agreeing on a high-level code even if their second- or third-level codes differ.
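One way to make the higher-level comparison is to map every code to its top-level ancestor before computing congruence. This sketch assumes the hierarchy is available as a hypothetical `parents` dictionary (code to parent code, `None` for top-level codes), mirroring a parent column in the issue codes table; codes missing from the dictionary are treated as top-level. The code labels are illustrative only.

```python
def top_level(code, parents):
    """Follow parent links until reaching a code with no parent."""
    seen = set()
    while parents.get(code) is not None:
        if code in seen:
            raise ValueError(f"cycle in issue code hierarchy at {code!r}")
        seen.add(code)
        code = parents[code]
    return code


def rolled_up_congruence(a, b, parents):
    """Congruence after mapping every (doc_id, code) pair to its top-level code."""
    a_top = {(doc_id, top_level(code, parents)) for doc_id, code in a}
    b_top = {(doc_id, top_level(code, parents)) for doc_id, code in b}
    union = a_top | b_top          # BOTH + A_NOT_B + B_NOT_A
    return len(a_top & b_top) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical two-level hierarchy: code -> parent (None = top level).
    parents = {"Damages": None, "Lost profits": "Damages", "Mitigation": "Damages"}
    a = {("DOC-001", "Lost profits")}
    b = {("DOC-001", "Mitigation")}
    print(rolled_up_congruence(a, b, parents))  # 1.0
```

The two coders disagree completely at the second level but agree fully once both codes roll up to the top-level code.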

Who Does the Issue Coding

To state the obvious, the coders who think most like the lead litigators ought to do the issue coding, or they ought to review other coders’ issue coding before it is finalized in the review platform.
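To find the coders who think most like the lead litigators, one option is to rank each coder by congruence with the lead’s own coding of the control set. This is a sketch under the same `(coder, doc_id, code)` assumption as above; `rank_against_lead` is a hypothetical name, and any cut-off for who qualifies is left to the team.

```python
def rank_against_lead(results, lead):
    """Rank coders by congruence with the lead litigator's coding of a control set.

    `results` holds (coder, doc_id, code) rows; congruence is
    BOTH / (BOTH + A_NOT_B + B_NOT_A) over (doc_id, code) pairs.
    """
    by_coder = {}
    for coder, doc_id, code in results:
        by_coder.setdefault(coder, set()).add((doc_id, code))
    lead_pairs = by_coder.get(lead, set())
    scores = {}
    for coder, pairs in by_coder.items():
        if coder == lead:
            continue
        union = pairs | lead_pairs
        scores[coder] = len(pairs & lead_pairs) / len(union) if union else 1.0
    # Highest congruence first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical control-set results.
    results = [
        ("Lead", "DOC-001", "Damages"), ("Lead", "DOC-002", "Causation"),
        ("Coder A", "DOC-001", "Damages"), ("Coder A", "DOC-002", "Causation"),
        ("Coder B", "DOC-001", "Damages"), ("Coder B", "DOC-003", "Damages"),
    ]
    for coder, score in rank_against_lead(results, "Lead"):
        print(f"{coder}: {score:.0%}")
```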

Complementary Tools, e.g., Finding Conceptually Similar Documents

If issue coding were the only tool available to attorneys for deposition or trial prep, completeness of coding would be a major concern. Fortunately, there are many complementary tools and ways to expand on the documents initially tagged with issue codes. For example, most review systems can identify conceptually similar documents. Attorneys can also identify the people associated with a tagged document and examine other documents from, to, or about the same people in the same general time frame, or search for the key terms used to discuss the issues in the tagged document. In short, an issue coding system that tags documents highly relevant to specific issues will have value even if some relevant documents are not initially tagged.
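The person-and-time-frame expansion can be sketched as a simple filter. This assumes a hypothetical `Doc` record with a date and a set of associated people (senders, recipients, people mentioned), and a default window of 30 days on either side of the tagged document; real review platforms store and query this metadata in their own ways, and the right window depends on the case.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Doc:
    doc_id: str
    doc_date: date
    people: frozenset  # hypothetical field: senders, recipients, people mentioned


def related_by_person_and_time(tagged, corpus, window_days=30):
    """Other documents sharing a person with `tagged`, dated within +/- window_days."""
    start = tagged.doc_date - timedelta(days=window_days)
    end = tagged.doc_date + timedelta(days=window_days)
    return [d for d in corpus
            if d.doc_id != tagged.doc_id
            and d.people & tagged.people
            and start <= d.doc_date <= end]


if __name__ == "__main__":
    tagged = Doc("DOC-001", date(2020, 3, 1), frozenset({"Smith", "Jones"}))
    corpus = [
        tagged,
        Doc("DOC-002", date(2020, 3, 10), frozenset({"Smith"})),
        Doc("DOC-003", date(2020, 6, 1), frozenset({"Smith"})),   # outside window
        Doc("DOC-004", date(2020, 3, 5), frozenset({"Lee"})),     # no shared person
    ]
    print([d.doc_id for d in related_by_person_and_time(tagged, corpus)])  # ['DOC-002']
```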

When Coding is Done

The significance and interpretation of issues change as the case progresses, so highly detailed issue coding is better done closer to trial than when documents are first reviewed. To the extent issue coding is done earlier, more general issue codes may work better.


Having multiple levels of issue codes will slow the assignment of codes, and the codes can be difficult to keep updated as the lawyers’ understanding of the case develops. It may be more effective to use higher-level issue codes combined with a “hotness” rating.
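A minimal sketch of high-level codes paired with a hotness rating. The three-level `Hotness` scale, its labels, and the issue names are hypothetical placeholders rather than a recommended scale; the point is that a single high-level code plus a rating is enough to pull the hottest documents for an issue.

```python
from enum import IntEnum
from typing import NamedTuple


class Hotness(IntEnum):
    # Hypothetical three-level scale; the real scale is the case team's call.
    CONTEXT = 1
    RELEVANT = 2
    HOT = 3


class IssueTag(NamedTuple):
    doc_id: str
    issue: str        # a high-level issue code only, no sub-levels
    hotness: Hotness


def hot_docs(tags, issue, minimum=Hotness.RELEVANT):
    """Tags for `issue` rated at least `minimum`, hottest first."""
    hits = [t for t in tags if t.issue == issue and t.hotness >= minimum]
    return sorted(hits, key=lambda t: (-t.hotness, t.doc_id))


if __name__ == "__main__":
    tags = [
        IssueTag("DOC-001", "Damages", Hotness.CONTEXT),
        IssueTag("DOC-002", "Damages", Hotness.HOT),
        IssueTag("DOC-003", "Damages", Hotness.RELEVANT),
        IssueTag("DOC-004", "Liability", Hotness.HOT),
    ]
    print([t.doc_id for t in hot_docs(tags, "Damages")])  # ['DOC-002', 'DOC-003']
```

Re-rating a document as the case evolves means changing one value rather than reworking a multi-level code assignment.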


Issue coding can be a useful way to organize the documents most relevant to specific issues. Its usefulness can be maximized by evaluating the people who will do the coding to ensure the codes are applied consistently.

Founder & Principal Consultant
Quantum E-Discovery

Jeff believes in saving time and money in e-discovery by applying a variety of analytics tools early in a case, well before moving content to expensive final review platforms. Over the past 20+ years he has accumulated a variety of tools that can be applied as needed in specific situations.