Big Files, Big E-Discovery Cost Savings (Sometimes)

Reducing e-discovery costs means applying different cost-reduction strategies depending on the circumstances of each case. One of those strategies is to know how much of the data being considered for processing in a final review platform consists of large files. To use an extreme example, if one file accounted for half of the gigabytes in a case, it would make sense to examine that file before loading it into a final review platform.

The critical benchmark to know is what your hosting provider charges to ingest and host a gigabyte for a year. If it doesn’t cost anything, there’s far less impetus to examine file size. This posting uses a benchmark of $200 per gigabyte; you’ll obviously want to use your own metrics.

The basic idea is simple: get a listing of files sorted by descending file size and apply the first-year hosting cost to see whether it might make sense to use an alternative approach for the largest files. Here are the file sizes for the largest 20 files in a recent case, referred to as Case 1 in the following content:

As shown in the above table, the client could save over $3,000 in ingestion and hosting costs by examining those 20 files without putting them in an expensive final review platform.
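The listing described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the cases in this posting: the function names are hypothetical, the $200 rate is the benchmark used here, and the bytes-per-gigabyte constant is an assumption you should match to how your provider bills.

```python
from pathlib import Path

COST_PER_GB = 200.0       # assumed first-year ingest + hosting rate in USD; use your own
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes; some providers bill on 10**9


def largest_files(root, top_n=20):
    """Return (path, size_bytes) for the top_n largest files under root, largest first."""
    sizes = [(p, p.stat().st_size) for p in Path(root).rglob("*") if p.is_file()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]


def first_year_cost(size_bytes, cost_per_gb=COST_PER_GB):
    """First-year cost of ingesting and hosting a file of the given size."""
    return size_bytes / BYTES_PER_GB * cost_per_gb


def report(root, top_n=20):
    """Print size and first-year cost for the largest files and return the total cost."""
    rows = largest_files(root, top_n)
    total = 0.0
    for path, size in rows:
        cost = first_year_cost(size)
        total += cost
        print(f"{size / BYTES_PER_GB:8.3f} GB  ${cost:9.2f}  {path}")
    print(f"Total first-year cost for the {len(rows)} largest files: ${total:,.2f}")
    return total
```

Calling `report("/path/to/collection")` on a staging copy of the collection prints one line per file plus the total, which is the number to weigh against the effort of handling those files separately.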

To prepare this posting I examined files from four recent cases: two were relatively small, and two were mid-sized. The following scatter diagram shows that in the four cases examined, over 80% of the total gigabytes were accounted for by fewer than 20% of the files. The steeper the cumulative GB percentage curve, the more likely it is that examining the largest files in lieu of sending them to a hosted review platform could be cost effective.
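To check how steep that curve is for your own case, the cumulative percentages can be computed directly from a list of file sizes. The sketch below is one way to do it, with hypothetical function names; it takes sizes in any consistent unit, since only the proportions matter.

```python
def cumulative_gb_curve(sizes):
    """Return (file_pct, gb_pct) points, with files taken largest first."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    if not ordered or total == 0:
        return []
    points = []
    running = 0
    for rank, size in enumerate(ordered, start=1):
        running += size
        points.append((100.0 * rank / len(ordered), 100.0 * running / total))
    return points


def share_of_top_files(sizes, file_pct=20.0):
    """Percent of total size held by the largest file_pct percent of files."""
    best = 0.0
    for f_pct, gb_pct in cumulative_gb_curve(sizes):
        if f_pct <= file_pct:
            best = gb_pct
    return best
```

For example, with five files of sizes 80, 5, 5, 5 and 5, the single largest file is 20% of the files and `share_of_top_files` reports that it holds 80% of the total size.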

Here are some other metrics from those four cases:

As can be seen, three of the cases wouldn’t benefit significantly from handling the largest files outside the final review platform. The second and third cases involve such small total expenditures for ingestion and hosting that it’s simply not worth spending much time reviewing individual files. Those were cases that primarily involved email, and such collections have fewer large files simply because of email attachment size limitations. On the other hand, cases that involve large numbers of file share files, like Case 1, are more likely to have large numbers of large files.

As you can see in the table, the largest 100 files in Case 1 will cost on average $70 per file to ingest and host for a year. Because those 100 files will cost $7,000 in year one, there is a reasonable incentive to consider whether a lower-cost method of treatment than a final review platform might be feasible. By contrast, Case 4 has a significant number of gigabytes and files, but its largest files don’t offer nearly the savings opportunities that Case 1 does.
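The arithmetic behind those figures is straightforward: 100 files at an average of $70 each is $7,000, which at the $200 per gigabyte benchmark used in this posting corresponds to about 35 GB in the 100 largest files. The same calculation for any case can be sketched as follows; the function name is hypothetical and the sizes are assumed to be in gigabytes.

```python
def top_n_cost_summary(sizes_gb, n=100, cost_per_gb=200.0):
    """Return (total_cost, average_cost_per_file) for the n largest files.

    sizes_gb: file sizes in gigabytes.
    cost_per_gb: assumed first-year ingest + hosting rate; substitute your own.
    """
    top = sorted(sizes_gb, reverse=True)[:n]
    total_cost = sum(top) * cost_per_gb
    average_cost = total_cost / len(top) if top else 0.0
    return total_cost, average_cost
```

Running this for the top 20, 50 and 100 files gives a quick sense of where the per-file cost drops below the point at which separate handling is worth the effort.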

No Single Answer
There’s no single best answer on how to treat large files in all cases. As we’ve seen, some cases are too small to warrant special processing, and in others the file size profile doesn’t hold much promise of significant savings; there is no low-hanging fruit. Sometimes there may simply not be time to evaluate alternative processing options, and in some situations lawyers may feel that the benefits of a simple approach, where everything goes into one platform, outweigh the cost savings.

The best advice is to make it a practice to look at the largest-size files in a case and decide whether further analysis is warranted.

Upcoming: Triaging e-discovery by file type.

Founder & Principal Consultant
Quantum E-Discovery

Jeff believes in saving time and money in e-discovery by applying a variety of analytics tools early in a case, well before moving content to expensive final review platforms. Over the past 20+ years he has accumulated a variety of tools that can be applied as needed in specific situations.