Unpacking Embedded Objects Can Create Clutter, Inflate E-Discovery Costs

Decisions on processing embedded objects in e-discovery can increase clutter and costs, sometimes in unexpected ways, e.g., by inflating charges under per-document pricing arrangements. The best approach in any given case is usually to make informed decisions by examining some of those embedded objects before they are loaded into a final review platform.

Embedded objects are of course quite common. Anytime someone inserts a graphic file or copies and pastes a picture into a Word document, they’ve embedded an object. Same thing happens when someone makes a graph in Excel and inserts it in a PowerPoint presentation. When files containing embedded objects are placed in a review platform, attorneys can specify whether to extract those objects and make separate database records for them, or to just examine them as they are displayed in the container files. This posting suggests how to treat embedded graphics and MS Office objects like charts, graphs, and tables.

Embedded Graphics – Don’t Make Separate Review Records

People often insert graphics from PNG, JPG, or GIF files in documents or presentations to illustrate points or for aesthetic reasons. If separate records are made in the review database for each embedded object, the number of items being reviewed can be greatly inflated. This clutters the review and, by diverting attention from more significant items, can lead to slower, poorer-quality review. And while there’s a cost to having individual records for each embedded graphic, there is no significant benefit to viewing them as separate files as opposed to viewing them as displayed in the container documents.

To show how significant this can be, I reviewed eight projects that were largely email and attachments to identify the number of graphics embedded in those files. Here are the results:

Clutter. As you can see, creating separate review items for embedded graphics would add, on average, about 28% to the number of free-standing (unembedded) files being considered, with the percentage ranging from about 5% to 90%. Some projects had tens of thousands of embedded graphics. As noted above, in most cases those graphics files are just clutter: reviewers don’t see anything more in the independently presented embedded files than they would when viewing the graphics in the documents themselves.

Look before Loading. If there is some question as to whether embedded objects ought to be broken out as separate review objects, the best approach is to look at them before loading them into the review platform. Several forensics and e-discovery tools permit users to identify and examine embedded objects. Reviewing at least some of them before loading the content into a review platform makes it possible to decide, on an informed basis, what is appropriate in any given case.
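As a rough illustration of looking before loading, here is a minimal Python sketch that counts what is embedded in a single modern Office file. It assumes the file is in one of the zip-based formats (.docx, .pptx, .xlsx) and that, as Office conventionally writes them, pictures sit in a "media" folder and embedded Office or OLE objects sit in an "embeddings" folder. The function name and the list of graphic extensions are my own illustrative choices, not features of any particular e-discovery tool, and the sketch does not handle legacy .doc/.ppt files or email containers.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: extensions treated as "graphics" for this illustration.
GRAPHIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp", ".emf", ".wmf"}


def inventory_embedded_parts(container):
    """Count embedded graphics and embedded objects in one zip-based Office file.

    `container` is a path or a file-like object for a .docx, .pptx, or .xlsx file.
    """
    counts = {"graphics": 0, "embedded_objects": 0}
    with zipfile.ZipFile(container) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            if "media" in path.parts and path.suffix.lower() in GRAPHIC_EXTENSIONS:
                counts["graphics"] += 1
            elif "embeddings" in path.parts:
                counts["embedded_objects"] += 1
    return counts
```

Running this over a sample of container files before loading gives a quick sense of how many extra records extraction would create, and which files are worth opening for a closer look.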

Defining “Document” for Alternative Fee Arrangements for E-Discovery Review. Traditionally, law firms billed by the hour for document review work. This arguably gives reviewers an incentive to be inefficient, so some corporations seek alternative fee arrangements. One alternative approach is per-document pricing for review. Corporations using per-document review pricing need to be sure that each embedded graphic is not counted as a “document” for billing purposes. If embedded graphics are unpacked as separate objects, does each unpacked graphic count as a “document”? If so, there can be some MS Office documents or emails that cost many times what might have been expected. For example, a PowerPoint with 15 graphics can become 16 separate review database records, and the client would not ordinarily expect each of those to be billed as a “document.”
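The arithmetic behind that PowerPoint example can be sketched in a few lines of Python. The function names and the price of 100 cents per document are hypothetical, chosen only to make the comparison concrete; actual pricing terms will vary by agreement.

```python
def billable_records(parent_documents, embedded_graphics, count_graphics_as_documents):
    """Number of records billed under a per-document arrangement."""
    if count_graphics_as_documents:
        return parent_documents + embedded_graphics
    return parent_documents


def review_cost_cents(records, price_per_document_cents):
    """Total review charge in cents for a given number of billed records."""
    return records * price_per_document_cents


# One PowerPoint with 15 embedded graphics, at a hypothetical 100 cents per document.
unpacked = billable_records(1, 15, count_graphics_as_documents=True)
as_container = billable_records(1, 15, count_graphics_as_documents=False)
print(unpacked, review_cost_cents(unpacked, 100))
print(as_container, review_cost_cents(as_container, 100))
```

The same file is billed as 16 records in the first case and as 1 in the second, which is why the definition of “document” belongs in the fee agreement.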

OCR. Optical character recognition will usually not convert embedded graphics into searchable text if there is any searchable text elsewhere in the document. One potential advantage of extracting embedded graphics into separate database objects is having OCR applied to those files. However, other e-discovery tools can extract embedded objects and create searchable text for them as part of the process of selecting files to go into the review database. In other words, there is more than one opportunity to create and use OCR text beyond just the final review platform – it can occur in the pre-selection process without cluttering the final review database.

Embedded Charts, Graphs, or Tables – Unpack & Make Separate Review Items

The rationale underlying not extracting embedded graphics does not apply to charts, graphs, or tables created in one program but embedded in or linked to by another document. Reviewers may want to examine the source files to see information that was not displayed in the embedded objects, e.g., to understand the formulae used to create a graph or chart, or to see further context for a table created in Word that was shown in a PowerPoint. For that reason, attorneys will usually want to create separate review objects for the source files for those types of embedded items.
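The distinction drawn above between graphics and source files can be written down as a simple rule. This is only a sketch: the extension lists and the function name are illustrative assumptions, and anything that matches neither list is routed to a look-before-loading step rather than decided automatically.

```python
from pathlib import PurePosixPath

# Assumption: illustrative extension lists, not an exhaustive or authoritative set.
KEEP_IN_CONTAINER = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff"}
EXTRACT_AS_RECORD = {".xlsx", ".xls", ".docx", ".doc", ".accdb", ".mdb"}


def extraction_decision(embedded_name):
    """Suggest how to treat one embedded item, based on its file extension."""
    suffix = PurePosixPath(embedded_name).suffix.lower()
    if suffix in KEEP_IN_CONTAINER:
        return "keep in container"
    if suffix in EXTRACT_AS_RECORD:
        return "extract as separate record"
    return "inspect before loading"
```

For example, an embedded spreadsheet behind a chart would be extracted so reviewers can see its formulae, while a logo image would stay inside its container document.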

Conclusion

Attorneys should carefully consider how to process objects that are embedded in e-discovery files. Normally it will not make sense to make separate review database records for each embedded PNG, TIF, or GIF – it creates a lot of clutter and diverts attention from more important items. However, it will usually make sense to create separate review records for embedded MS Office objects like graphs, tables, or charts that are associated with spreadsheets, databases, or separate documents.

Founder & Principal Consultant
Quantum E-Discovery

Jeff believes in saving time and money in e-discovery by applying a variety of analytics tools early in a case, well before moving content to expensive final review platforms. Over the past 20+ years he has accumulated a variety of tools that can be applied as needed in specific situations.