Avansic's "Metadata and Load Files" whitepaper published on NALS website
07-12-2010, NALS, the Association of Legal Professionals - Dr. Gavin W. Manes
The following whitepaper was included on the "Inside NALS" blog in July 2010. NALS is the national association for legal professionals and holds the honor of being the oldest association formed for legal support professionals.

Metadata: E-Discovery Processing, Litigation, and Loadfile Creation

Electronic documents carry a plethora of extra information that their paper counterparts lack. This information, often called metadata, describes a particular document and may be captured by the program or digital device on which the document is created. The academic definition of metadata is “data about data,” which refers to information about a document that isn’t necessarily viewable. Most users don’t even notice that such information is being recorded.

Unlike electronic documents, scanned and paper documents have no metadata; the only information that can be gleaned from a stack of paper is the number of pages. Other information, such as when the documents were created, who sent them, and what changes were made, isn’t available. The metadata most commonly available for electronic documents includes creation, modification, and access dates; authorship information; save locations; dates the document was printed; who sent or received it; and edits made to the document.

There are two types of metadata: internal and external. Internal metadata is information extracted from the document itself (and is therefore part of the original evidence). Examples of internal metadata include the author of a document and the To and From fields of an email. External metadata is information created by the operating system or a program to describe the data. Examples of external metadata are a hash value or the creation time of a file in Windows.
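The distinction can be illustrated with a short Python sketch. It is only an illustration, not a forensic tool: the sample message, the function names, and the choice of fields are assumptions made for this example. It reports the modification time rather than the creation time because the way a creation time is exposed differs between operating systems.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone
from email import message_from_string

# Hypothetical sample message used only for this illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Draft agreement
Date: Mon, 12 Jul 2010 09:30:00 -0500

Please see the attached draft.
"""


def internal_metadata(raw_email):
    """Return fields stored inside the message itself (internal metadata)."""
    msg = message_from_string(raw_email)
    return {name: msg[name] for name in ("From", "To", "Subject", "Date")}


def external_metadata(path):
    """Return values recorded about the file from outside (external metadata)."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": digest,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "message.eml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(RAW_EMAIL)
        print(internal_metadata(RAW_EMAIL))
        print(external_metadata(path))
```

The internal fields travel with the message wherever it is copied, while the external values are produced by the file system or by the hashing tool at the time they are read.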

Relevance to E-Discovery

Metadata can be very useful in the course of e-discovery during litigation. Besides simply providing potentially relevant case data, such as creation date or authorship attribution, metadata can also be used to automate coding and filtering during the processing stage.
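As a rough sketch of what automated filtering can look like, the Python example below selects documents by creation date and author. The document records, field names, and function name are hypothetical and chosen only for this illustration; real processing tools apply the same idea to much larger sets and many more fields.

```python
from datetime import date

# Hypothetical metadata records; the field names are assumptions.
DOCUMENTS = [
    {"doc_id": "DOC0001", "author": "A. Smith", "created": date(2009, 11, 3)},
    {"doc_id": "DOC0002", "author": "B. Jones", "created": date(2010, 2, 17)},
    {"doc_id": "DOC0003", "author": "A. Smith", "created": date(2010, 6, 30)},
]


def filter_documents(documents, start, end, authors=None):
    """Keep documents created within [start, end], optionally limited to given authors."""
    selected = []
    for doc in documents:
        in_range = start <= doc["created"] <= end
        author_ok = authors is None or doc["author"] in authors
        if in_range and author_ok:
            selected.append(doc)
    return selected


if __name__ == "__main__":
    hits = filter_documents(
        DOCUMENTS, date(2010, 1, 1), date(2010, 12, 31), authors={"A. Smith"}
    )
    print([doc["doc_id"] for doc in hits])  # prints ['DOC0003']
```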

From a corporate perspective, metadata can be used to authenticate documents or resolve contract or factual disputes. IT personnel can use metadata to detect duplicates or determine which employees have copies of certain documents.
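One common way to detect duplicates is to compare hash values, since two files with identical content produce the same hash regardless of their names or locations. The Python sketch below assumes SHA-256 as the hash and a folder-per-employee layout; both are assumptions for illustration, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path):
    """Hash a file's content in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each hash shared by two or more files under root to those files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_duplicates.py <folder>
    for file_hash, paths in find_duplicates(sys.argv[1]).items():
        print(file_hash[:12], [str(p) for p in paths])
```

If each top-level folder under the root belongs to one employee, the paths in each group also show which employees hold a copy of the same document.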

Metadata in Load Files

Many ESI processing projects convert electronic documents to image formats such as TIFF or PDF. This process “petrifies” the documents so that they cannot change, but it also strips the documents of their metadata. Therefore, most productions include a separate file containing each document’s metadata, typically in the form of a spreadsheet with customer-defined fields.

A loadfile containing a production set is typically created by an e-discovery vendor in order to easily input information into a review or trial presentation tool. Since loadfiles come in a variety of formats and versions, deciding on loadfile type is an important early consideration in any e-discovery project.

The most common loadfiles are destined for Summation and Concordance. A Concordance loadfile (a .dat file, often accompanied by an .opt file that links the images) is essentially a delimited text file that, like a spreadsheet, is organized into rows and columns. A Summation loadfile is a text file with a list of items. Both can be easily viewed with common litigation tools, but there are still a variety of options even within Summation and Concordance files.
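To make the row-and-column structure of a .dat file concrete, the Python sketch below writes and reads one. It assumes the delimiters commonly used as Concordance defaults (ASCII 20 between fields and the þ character, ASCII 254, around each value); actual productions may specify different delimiters, encodings, and newline handling, so the governing specification should always be checked. The field names and sample row are hypothetical.

```python
import csv
import io

# Assumed default Concordance delimiters; a production spec may differ.
FIELD_SEP = "\x14"   # ASCII 20, separates fields
QUOTE = "\u00fe"     # the thorn character (ASCII 254), wraps each value

# Hypothetical field names and sample row for illustration only.
FIELDS = ["BEGDOC", "ENDDOC", "AUTHOR", "DATECREATED"]
SAMPLE_ROWS = [
    {"BEGDOC": "ABC0001", "ENDDOC": "ABC0003",
     "AUTHOR": "Smith, A.", "DATECREATED": "2010-07-12"},
]


def write_dat(rows, fieldnames):
    """Return loadfile text: one header row, then one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=FIELD_SEP,
        quotechar=QUOTE, quoting=csv.QUOTE_ALL, lineterminator="\r\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_dat(text):
    """Parse loadfile text back into a list of field-name-to-value mappings."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""), delimiter=FIELD_SEP, quotechar=QUOTE
    )
    return list(reader)


if __name__ == "__main__":
    dat_text = write_dat(SAMPLE_ROWS, FIELDS)
    for record in read_dat(dat_text):
        print(record["BEGDOC"], record["AUTHOR"], record["DATECREATED"])
```

Because the separator is an unusual control character, values that contain commas (such as the author name above) need no special handling, which is one reason this style of delimiter is used.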

Many law firms have multi-page specifications for how data is to be processed, how metadata and text are to be extracted, how loadfiles are to be created, and how native and image files are to be linked. Many firms also request test loadfiles from vendors or opposing parties well before deadlines in order to ensure that the appropriate information is being captured and produced. This is a technical process that often requires some time and know-how to iron out.

Preservation and Removal of Metadata

The presence of metadata is an important consideration both for documents destined for litigation and for documents created by legal professionals in the course of their work. Preserving, handling, and potentially removing metadata from documents should be separately considered for these two situations.

It is always advisable to preserve metadata on evidentiary documents and to take great care not to change metadata during the collection or review of a document set. The simple act of “dragging and dropping” a file changes several metadata fields, whereas a forensic copy of that same file would preserve the document and its associated metadata as they exist in situ.
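The effect of an ordinary copy on metadata can be observed with a small Python experiment. This is a sketch only: neither copy shown here is a forensic copy, the file names are hypothetical, and which fields change in practice depends on the operating system and file system. It compares a plain copy with one that attempts to carry the timestamps over.

```python
import os
import shutil
import tempfile

OLD_TIMESTAMP = 1278936000  # a moment in July 2010, used as the "original" date


def modified_times_after_copy(source, folder):
    """Copy source two ways and report the modification time of each result."""
    plain = os.path.join(folder, "plain_copy.txt")
    preserved = os.path.join(folder, "preserved_copy.txt")
    shutil.copy(source, plain)        # copies content and permission bits
    shutil.copy2(source, preserved)   # also attempts to copy timestamps
    return {
        "source": os.stat(source).st_mtime,
        "plain": os.stat(plain).st_mtime,
        "preserved": os.stat(preserved).st_mtime,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "evidence.txt")
        with open(source, "w", encoding="utf-8") as handle:
            handle.write("original content")
        # Backdate the file so the difference between the copies is visible.
        os.utime(source, (OLD_TIMESTAMP, OLD_TIMESTAMP))
        for name, mtime in modified_times_after_copy(source, folder).items():
            print(name, mtime)
```

The plain copy receives the time at which the copy was made as its modification time, while the second copy keeps the backdated value. Forensic collection tools go further by preserving the remaining file system metadata and verifying the copy with hash values.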

The use of metadata in regular business practices and the legal profession (such as enabling track changes on a settlement document) can and should be handled differently. In most states, it is an ethics violation to disclose privileged information in metadata, and it may even be an ethics violation to look for privileged information within the metadata. This has been a hotly debated topic among Bar Associations over the past few years. Commonly available tools can clean metadata out of documents, resolve tracked changes, and remove author names and hidden text. Testing these tools is recommended to confirm that the results meet your acceptable level of risk.


Digital document metadata can provide a wealth of information for both litigation and corporate purposes. Handling metadata appropriately should be a consideration from the very beginning of a case; otherwise, critical information may be lost. The fields to be requested for inclusion in loadfiles should also be determined early in the case for best results. Knowing that computer programs are collecting such information can help you understand what to expect and take care to guard your own information.